Guest post: Using GCP for massive drug discovery virtual screening

By Woody Sherman, CSO and Vipin Sachdeva, Principal Investigator, Silicon Therapeutics

[Editor’s note: Today we hear from Boston, MA-based Silicon Therapeutics, which is applying computational methods to complex biochemical problems relevant to human biology.]

As an integrated computational drug discovery firm, we recently deployed our INSITE Screening platform on Google Cloud Platform (GCP) to analyze over 10 million commercially available molecular compounds as potential starting materials for next-generation medicines. In one week, we performed over 500 million docking computations to evaluate how a protein responds to a given molecule. Each computation involved a docking program that predicted the preferred orientation of a small molecule to a protein and the associated energetics, so we could assess whether it would bind and alter the function of the target protein.

With a combination of Google Compute Engine standard and Preemptible VMs, we used up to 16,000 cores, for a total of 3 million core-hours and a cost of about $30,000. While this might sound like a lot of time and money, it’s a lot less expensive and a lot faster than experimentally screening all compounds. A physics-based approach such as our INSITE platform is much more computationally expensive than some other computational screening approaches, but it allows us to find novel binders without using any prior information about active compounds (this particular target has no drug-like compounds known to bind). In the final stage of the calculations, we performed all-atom molecular dynamics (MD) simulations on the top 1,000 molecules to determine which ones to purchase and experimentally assay for activity.

The bottom line: We successfully completed the screen using our INSITE platform on GCP and found several molecules that have recently been experimentally verified to have on-target and cell-based activity.

We chose to run this high-performance computing (HPC) job on GCP over other public cloud providers for a number of reasons:

Availability of high-performance compute infrastructure. Compute Engine has a good inventory of high-performance processors that can be configured with large numbers of cores and large amounts of memory. It also offers GPUs — a great fit for some of our computations, such as molecular dynamics and free energy calculations. SSD made a big difference in performance, as our total I/O for this screen exceeded 40 TB of raw data. Fast connectivity between the front end and the compute nodes was also a big factor, as the front-end disk was NFS-mounted on the compute nodes.
Support for industry standard tools. As a startup, we value the ability to run our workloads wherever we see fit. Our priorities can change rapidly based on project challenges (chemistry and biology), competition, opportunities and the availability of compute resources. Our INSITE platform is built on a combination of open-source and proprietary in-house software, so portability and repeatability across in-house and public clouds is essential.
An attractive pricing model. Preemptible VMs are a great combination of cost-effectiveness and predictability, offering up to 80% off standard instances — no bidding and no surprises. That means we don’t have to worry about jobs being killed in a bidding war, which can significantly delay our screens and require unnecessary human overhead to manage the jobs.

We initialized multiple clusters for the screening. The front end consisted of three full-priced n1-highmem-32 VM instances with 208GB of RAM that ran the queuing system and connected to a 2TB SSD NFS filestore that housed the compound library. Each of these front-end nodes then spawned up to 128 compute nodes configured as n1-highcpu-32 Preemptible VMs, each with 28.8GB of memory. Those compute nodes performed the actual molecular compound screens and wrote their results back to the filestore. Preemptible VMs run for a maximum of 24 hours; when that time elapsed, the front-end nodes drained any jobs remaining on the compute nodes and spawned a new set of nodes, until all 10 million compounds had been successfully run.

To manage compute jobs, we enlisted the help of two popular open-source tools: Slurm, a workload manager used by 60% of the world’s TOP500 clusters, and ElastiCluster, which provides a command-line tool to create, manage and set up compute clusters hosted on a variety of cloud infrastructures. Using these open-source packages is economical, provides the lion’s share of the functionality of paid software solutions and ensures we can run our workloads in-house or elsewhere.

More compute = better results
But ultimately, the biggest benefit of using GCP was being able to more thoroughly screen compounds than we could have done with in-house resources. The target protein in this particular study was highly flexible, and having access to massive amounts of compute power allowed us to more accurately model the underlying physics of the system by accounting for protein flexibility. This yielded more active compounds than we would have found without the GCP resources.

The reality is that all proteins are flexible, and undergo some form of induced fit upon ligand binding, so treating protein flexibility is always important in virtual screening if you want the best results. Most molecular docking programs only account for ligand flexibility, so if the receptor structure is not quite right then active compounds might not fit and therefore be missed, no matter how good the docking program is. Our INSITE screening platform incorporates protein flexibility in a novel way that can greatly improve the hit rate in virtual screening, even as it requires a lot of computational resources when screening millions of commercially available compounds.

Example of the dynamic nature of the protein target (Interleukin-18, IL-18)

From the initial 10 million compounds, we prioritized 250 promising compounds for experimental validation in our lab. As a small company, we don’t have the capabilities to experimentally screen millions of compounds, and there’s no need to do so with an accurate virtual screening approach like the one in our INSITE platform. We’re excited to report that at least five of these compounds have shown activity in human cells, making them promising starting points for new medicines. To our knowledge, there are no drug-like small molecule activators of this important and challenging immuno-oncology target.

To learn more about the science at Silicon Therapeutics, please visit our website. And if you’re an engineer with expertise in high performance computing, GPUs and/or molecular simulations, be sure to visit our job listings.
Source: Google Cloud Platform

Resumable Online Index Rebuild is in public preview for Azure SQL DB

We are delighted to announce that Resumable Online Index Rebuild (ROIR) is now available for public preview in Azure SQL DB. With this feature, you can resume a paused index rebuild operation from where it was paused rather than having to restart the operation at the beginning. Additionally, this feature rebuilds indexes using only a small amount of log space. You can use the new feature in the following scenarios:

Resume an index rebuild operation after an index rebuild failure (such as after a database failover or after running out of disk space). There is no need to restart the operation from the beginning. This can save a significant amount of time when rebuilding indexes for large tables.
Pause an ongoing index rebuild operation and resume it later. For example, you may need to temporarily free up system resources in order to execute a high priority task or you may have a single maintenance window that is too short to complete the operation for a large index. Instead of aborting the index rebuild process, you can pause the index rebuild operation and resume it later without losing prior progress.
Rebuild large indexes without using a lot of log space or holding a long-running transaction that blocks other maintenance activities. This helps with log truncation and avoids out-of-log errors that are possible for long-running index rebuild operations.
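
As a minimal sketch of these operations in T-SQL (the index and table names below are hypothetical), a resumable online rebuild can be started, paused, monitored, and resumed along the following lines:

     T-SQL commands

-- Start an online, resumable rebuild and cap each run at 60 minutes
ALTER INDEX IX_Orders_CustomerID ON dbo.Orders
REBUILD WITH (ONLINE = ON, RESUMABLE = ON, MAX_DURATION = 60 MINUTES);

-- Pause the rebuild to free up resources; progress is preserved
ALTER INDEX IX_Orders_CustomerID ON dbo.Orders PAUSE;

-- Check the progress of resumable index operations
SELECT name, percent_complete, state_desc FROM sys.index_resumable_operations;

-- Resume the rebuild from where it was paused
ALTER INDEX IX_Orders_CustomerID ON dbo.Orders RESUME;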

For more information about ROIR, please review the following documents:

Guidelines for Online Index Operations
ALTER INDEX (Transact-SQL)
sys.index_resumable_operations

For public preview communication on this topic please contact the ResumableIDXPreview@microsoft.com alias.
Source: Azure

Database Scoped Global Temporary Tables in public preview for Azure SQL DB

We are delighted to announce that Database Scoped Global Temporary Tables are in public preview for Azure SQL DB. Similar to global temporary tables for SQL Server (tables prefixed with ##table_name), global temporary tables for Azure SQL DB are stored in tempdb and follow the same semantics. However, rather than being shared across all databases on the server, they are scoped to a specific database and are shared among all users’ sessions within that same database. User sessions from other Azure SQL databases cannot access global temporary tables created by sessions connected to a given database. Any user can create global temporary objects.

Example

Session A creates a global temp table ##test in Azure SQL Database testdb1 and adds 1 row

     T-SQL command

CREATE TABLE ##test ( a int, b int);
INSERT INTO ##test values (1,1);

Session B connects to Azure SQL Database testdb1 and can access table ##test created by session A

     T-SQL command

SELECT * FROM ##test;
-- Results
1,1
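
To illustrate the database scoping (a sketch assuming a second, hypothetical database testdb2), Session C connects to Azure SQL Database testdb2 and cannot access table ##test created in testdb1

     T-SQL command

SELECT * FROM ##test;
-- Fails with an invalid object name error, because ##test is scoped to testdb1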

For more information on Database Scoped Global Temporary Tables for Azure SQL DB, see CREATE TABLE (Transact-SQL).
Source: Azure

Amazon Could Get Into Meal Kits, And It's Crushing Blue Apron

Investors think the meal-kit delivery service Blue Apron has a very dangerous enemy: Amazon.

Shares of the newly public meal-kit company dived again on Monday morning following news that Amazon had applied to trademark the phrase “We do the prep. You be the chef,” suggesting that the e-commerce giant is planning to launch its own prepared meal-kit service.

Blue Apron's stock fell about 11% to $6.55 as of noon on Monday, down 35% from its initial public offering price of $10. The IPO had originally been projected to be priced between $15 and $17, but the price was lowered as concerns mounted about Blue Apron's high marketing spending and about pressure from Amazon's plan to buy Whole Foods.

The trademark application is for “prepared food kits composed of meat, poultry, fish, seafood, fruit and/or vegetables and also including sauces or seasonings, ready for cooking and assembly as a meal” as well as frozen meals. While the filing doesn't mention delivery specifically, it is listed under, among other things, “retail store services and online retail store services in the field of fresh and prepared foods and dry goods.”

The British newspaper The Times first reported the trademark application on Sunday.

Amazon and Blue Apron did not immediately respond to requests for comment.

Amazon is planning to acquire Whole Foods for almost $14 billion, a move that put the fear of Bezos into much of the retail and grocery sector. The merger was announced while Blue Apron was preparing to go public, and may have contributed to the meal-kit company's bankers lowering their estimate of its share value right up until the day the shares began to trade.

While Amazon has said little about how it would operate Whole Foods, many analysts have speculated that some kind of meal-kit delivery service — using Amazon's existing logistics expertise and built-in network of Amazon Prime customers — would be a natural integration between the two.

Not all patents and trademarks registered by technology companies turn into actual services, but Blue Apron has been a skittish stock since it went public late last month — it fell over 10% last week after a brokerage firm put out a research report pegging its value at only $2 a share. Amazon shares were up by less than 1% by mid-Monday.


Source: BuzzFeed

Azure Data Lake Tools for Visual Studio Code (VSCode) July updates

We are pleased to announce the July updates of Azure Data Lake Tools for VSCode. This is a quality milestone: we added local debug capability for C# code-behind for Windows users, refined the Azure Data Lake (ADLA and ADLS) integration experience, and focused on refactoring components and fixing bugs. Azure Data Lake Tools for VSCode is an extension for developing U-SQL projects against Microsoft Azure Data Lake. It provides a cross-platform, lightweight and keyboard-focused authoring experience for U-SQL while maintaining a rich set of development functions.

Summary of key updates

Local Run for Windows users. This update allows you to perform a local run to test against your local data and execute your script locally before publishing production-ready code to ADLA. Use the command ADL: Start Local Run Service to start the local run service; the CMD console shows up. First-time users should enter 3 and set up the data root. Use the command ADL: Submit Job to submit your job to your local account. After job submission, you can view the submission details by clicking jobUrl in the output window, or view the job submission status in the CMD console.

Local Debug for Windows users. Local debug enables you to debug your C# code-behind, step through the code, and validate your script locally before submitting it to ADLA. Use the command ADL: Start Local Run Service to start the local run service and set a breakpoint in your code-behind, then run the command ADL: Local Debug to start the local debug service. You can debug through the debug console and view parameter, variable, and call stack information.

Register assemblies through configuration. Registering assemblies through configuration gives you more flexibility to register your dependencies and upload your resources. Use the command ADL: Register Assembly through Configuration to register your assembly, register the assembly dependencies, and upload resources through a simple configuration.

Upload file through configuration. Uploading files through configuration boosts your productivity and lets you upload multiple files at the same time. Use the command ADL: Upload File through Configuration to upload multiple files through a simple configuration.

How do I get started?

First, install Visual Studio Code and download the prerequisite files, including JRE 1.8.x, Mono 4.2.x (Linux and Mac), and .NET Core (Linux and Mac). Then get the latest ADL Tools by going to the VSCode Extension repository or VSCode Marketplace and searching for “Azure Data Lake Tools for VSCode”.

For more information about Azure Data Lake Tools for VSCode, please see:

Get more information on using Data Lake Tools for VSCode.
Watch the ADL Tools for VSCode user instructions video.
Learn more about how to get started on Data Lake Analytics.
Learn how to develop U-SQL assemblies for Azure Data Lake Analytics jobs.

If you encounter any issues, please submit them at https://github.com/Microsoft/AzureDatalakeToolsForVSCode/issues. Want to make this extension even more awesome? Share your feedback.
Source: Azure

Azure Stream Analytics now available in UK West, Canada Central and East

As part of our ongoing commitment to enable higher performance and support customer requirements around data location, we’re pleased to announce that Azure Stream Analytics is now available in 3 additional regions: UK West, Canada Central, and Canada East.

With this announcement, Stream Analytics is now available in 26 Azure regions worldwide. For more information about local pricing, please visit Azure Stream Analytics pricing webpage.

Azure Stream Analytics is a serverless, scale-out job service built to help customers easily develop and run massively parallel real-time analytics across multiple streams of data using a simple SQL-like language. For example, a recent case study demonstrates how SkyAlert leveraged Azure Stream Analytics in conjunction with other Azure services to build an early-warning system that alerts citizens about an impending earthquake up to 2 minutes before it is felt, which could potentially save many precious lives in the event of a natural disaster.
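
As an illustrative sketch of that SQL-like language (the input, output, and field names here are hypothetical), a query that computes a 30-second tumbling-window average per device might look like this:

SELECT deviceId, AVG(temperature) AS avgTemperature
INTO [alerts-output]
FROM [sensor-input] TIMESTAMP BY eventTime
GROUP BY deviceId, TumblingWindow(second, 30)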

New to Azure Stream Analytics? Learn to build your first Stream Analytics application by following this step-by-step guide.
Source: Azure

Now You Can Use AWS Directory Service for Microsoft Active Directory to Help Maintain HIPAA and PCI Compliance in the AWS Cloud

Now you can use AWS Directory Service for Microsoft Active Directory (Enterprise Edition), also known as AWS Microsoft AD, to build and run Active Directory (AD)–aware applications in the AWS Cloud that are subject to U.S. Health Insurance Portability and Accountability Act (HIPAA) or Payment Card Industry Data Security Standard (PCI DSS) compliance. AWS Microsoft AD reduces the effort required of you to deploy compliant AD infrastructure for your cloud-based applications, as you manage your own HIPAA risk management programs or PCI DSS compliance certification. 
Source: aws.amazon.com