How to implement document tagging with AutoML

Many businesses need to digitize photos, documents, memos, and other types of physical media to help with tasks like invoice processing, application review, and contract analysis. At Google Cloud, we provide a number of ways customers can do this, from using our pre-trained machine learning APIs, to building on our AutoML suite, to applying Document Understanding AI, our latest AI solution.

In this post, we’ll focus on one approach: using Cloud AutoML to perform document tagging for document processing. Document tagging means identifying key-value pairs in a document, that is, the responses (values) to fields (tags) such as customers, account numbers, totals, and more. Here, ‘tags’ are the fields you want to extract, and ‘values’ are the information associated with each tag. In this solution, we’ll use AutoML to fetch important content from an image, like signatures, stamps, and boxes, for processing.

Solutions of the past

A few years ago, digitizing a document meant simply scanning it and storing it as an image in the cloud. Now, with better tools and techniques, and with the recent boom in ML-based solutions, it is possible to convert a physical document into structured data that can be processed automatically and from which useful knowledge can be extracted.

Until recently, digitizing documents required a rule-based methodology, such as using regular expressions to identify fields or extracting OCR text from fixed field positions. But these approaches often fail on new documents, and keyword matching or purely text-based NLP models run into similar problems. Object detection and entity recognition, which have gained a lot of traction in the last few years, have now led to significant improvements in this area.
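As a concrete illustration of the rule-based approach, the sketch below tags OCR text with regular expressions. It is a minimal example, not part of the AutoML solution: the tag names (`invoice_number`, `account_number`, `total`), the label patterns, and the sample text are all hypothetical. It also shows the brittleness described above: the same kind of values written under different labels are simply missed.

```python
import re
from typing import Dict, Optional

# Hypothetical tag -> pattern rules. Each pattern captures the value that
# follows a fixed label, which is exactly what makes rule-based tagging brittle.
RULES = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:No\.?|Number)\s*[:#]?\s*([A-Z0-9-]+)", re.IGNORECASE
    ),
    "account_number": re.compile(
        r"Account\s*(?:No\.?|Number)\s*[:#]?\s*(\d+)", re.IGNORECASE
    ),
    "total": re.compile(r"Total\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE),
}


def tag_document(ocr_text: str) -> Dict[str, Optional[str]]:
    """Return a tag -> value mapping; None means no rule matched."""
    result: Dict[str, Optional[str]] = {}
    for tag, pattern in RULES.items():
        match = pattern.search(ocr_text)
        result[tag] = match.group(1) if match else None
    return result


if __name__ == "__main__":
    known_layout = "ACME Corp\nInvoice No: INV-1042\nAccount Number: 889123\nTotal: $1,250.00"
    new_layout = "ACME Corp\nBill ref INV-2077\nCustomer 889123\nAmount due 980.50"
    print(tag_document(known_layout))
    # {'invoice_number': 'INV-1042', 'account_number': '889123', 'total': '1,250.00'}
    print(tag_document(new_layout))
    # {'invoice_number': None, 'account_number': None, 'total': None}
```

Every new layout needs new patterns, which is the maintenance burden that learned object detection and entity recognition models are meant to reduce.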
Cloud AutoML, our suite of AI services that let you create high-quality custom machine learning models with minimal ML expertise, is one example of these ML-based approaches.

A GCP solution: AutoML at scale

There is a wide variety of AutoML services that can be used as a foundation to create models that solve unique business problems. In the case of document digitization, one possible architecture consists of the following components:

Tagged document: You can use the AI Platform Data Labeling Service if you don’t already have annotated data.
OCR and object detection: This can be done with the Vision API and AutoML Vision Object Detection, a recent addition to the AutoML suite of products.
Merge and feature processing: There are several ways to do this, such as a simple Jupyter notebook or a Python-based containerized solution.
Entity recognition: This can be done with entity extraction, a new feature in AutoML Natural Language.
Post processing: This can be done in a similar fashion to feature processing.

This type of architecture is not only simple to follow but also easy to deploy: all components are based on existing GCP products that are highly scalable, serverless, and ready for production use.

The whole pipeline can be orchestrated using Cloud Composer or deployed using Google Kubernetes Engine (GKE). However, some business problems, such as building a customized data ingestion pipeline into GCP, extracting rules from legal documents, or redacting sensitive information from documents before parsing, require additional customizations that can be developed on top of the architecture above.
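The merge and feature processing step is where OCR output and object-detection output are joined, but the post does not say how. Below is a minimal sketch, assuming both stages’ bounding boxes have already been converted into the same normalized coordinate space. The `Word` and `DetectedObject` types, the region labels, and the `min_score` threshold are hypothetical stand-ins, not the Vision API or AutoML client library response classes. Each OCR word is assigned to the highest-scoring detected region that contains the word’s center, and words are grouped per region.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# (x_min, y_min, x_max, y_max) in normalized [0, 1] image coordinates.
Box = Tuple[float, float, float, float]


@dataclass
class Word:
    """Hypothetical stand-in for one OCR token and its bounding box."""
    text: str
    box: Box


@dataclass
class DetectedObject:
    """Hypothetical stand-in for one object-detection result."""
    label: str
    score: float
    box: Box


def center(box: Box) -> Tuple[float, float]:
    x_min, y_min, x_max, y_max = box
    return (x_min + x_max) / 2, (y_min + y_max) / 2


def contains(box: Box, point: Tuple[float, float]) -> bool:
    x_min, y_min, x_max, y_max = box
    x, y = point
    return x_min <= x <= x_max and y_min <= y <= y_max


def assign_region(word: Word, objects: List[DetectedObject],
                  min_score: float = 0.5) -> Optional[str]:
    """Label of the highest-scoring confident region containing the word's center."""
    point = center(word.box)
    candidates = [o for o in objects if o.score >= min_score and contains(o.box, point)]
    if not candidates:
        return None
    return max(candidates, key=lambda o: o.score).label


def merge(words: List[Word], objects: List[DetectedObject]) -> Dict[str, str]:
    """Group OCR words by detected region; words outside every region go to 'body'."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    # Sort top-to-bottom, then left-to-right, for an approximate reading order.
    for word in sorted(words, key=lambda w: (w.box[1], w.box[0])):
        grouped[assign_region(word, objects) or "body"].append(word.text)
    return {region: " ".join(tokens) for region, tokens in grouped.items()}


if __name__ == "__main__":
    words = [
        Word("Total:", (0.60, 0.80, 0.70, 0.83)),
        Word("$1,250.00", (0.72, 0.80, 0.85, 0.83)),
        Word("ACME", (0.05, 0.05, 0.15, 0.08)),
        Word("Corp", (0.16, 0.05, 0.25, 0.08)),
        Word("J.", (0.10, 0.90, 0.14, 0.93)),
        Word("Smith", (0.15, 0.90, 0.25, 0.93)),
    ]
    objects = [
        DetectedObject("signature", 0.92, (0.05, 0.88, 0.30, 0.96)),
        DetectedObject("totals_box", 0.81, (0.55, 0.78, 0.90, 0.85)),
    ]
    print(merge(words, objects))
    # {'body': 'ACME Corp', 'totals_box': 'Total: $1,250.00', 'signature': 'J. Smith'}
```

Each region’s text could then be passed to an entity extraction model, and post processing would map the returned entities back to tags. Using the box center rather than full containment is a deliberate simplification: it tolerates OCR boxes that slightly overflow a detected region, at the cost of assigning a word that straddles two regions to only one of them.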
For requirements that go beyond this architecture, you can contact our sales team for more details and help.

Value generation

Different ML solutions have their own business or technical benefits, and many of our customers have used solutions like this one to meet their objectives, whether that’s enhancing the user experience, decreasing operational costs, or reducing overall errors. Solutions like the one described in this post can be used across industries such as healthcare, financial services, media, and more. Here are just a few examples:

Automatically extracting knowledge from Electronic Health Records (EHR).
Key-value pair generation from invoices.
Field fetching from financial documents.
Text understanding of customer complaints.
Tagging of bank checks, tickets, and other data.

What’s next

In this age of deep learning, solutions that simplify the training process, like transfer learning, are increasingly needed. The architecture described in this post has been successfully tested and deployed to work at scale, and it makes it possible to digitize documents without needing a massive set of annotated images for model training. Data variability, however, is still an important factor in any machine learning-based solution. AutoML automatically handles many basic problems caused by variance in the data, making it possible to train a custom model with as few as a few thousand images.

Helping customers process their documents fits perfectly with Google’s mission to organize the world’s information and make it universally accessible and useful. We hope that by sharing this post, we can inspire more organizations to look to the cloud. Tools like Cloud AutoML Vision, Cloud AutoML Natural Language, and Cloud Storage can help you build a rich data set and improve the end-user experience.

This is a simple, targeted solution for a specific problem. For broader and more powerful document process automation and insight extraction, please refer to Google’s Document Understanding AI solution.
AutoML is a core component of the end-to-end Document Understanding AI solution, which is easy to deploy through our partners and requires no machine learning expertise. You can learn more on our website.
Source: Google Cloud Platform

Social Media: Twitter has new rules for politicians

So far, Twitter has not deleted even the controversial posts by US President Trump. With its new terms of use, the company is now putting itself under pressure to flag problematic posts as rule violations, if it dares.
Source: Golem

Introducing Equiano, a subsea cable from Portugal to South Africa

Today we are introducing Equiano, our new private subsea cable that will connect Africa with Europe. Once complete, Equiano will start in western Europe and run along the west coast of Africa, between Portugal and South Africa, with branching units along the way that can be used to extend connectivity to additional African countries. The first branch is expected to land in Nigeria. This new cable is fully funded by Google, making it our third private international cable after Dunant and Curie, and our 14th subsea cable investment globally.

Figure: Equiano’s planned route and branching units, from which additional potential landings can be built.

Google’s private subsea cables all carry the names of historical luminaries, and Equiano is no different. Named for Olaudah Equiano, a Nigerian-born writer and abolitionist who was enslaved as a boy, the Equiano cable is state-of-the-art infrastructure based on space-division multiplexing (SDM) technology, with approximately 20 times more network capacity than the last cable built to serve this region. Equiano will be the first subsea cable to incorporate optical switching at the fiber-pair level, rather than the traditional approach of wavelength-level switching. This greatly simplifies the allocation of cable capacity, giving us the flexibility to add and reallocate it in different locations as needed. And because Equiano is fully funded by Google, we’re able to expedite our construction timeline and optimize the number of negotiating parties. A contract to build the cable with Alcatel Submarine Networks was signed in Q4 2018, and the first phase of the project, connecting South Africa with Portugal, is expected to be completed in 2021.

Over the last three years, Google has invested US$47 billion to improve our global infrastructure, and Equiano will further enhance the world’s highest-capacity, best-connected international network. We’re excited to bring Equiano online, and look forward to working with licensed partners to bring Equiano’s capacity to even more countries across the African continent.
Source: Google Cloud Platform