COVID-19 public dataset program: Making data freely accessible for better public outcomes

Data always plays a critical role in the ability to research, study, and combat public health emergencies, and nowhere is this more true than in a global crisis. Access to datasets—and to tools that can analyze that data at cloud scale—is increasingly essential to the research process, and is particularly necessary in the global response to the novel coronavirus (COVID-19). To aid researchers, data scientists, and analysts in the effort to combat COVID-19, we are making a hosted repository of public datasets—like those from the Johns Hopkins Center for Systems Science and Engineering (JHU CSSE), Global Health Data from the World Bank, and OpenStreetMap—free to access and query through our COVID-19 Public Dataset Program. Researchers can also use BigQuery ML to train advanced machine learning models with this data right inside BigQuery at no additional cost.

“Making COVID-19 data open and available in BigQuery will be a boon to researchers and analysts in the field,” says Sam Skillman, head of engineering at Descartes Labs. “In particular, having queries be free will allow greater participation, and the ability to quickly share results and analysis with colleagues and the public will accelerate our shared understanding of how the virus is spreading.”

These datasets remove barriers and provide access to critical information quickly and easily, eliminating the need to search for and onboard large data files. Researchers can access the datasets from within the Google Cloud Console, along with a description of the data and sample queries to advance research. All data we include in the program will be public and freely available. The program will remain in effect until September 15, 2020.

“Developing data-driven models for the spread of this infectious disease is critical,” said Matteo Chinazzi, associate research scientist at Northeastern University. “Our team is working intensively to model and better understand the spread of the COVID-19 outbreak. By making COVID-19 data open and available in BigQuery, researchers and public health officials can better understand, study, and analyze the impact of this disease.”

The contents of these datasets are provided to the public strictly for educational and research purposes. We are not onboarding or managing PHI or PII data as part of the COVID-19 Public Dataset Program. Google has practices and policies in place to ensure that data is handled in accordance with widely recognized patient privacy and data security policies.

We on the Google Cloud team sincerely hope that the COVID-19 Public Dataset Program will enable better and faster research to combat the spread of this disease. Get started today.
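As one illustration, a researcher could query the hosted JHU CSSE data directly from BigQuery. The sketch below (Python) only builds such a query as a string; the table path and column names (`bigquery-public-data.covid19_jhu_csse.summary`, `country_region`, `confirmed`) are assumptions to verify against the dataset listing in the Cloud Console.

```python
# Sketch: composing a query against the hosted JHU CSSE dataset.
# The table path and column names are assumptions based on the public
# dataset program; check the dataset description in the Cloud Console.

def build_confirmed_cases_query(country: str, limit: int = 10) -> str:
    """Builds a SQL string for total confirmed cases by date in one country."""
    return f"""
        SELECT date, SUM(confirmed) AS total_confirmed
        FROM `bigquery-public-data.covid19_jhu_csse.summary`
        WHERE country_region = '{country}'
        GROUP BY date
        ORDER BY date DESC
        LIMIT {limit}
    """

# With the google-cloud-bigquery client installed and credentials configured,
# the query could then be run like this (not executed here):
#
#   from google.cloud import bigquery
#   client = bigquery.Client()
#   for row in client.query(build_confirmed_cases_query("Italy")):
#       print(row.date, row.total_confirmed)

print(build_confirmed_cases_query("Italy"))
```

Because queries against the program's datasets are free during the program, an exploratory loop like this costs nothing beyond the researcher's time.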
Source: Google Cloud Platform

Loading geospatial data into BigQuery just got easier with FME

With all of the geographical data available today, we made sure that Google Cloud’s BigQuery data warehouse includes first-class support for geospatial data types and functions. With this unique capability, you can process and analyze geospatial data at scale. To accelerate the workflows of our geospatial customers, we’re announcing our partnership with Safe Software, the maker of FME. FME is a data integration platform designed to support spatial data worldwide, and version 2020.0 brings the ability to ingest data from more than 450 geo formats and applications and materialize it as BigQuery tables. FME is ideal when you need to ingest and transform one of the myriad geospatial file and data formats and land that data in BigQuery. FME is designed to help you overcome common data integration challenges. Using a visual interface, you can build workflows to extract, transform, load, integrate, validate, and share data. Plus, you can build event-based workflows to automate your data integration tasks, create notification services, and take advantage of real-time processing.

Using FME to connect to BigQuery GIS

There are hundreds of GIS file types and projections. Loading them into a data warehouse requires transforming the data type and its projection into the native projection of the data warehouse; in this case, BigQuery GIS uses the WGS84 coordinate system. Workflows are scalable: when you build a data integration workflow in FME, you can ingest a single file or hundreds at a time, transform them, and load them directly into BigQuery tables, all within FME. Here’s a look at the FME Workbench interface, the authoring environment for data integration workflows. FME supports hundreds of formats, applications, and systems, and includes 493 transformers to help with data manipulation tasks like geometry and geography creation, validation, generalization, and coordinate system reprojection.
With data coming from different sources, data validation and quality control are critical steps in your workflow. The transformers are accessible through the GUI and let you create consistent and repeatable spatial data pipelines. That way you can make sure the data migrated to data warehouses like Google BigQuery is valid and meets all requirements.

Using geospatial data in production

We’ve heard from customers using FME and BigQuery that they can move and transform data more quickly to focus on innovation. “We’ve been using FME to transform shp, GeoJSON, CAD and various other data types for several years,” says Adam Radel, GISP, IT director at the State of Utah Department of Transportation. “We’ve imported over 1,000 files to BigQuery using FME in just a few weeks. The addition of the BigQuery writer to FME is game-changing for us. I’m excited that my team has such a powerful tool available to them.”

How to load geospatial data from FME to BigQuery

Check out this detailed look at how the loading process works. The example assumes that the user has data to load (a shapefile, for example) and a licensed or trial version of FME Desktop installed on a virtual machine or on their local machine. (Note: FME has example files for loading.) If you don’t have FME running already, get started with a trial straight from the Google Cloud Marketplace, and check out these instructions on how to deploy FME in Google Cloud. To use the data that you load with FME, you can also check out examples of how to query data in BigQuery GIS, like plotting hurricane paths in BigQuery GIS or k-means clustering of spatial data with BigQuery ML. We modeled the BigQuery GIS syntax on PostGIS, so you’ll find the queries easy to compose as well. And click Explore in GeoViz from the user interface to get a quick, styleable visualization of your results. For a more scalable, cloud-native GIS visualization solution that can use BigQuery as a spatial backend, take a look at our partner CARTO.
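To make the WGS84 requirement concrete, here is a minimal sketch (plain Python, invented for this post, not part of FME) of the kind of sanity check a pre-load validation step performs: confirming coordinates fall within WGS84 longitude/latitude bounds before formatting them as WKT, the text form BigQuery GIS ingests into GEOGRAPHY values.

```python
# Sketch: a minimal pre-load check for BigQuery GIS, which expects WGS84
# coordinates (longitude/latitude in degrees). Illustrative only; FME's
# transformers handle reprojection and validation for you at scale.

def is_valid_wgs84(lon: float, lat: float) -> bool:
    """Returns True if the point lies within WGS84 longitude/latitude bounds."""
    return -180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0

def to_wkt_point(lon: float, lat: float) -> str:
    """Formats a point as WKT for loading into a BigQuery GEOGRAPHY column."""
    if not is_valid_wgs84(lon, lat):
        raise ValueError(f"point ({lon}, {lat}) is outside WGS84 bounds")
    return f"POINT({lon} {lat})"

print(to_wkt_point(-111.89, 40.76))  # a point near Salt Lake City
```

Data arriving in another projection (a state plane system, say) would need reprojection to WGS84 first, which is exactly the step FME's coordinate system transformers automate.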
Source: Google Cloud Platform

When girls are the SHERO of the story

Editor’s note: We’re celebrating Women’s History Month by talking with Cloud Googlers about identity and how it influences their work in technology. Cloud Googler Komal Singh’s path has taken her from India to Waterloo, Canada, where she’s an engineering program manager working on serverless products. Her 20% project at Google resulted in the publication of her first children’s STEM book, Ara the Star Engineer, which follows a young girl who uses coding to tackle big dreams and meets real-life women trailblazers. Her recent TED Talk, “Recoding Stories at Scale,” explores using technology and AI in creative ways to represent minorities and girls in books in ways that inspire them. Here, she shares her path to working in technology.

Who inspired you to go into engineering?

I grew up in India in the 1980s, and always loved sci-fi, physics, and math. I didn’t know any female engineers, but I knew women who were doctors, and we had a female prime minister in India—so I assumed women could be prime ministers, but not engineers. My dad had a huge influence on me. He always encouraged me to be more hands-on, and showed me how to do things like change a lightbulb or fix the car engine. During dinner conversations, he posed problems for me to think about, like how many rotations the fan was doing per minute.

In high school, I was among the few girls taking computer science courses. We usually worked together, and when we got a program to run, teachers thought it was a fluke, or that we were copying others’ work. There was extra pressure to prove that we had gotten it right ethically. When you’re part of a small percentage like that, it’s harder to be heard, and it’s easy to start doubting your abilities.

I also loved watching Dana Scully on the X-Files TV show. There’s actually a “Scully Effect” phenomenon that’s been researched, which found that more than 70% of women who watched that show went on to STEM fields.
I wish I had also had someone to look up to who wasn’t white with blond hair. I think I would be a more fearless leader now. I’m grateful now that I have role models here at work, senior women who I look up to. I want my daughter to see herself represented in ways that I didn’t. When my daughter was four, she told me that engineers are boys. As a woman of color and first-generation immigrant, I wanted to do something for her so she would know that wasn’t true. So I started a 20% project [a Google option for employees to explore topics of interest] to write a children’s book.

Why use books as a way to change perceptions?

The pipeline for getting girls into engineering and other STEM fields starts when they are about six. There are many initiatives being started, like Girls Who Code, Canada Learning Code, and Black Girls Code, but we need more funding for efforts like this. It can be hard to scale these programs, but books can operate at scale. Books are so pervasive, and as everyday objects they can really influence kids. For kids, seeing people who look like them in books is really important. Less than 5% of kids’ books feature people of color in lead roles. I wanted to put technology to good use, so I started a 20% project to create a series of books that feature more girls and women of color. In parallel, this project is working on making storytelling more inclusive, and we’re using AI to experiment with making traditional characters more racially diverse, so a reader could see Goldilocks as a Black or Asian girl, or as a non-binary character, for example.

The book has been published in 10 other countries, and my daughter has traveled with me to some of these book launches. When a journalist in China asked her what she wanted to be when she grew up, she replied “an author and an engineer.” I love the fan mail that I get about the book. Girls around the world want to be problem solvers.
I also hope my TED Talk on recoding stories will inspire more people to take action to make kids’ literature more equitable.

What advice do you give to those newer to the workforce?

Persistence pays off! I tried three times over five years to get a job at Google, across different locations and job roles. The third time worked for me. Stay the course. Don’t be tempted to give up. And remember to be a wholesome person, whatever that means for you. For me, it’s being a good mom, having a meaningful career, and not giving up on my own hobbies and time for myself. It can be tough, but remember that your career isn’t a linear path. It will take turns along the way. This 20% project, for me, has opened up truly valuable opportunities that I didn’t foresee.
Source: Google Cloud Platform

Simplified global game management: Introducing Game Servers

To deliver the multiplayer gaming experiences gamers expect, game developers are increasingly relying on dedicated game servers as the default option for connecting players. But hosting and scaling a game server fleet to support a global game can be challenging, and many game companies either end up building costly proprietary solutions or turn to pre-packaged solutions that limit developer choice and control.

Agones, an open source game server hosting and scaling project built on Kubernetes, was cofounded by Google Cloud and Ubisoft to offer a simpler option. It provides a community-developed alternative to proprietary solutions that also gives developers the freedom to seamlessly host and scale game server clusters across multiple environments—in multiple clouds, on premises, or on local machines.

Alejandro Gonzalez, GM of Jam City Bogota, shared his experience using Agones for the real-time strategy mobile game World War Doh: “Agones was a key piece in our relay strategy as it allowed us to easily administrate the Kubernetes-based relays for World War Doh. Agones saved us precious time required for a custom in-house counterpart and, in addition, kept our implementation generic and available to run on top of multiple cloud providers.”

Today, we’re announcing the availability of Game Servers beta, a managed service offering of Agones. Whereas Agones is ideal for managing regional game server clusters, Game Servers supercharges Agones to simplify managing global multi-cluster game server fleets. If you’re already running Agones in production workloads, you can opt into the managed service by simply registering Agones-managed game server clusters with the new Game Servers API. And you can opt out of the managed service at any time if you want to go back to manual management. You can also group these clusters into a concept we call realms—logical groupings of Kubernetes clusters, designed around a game’s latency requirements.
You can then define game server configurations and scaling policies to simplify fleet management across realms and the clusters within them, all while still maintaining control and visibility. Game Servers can help you plan for a variety of scenarios. For example, you can choose to increase the reserved capacity of game servers for a planned game event, or for a specific date and time range. Additionally, you can automate scaling to account for daily peak and off-peak hours across different regions. Game Servers’ rollout flexibility also means that you can A/B test different game server configurations and canary-test changes, rolling them back if necessary.

In beta, Game Servers initially supports only clusters running on Google Kubernetes Engine (GKE), and we are diligently working on hybrid and multi-cloud support for later this year. The second half of 2020 will also bring more advanced scaling policies and a deeper integration with our open source matchmaking framework, Open Match. Learn more about how to get started with Game Servers here.

Game Servers is the latest solution in Google Cloud’s ongoing effort to help game developers remove complexity from infrastructure management. Companies like Activision Blizzard are benefiting from our highly reliable global network, advanced data analytics and artificial intelligence (AI) capabilities, and commitment to open source to bring great gaming experiences to their players.

Join our Google for Games digital broadcast on Monday, March 23rd to hear from Google experts and leading gaming companies such as Improbable, Grenge, Colopl and Unity, who are using our technology to take their games to the next level. Learn more.
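To make the realm concept concrete, here is a small hypothetical sketch (plain Python, invented for this post, not the actual Game Servers API) of grouping clusters into realms and routing a player to the realm with the lowest measured latency:

```python
# Hypothetical illustration of realms as latency-based groupings of clusters.
# Realm and cluster names are invented; the real Game Servers API manages
# realm and cluster registration for you.

REALMS = {
    "realm-americas": ["gke-us-central1", "gke-us-east1"],
    "realm-europe": ["gke-europe-west1"],
    "realm-asia": ["gke-asia-east1"],
}

def pick_realm(measured_latency_ms: dict) -> str:
    """Picks the realm with the lowest round-trip latency for a player."""
    return min(measured_latency_ms, key=measured_latency_ms.get)

# Example: latency probes from a player's client to each realm's endpoint.
latencies = {"realm-americas": 120, "realm-europe": 35, "realm-asia": 210}
print(pick_realm(latencies))  # realm-europe
```

In practice this routing decision is typically made by a matchmaker (such as Open Match) in front of the game server fleet, with the realm grouping keeping the latency-sensitive placement logic separate from per-cluster scaling.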
Source: Google Cloud Platform

8 tips for getting work done when working remotely

With many companies considering the best ways to keep teams connected while working from different locations, a number of customers have asked us for recommendations on how to stay productive and focused at work. Here are some best practices for fostering collaboration when your team has to work remotely.

Set your team up to work remotely

Make sure your team has the right tools and processes in place before you shift from working in the office to working from home. Once that is prepared, here are a few additional preparation steps you can take:

1. Create a team alias to make communication easier. An email list that includes every team member lets you share information quickly, and a chat room can be used for faster discussions.

2. Check sharing permissions on important documents so your team can edit and comment as needed. You might even consider creating a shared drive where your team can store, search, and access files from any device.

3. Schedule meetings now so you can stay in touch later. Set up calendar invitations, create agendas in advance, and attach relevant documents to the invites. It is also a good idea to make sure everyone is comfortable with video conferencing.

Stay connected and manage your team every day

Once your team is set up to work from home, here are a few ways to keep everyone up to date.

4. Hold daily meetings to stay connected with your coworkers. Working from home can feel isolating for some people, and video conferencing is a great way to stay connected with team members. Try to be on camera when conditions allow, present relevant content, and ask questions to spark conversation.
If not everyone can join a meeting because of time zone differences, record it, but first make sure all participants are comfortable being recorded!

5. Share goals and progress regularly. Whether through a group chat or a shared document that everyone can access, a record of progress is a great way to keep the team aligned, up to date, and on top of assigned tasks. You can also set up an internal site to consolidate information and resources that matter to your team, or to share information more broadly with your organization.

6. Keep practicing good workplace etiquette. Just because your team is not in the office does not mean they are not busy. Check calendars before scheduling meetings, and when you reach out over chat, start by asking whether it is a good time to talk. You can also proactively let coworkers know about your own availability by setting working hours in Calendar. That way, if a team member tries to schedule a meeting with you outside your working hours, they will be warned with a notification.

Get your work done over home Wi-Fi

When sharing space, and an internet connection, at home, you need to be mindful of the needs of the other people in your household. Here are a few tips from us.

7. Do not spend the whole day on video calls. There are plenty of tools you can use to communicate with your team, whether a chat room, a shared document, a quick survey, or a short video conference. Choose the tool that fits best, especially if you are sharing an internet connection.

8. Find the setup that works for you. You may need to try a few configurations before finding a way to stay focused without disturbing others.
Here are six tips for better video calls, including how to turn on automatic captions so you can read a transcript of the meeting in real time. These are just a few ideas from the G Suite team on how to stay focused and collaborative. For more information, watch this video of tips for working from home, and check out the latest updates in our article on remote-work tips in the Learning Center.
Source: Google Cloud Platform

8 tips for getting work done when working from home

With many businesses considering the best ways to keep teams connected when people cannot be in the same place, some of our customers have asked for advice on how to work with focus and productivity. Here are some best practices for fostering collaboration when your teams work remotely.

Set your team up to work remotely

Put the right tools and processes in place for your team before shifting from working in the office to working from home. Once that is set up, there are a few additional steps you can take ahead of time:

1. Create a team alias to make communication easy. Create an email list of all team members so you can share information quickly. You can also use a chat room for faster discussions.

2. Check sharing permissions on important documents so collaborators can edit and comment as needed. You might even consider creating a shared drive where your team can store, search, and access files from any device.

3. Schedule meetings now so you can stay in touch later. Set up calendar invitations, create agendas in advance, and attach relevant documents to the invites. You should also make sure everyone knows how to use video conferencing.

Keep your team connected and organized every day

Now that your team is set up and everyone can work from home, it is important to make sure everyone stays collaborative. Here are a few ways to do that.

4. Hold daily meetings to stay connected with your colleagues. Some people can feel isolated when working from home, so video conferencing is a great way to bring everyone together.
Try to join by video when appropriate, present relevant content, and ask questions to get the conversation started. If time zone differences keep some people from joining a meeting, record it, but make sure you have the participants' consent first!

5. Share goals and updates regularly. Whether in a group chat or a shared document that everyone keeps up to date, recording what is in progress is a great way to help everyone feel connected, stay informed, and keep track of what needs to be done. You can also set up an internal site to consolidate important information and resources in one hub for your team, or to share information more broadly with your organization.

6. Keep following good workplace etiquette. Working remotely does not mean your teammates are idle. Check calendars before scheduling meetings. When you chat with someone, start by asking whether it is a good time to talk. You can also proactively let colleagues know when you are available by setting working hours in Calendar. That way, if a team member tries to schedule a meeting with you outside your working hours, they will see a warning.

Use your home Wi-Fi to get work done

When sharing space and an internet connection at home, you may need to pay attention to the needs of the other members of your household. Here are a few tips.

7. Do not use video all day. You have many tools at your disposal for staying in touch with your team: chat rooms, shared documents, short surveys, or quick conference calls. Choose the tool that works best, especially when you are sharing an internet connection.

8. Find the setup that works for you. You may need to try a few different configurations before you figure out how to stay focused without distracting others.
Here are six tips for better video calls, including how to turn on live captions so you can read a transcript of the meeting in real time. These are just a few of the ways the G Suite team suggests for staying focused and collaborative. For more information, watch these videos for tips on working from home, and check the latest updates in our Learning Center article on remote-work tips.
Source: Google Cloud Platform

Google Cloud named a leader in the Forrester Wave for Public Cloud Development and Infrastructure Platforms

Today, we’re announcing that Google Cloud has been named a leader in The Forrester Wave™: Public Cloud Development and Infrastructure Platforms, Q1 2020. This report evaluated cloud providers’ infrastructure and application development capabilities—important considerations for enterprises turning to the cloud to support their business growth and drive innovation. In this report, Forrester noted Google Cloud’s investment in global expansion and innovative development services.

Infrastructure and global reach

Google Cloud’s footprint has expanded to 22 regions, with additional regions coming soon in Delhi, Doha, Toronto, and Melbourne, providing enterprises with low-latency, high-performance compute, networking, analytics, and storage services, as well as in-country disaster recovery options in India, Canada, and Australia. This growth has enabled us to introduce new capabilities that allow you to control where you put your data to support regulatory, security, and compliance requirements. We’ve also committed to extending the size and reach of our sales and support teams so that customers can get personalized attention around the globe. In the report, Forrester also gave Google Cloud the highest possible scores in the reliability, storage services, and security certifications criteria.

Innovative development services

Forrester recognized in the report that “Google is best for customers who prioritize leading-edge AI/ML services and microservices/containers development,” as well as Google Cloud’s popular CI/CD tools. Anthos, in particular, is a modern application platform that enables organizations to build, deploy, and operate applications anywhere, securely and consistently, while modernizing traditional applications for an increasingly hybrid and multi-cloud world. Anthos can manage workloads running in both on-prem and cloud environments, while reducing costs and improving developer velocity.
Download the Forrester report

We’re proud that Forrester has recognized Google Cloud’s infrastructure and development capabilities. To learn more, please download The Forrester Wave™: Public Cloud Development and Infrastructure Platforms, Q1 2020 report here.

The Forrester Wave™ is copyrighted by Forrester Research, Inc. Forrester and Forrester Wave™ are trademarks of Forrester Research, Inc. The Forrester Wave™ is a graphical representation of Forrester’s call on a market. Forrester does not endorse any vendor, product, or service depicted in the Forrester Wave™. Information is based on best available resources. Opinions reflect judgment at the time and are subject to change.
Source: Google Cloud Platform

Protect users in your apps with multi-factor authentication

These days, using a password alone to secure user accounts containing sensitive, identifiable, or private information just isn’t sufficient. Passwords are often reused and can easily be phished and stolen. With this in mind, we recently teamed up with researchers from New York University and the University of California, San Diego to find out just how effective basic account hygiene is at preventing account takeovers. The research showed that simply adding an SMS second factor to a Google account can block up to 100% of automated bots, 96% of bulk phishing attacks, and 76% of targeted attacks that occurred during our investigation. To help you enhance the security of your apps and protect your users, Identity Platform now supports multi-factor authentication (MFA) with SMS in beta.

Configuring MFA in the admin console.

You can now configure Identity Platform to require users who attempt to log in to your application to self-enroll in MFA—also known as two-factor authentication (2FA)—and register a device that is capable of receiving SMS messages. When users attempt to sign in to your app with their first-factor credential (email/password, social login, SAML, OIDC), Identity Platform will require them to enter the six-digit authentication code that it sends via SMS to their registered devices before they can sign in to your apps and services.

Getting started

You can learn more about this new feature by checking out the documentation page. To get started with Identity Platform, enable it in GCP Marketplace, watch our Cloud Next ‘19 presentation, and read the quickstart.
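As a rough illustration of the second-factor flow that Identity Platform handles for you, here is a minimal sketch (plain Python, invented for this post, not the Identity Platform SDK) of issuing and verifying a six-digit code with an expiry window:

```python
import hmac
import secrets
import time

# Sketch of a six-digit second-factor code with a short validity window.
# Illustrative only: Identity Platform generates, delivers (via SMS), and
# verifies these codes for you.

CODE_TTL_SECONDS = 300  # assume codes expire after five minutes

def issue_code(now=None):
    """Generates a random six-digit code with an expiry timestamp."""
    now = time.time() if now is None else now
    return {
        "code": f"{secrets.randbelow(10**6):06d}",  # zero-padded 000000-999999
        "expires_at": now + CODE_TTL_SECONDS,
    }

def verify_code(issued, submitted, now=None):
    """Checks the submitted code with an expiry check and constant-time compare."""
    now = time.time() if now is None else now
    if now > issued["expires_at"]:
        return False
    return hmac.compare_digest(issued["code"], submitted)

challenge = issue_code(now=0.0)
print(verify_code(challenge, challenge["code"], now=10.0))   # within the window
print(verify_code(challenge, challenge["code"], now=600.0))  # after expiry
```

The constant-time comparison (`hmac.compare_digest`) avoids leaking how many leading digits of a guess were correct, one of several details a managed service takes off your plate.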
Source: Google Cloud Platform

Modernizing Twitter's ad engagement analytics platform

As part of the daily business operations on its advertising platform, Twitter serves billions of ad engagement events, each of which potentially affects hundreds of downstream aggregate metrics. To enable its advertisers to measure user engagement and track ad campaign efficiency, Twitter offers a variety of analytics tools, APIs, and dashboards that can aggregate millions of metrics per second in near-real time. In this post, you’ll get details on how the Twitter Revenue Data Platform engineering team, led by Steve Niemitz, migrated their on-prem architecture to Google Cloud to boost the reliability and accuracy of Twitter’s ad analytics platform.

Deciding to migrate

Over the past decade, Twitter has developed powerful data transformation pipelines to handle the load of its ever-growing user base worldwide. The first deployments of those pipelines all ran in Twitter’s own data centers. The input data streamed from various sources into Hadoop Distributed File System (HDFS) as LZO-compressed Thrift files in an Elephant Bird container format. The data was then processed and aggregated in batches by Scalding data transformation pipelines, and the aggregation results were output into Manhattan, Twitter’s homegrown distributed key-value store, for serving. Additionally, a streaming system using Twitter’s homegrown systems Eventbus (a messaging tool built on top of DistributedLog), Heron (a stream processing engine), and Nighthawk (a sharded Redis deployment) powered the real-time analytics that Twitter had to provide, filling the gap between the current time and the last batch run.

While this system consistently sustained massive scale, its original design and implementation were starting to reach some limits. In particular, some parts of the system that had grown organically over the years were difficult to configure and extend with new features. Some intricate, long-running jobs were also unreliable, leading to sporadic failures.
The legacy end-user serving system was very expensive to run and couldn’t support large queries. To accommodate the projected growth in user engagement over the next few years and streamline the development of new features, the Twitter Revenue Data Platform engineering team decided to rethink the architecture and deploy a more flexible and scalable system in Google Cloud.

Platform modernization: First iteration

In mid-2017, Steve and his team tackled the first redesign iteration of the advertising data platform modernization, leading to Twitter’s collaboration with Google Cloud. At first, the team left the legacy Scalding data aggregation pipelines unchanged and continued to run them in Twitter’s data centers. But the batch layer’s output was switched from Manhattan to two separate storage locations in Google Cloud:

BigQuery—Google’s serverless and highly scalable data warehouse, to support ad-hoc and batch queries.
Cloud Bigtable—Google’s low-latency, fully managed NoSQL database, to serve as a back end for online dashboards and consumer APIs.

The output aggregations from the Scalding pipelines were first transcoded from Hadoop sequence files to Avro on-prem, staged in four-hour batches to Cloud Storage, and then loaded into BigQuery. A simple pipeline deployed on Dataflow, Google Cloud’s fully managed streaming and batch analytics service, then read the data from BigQuery and applied some light transformations. Finally, the Dataflow pipeline wrote the results into Bigtable. The team built a new query service to fetch aggregated values from Bigtable and process end-user queries.
They deployed this query service in a Google Kubernetes Engine (GKE) cluster in the same region as the Bigtable instance to optimize for data access latency. Here’s a look at the architecture:

This first iteration already brought many important benefits:

- It de-risked the overall migration effort, letting Twitter avoid migrating both the aggregation business logic and the storage at the same time.
- The end-user serving system’s performance improved substantially. Thanks to Bigtable’s linear scalability and extremely low data access latency, the serving system’s P99 latencies decreased from more than 2 seconds to 300 ms.
- Reliability increased significantly. The team now rarely, if ever, gets paged for the serving system.

Platform modernization: Second iteration

With the new serving system in place, in 2019 the Twitter team began to redesign the rest of the data analytics pipeline using Google Cloud technologies. The redesign sought to solve several existing pain points:

- Because the batch and streaming layers ran on different systems, much of the logic was duplicated between them.
- While the serving system had moved to the cloud, the pain points of the Hadoop aggregation process remained.
- The real-time layer was expensive to run and required significant operational attention.

With these pain points in mind, the team began evaluating technologies that could help solve them, initially considering several open-source stream processing frameworks: Apache Flink, Apache Kafka Streams, and Apache Beam.
After evaluating all possible options, the team chose Apache Beam for a few key reasons:

- Beam’s built-in support for exactly-once operations at extremely large scale across multiple clusters.
- Deep integration with other Google Cloud products, such as Bigtable, BigQuery, and Pub/Sub, Google Cloud’s fully managed, real-time messaging service.
- Beam’s programming model, which unifies batch and streaming and lets a single job operate on either batch inputs (Cloud Storage) or streaming inputs (Pub/Sub).
- The ability to deploy Beam pipelines on Dataflow’s fully managed service.

The combination of Dataflow’s fully managed approach and Beam’s comprehensive feature set let Twitter simplify the structure of its data transformation pipeline, as well as increase overall data processing capacity and reliability. Here’s what the architecture looks like after the second iteration:

In this second iteration, the Twitter team re-implemented the batch layer as follows: data is first staged from on-prem HDFS to Cloud Storage, and a batch Dataflow job then regularly loads the data from Cloud Storage, processes the aggregations, and dual-writes the results to BigQuery for ad-hoc analysis and to Bigtable for the serving system.

The Twitter team also deployed an entirely new streaming layer in Google Cloud. For data ingestion, an on-prem service now pushes two different streams of Avro-formatted messages to Pub/Sub. Each message contains a bundle of multiple raw events and affects between 100 and 1,000 aggregations, which leads to more than 3 million aggregations per second performed by four Dataflow jobs (J0-3 in the diagram above). All the Dataflow jobs share the same topology, although each consumes messages from different streams or topics. One stream, which contains critical data, enters the system at a rate of 200,000 messages per second and is partitioned into two separate Pub/Sub topics.
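The fan-out from one message bundle to many aggregation updates can be sketched as follows. The event fields and aggregation keys here are hypothetical, and a plain `Counter` stands in for the Dataflow job's keyed state; the point is only how a single bundle of raw events touches several aggregates at once.

```python
from collections import Counter

def aggregate_bundle(bundle):
    """Fan one message bundle of raw events out into aggregation updates.

    Each raw event contributes to several aggregation keys (per campaign,
    per country, per campaign-and-country pair), which is how one message
    can affect hundreds of downstream aggregates.
    """
    updates = Counter()
    for event in bundle["events"]:
        updates[("campaign", event["campaign_id"])] += 1
        updates[("country", event["country"])] += 1
        updates[("campaign_country", event["campaign_id"], event["country"])] += 1
    return updates

bundle = {"events": [
    {"campaign_id": "c1", "country": "US"},
    {"campaign_id": "c1", "country": "FR"},
]}
print(aggregate_bundle(bundle)[("campaign", "c1")])  # 2
```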
A Dataflow job (J3 in the diagram) consumes those two streams, performs 400,000 aggregations per second, and outputs the results to a table in Bigtable. The other stream, which contains less critical but higher-volume data, enters the system at a rate of around 80,000 messages per second and is partitioned into six separate topics. Three Dataflow jobs (J0, J1, and J2) share the processing of this larger stream, each handling two of the six topics in parallel and outputting the results to a table in Bigtable. In total, those three jobs process over 2 million aggregations per second.

Partitioning the high-volume stream into multiple topics offers a number of advantages:

- The partitioning is organized by applying a hash function to the aggregation key and taking the result modulo the number of available partitions (in this case, six). This guarantees that any per-key grouping operation in downstream pipelines is scoped to a single partition, which is required for consistent aggregation results.
- When deploying updates to the Dataflow jobs, admins can drain and relaunch each job individually in sequence, allowing the remaining pipelines to continue uninterrupted and minimizing the impact on end users.
- The three jobs can each handle two topics without issue currently, and there is still room to scale horizontally up to six jobs if needed. The number of topics (six) is arbitrary, but it strikes a good balance based on current needs and potential spikes in traffic.

To assist with job configuration, Twitter initially considered using Dataflow’s template system, a powerful feature that enables the encapsulation of Dataflow pipelines into repeatable templates that can be configured at runtime.
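The hash-based partition assignment described above can be sketched like this. The key format is made up, and SHA-256 stands in for whatever hash function Twitter actually uses; what matters is that the mapping is stable, so every message for a given key always lands in the same topic.

```python
import hashlib

NUM_PARTITIONS = 6  # the six Pub/Sub topics of the high-volume stream

def partition_for(aggregation_key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # A stable hash (unlike Python's salted built-in hash()) keeps the
    # key-to-partition mapping consistent across processes and restarts.
    digest = hashlib.sha256(aggregation_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

# All messages for a given key land in the same partition, so per-key
# grouping downstream never has to look beyond a single topic.
assert partition_for("campaign:42|country:US") == partition_for("campaign:42|country:US")
assert 0 <= partition_for("campaign:42|country:US") < NUM_PARTITIONS
```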
However, since Twitter needed to deploy jobs with topologies that might change over time, the team decided instead to implement a custom declarative system in which developers specify the different parameters for their jobs in a pystachio DSL: tuning parameters, data sources to operate on, sink tables for aggregation outputs, and the location of the jobs’ source code. A new major version of Dataflow templates, called Flex Templates, will remove some of the previous limitations of the template architecture and allow any Dataflow job to be templatized.

For job orchestration, the Twitter team built a custom command-line tool that processes the configuration files to call the Dataflow API and submit jobs. The tool also lets developers submit a job update by automatically performing a multi-step process:

1. Drain the old job:
   - Call the Dataflow API to identify which data sources are used in the job (for example, a Pub/Sub topic reader).
   - Initiate a drain request.
   - Poll the Dataflow API for the watermark of the identified sources until the maximum watermark is hit, which indicates that the draining operation is complete.
2. Launch the new job with the updated code.

This simple, flexible, and powerful system allows developers to focus on their data transformation code without having to worry about job orchestration or the underlying infrastructure details.

Looking ahead

Six months after fully transitioning its ad analytics data platform to Google Cloud, Twitter has already seen huge benefits. Twitter’s developers have gained agility, as they can more easily configure existing data pipelines and build new features much faster.
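The drain-and-relaunch sequence that the orchestration tool automates can be sketched as follows. The `DataflowClient`-style interface is entirely hypothetical (the real tool calls the Dataflow API); a fake client is included only to exercise the control flow.

```python
import time

MAX_WATERMARK = float("inf")  # reported once a drained source is exhausted

def update_job(client, old_job_id, new_job_config, poll_seconds=0):
    """Drain the old job, wait until its sources are flushed, launch the new one."""
    sources = client.list_sources(old_job_id)   # e.g. Pub/Sub topic readers
    client.drain(old_job_id)                    # stop pulling new input
    while any(client.watermark(old_job_id, s) < MAX_WATERMARK for s in sources):
        time.sleep(poll_seconds)                # in-flight data not yet flushed
    return client.launch(new_job_config)        # start the updated code

class FakeClient:
    """Stand-in for the real API client, for illustration only."""
    def __init__(self):
        self.polls = 0
        self.launched = None
    def list_sources(self, job_id):
        return ["projects/p/topics/t0", "projects/p/topics/t1"]
    def drain(self, job_id):
        pass
    def watermark(self, job_id, source):
        self.polls += 1
        # Pretend the drain completes after a couple of polls.
        return 100 if self.polls <= 2 else MAX_WATERMARK
    def launch(self, config):
        self.launched = config
        return "new-job-id"

client = FakeClient()
print(update_job(client, "old-job-id", {"code": "v2"}))  # new-job-id
```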
The real-time data pipeline has also become far more reliable and accurate, thanks to Beam’s exactly-once semantics and the increased processing speed and ingestion capacity enabled by Pub/Sub, Dataflow, and Bigtable.

Twitter engineers have enjoyed working with Dataflow and Beam for several years now, since version 2.2, and plan to continue expanding their usage. Most importantly, they’ll soon merge the batch and streaming layers into a single, authoritative streaming layer.

Throughout this project, the Twitter team collaborated very closely with Google engineers to exchange feedback and discuss product enhancements. We look forward to continuing this joint technical effort on several ongoing large-scale cloud migration projects at Twitter. Stay tuned for more updates!
Source: Google Cloud Platform

Not just for HTTP anymore: gRPC comes to Cloud Run

Cloud Run is a managed serverless compute offering from Google Cloud that lets you run stateless server containers in a fully managed environment, without the hassle of managing the underlying infrastructure. Since its release, Cloud Run has enabled many of our customers to focus on their business logic, while leaving the provisioning, configuring, and scaling to us. Most applications that run on Cloud Run use HTTP JSON REST to serve requests, but that’s not the only protocol it supports: in September, it also started to support unary gRPC services.

gRPC is a high-performance RPC framework developed by Google and used extensively, for traditional workloads and at the edge, by companies like Netflix, Cisco, Square, and others. While gRPC offers advantages over traditional HTTP, like strong interface definitions and code generation, setting up the infrastructure to run a gRPC server in production can be a real chore. Cloud Run takes the toil out of this process.

With gRPC, you start with a strong API contract in the form of a protocol buffer file. This interface definition ensures that your clients and servers speak the same language even as you extend the capabilities of your service. You then generate code from this definition in your desired language and provide an implementation for it. Cloud Run provides everything else you need to get your code serving traffic: you just need to put together a simple Dockerfile and run a few commands.

We’ve put together additional examples in several languages to help you get started running a simple gRPC service in fully managed Cloud Run. We’re excited to see the gRPC services you’ll deploy!

Support for gRPC in Cloud Run is still evolving. For example, we’re still working on support for streaming. For use cases where you want to send data incrementally from client to server, you’re currently better off chaining together a series of unary RPCs or REST requests.
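The original post showed the protocol buffer contract inline; the snippet did not survive. A minimal example of the kind of contract described (the service and message names here are hypothetical) looks like this:

```proto
syntax = "proto3";

package greeter.v1;

// A unary RPC — one request in, one response out — which is the kind of
// gRPC call fully managed Cloud Run supports.
service Greeter {
  rpc SayHello (HelloRequest) returns (HelloReply);
}

message HelloRequest {
  string name = 1;
}

message HelloReply {
  string message = 1;
}
```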
Additionally, Cloud Run’s gRPC data path currently works best for small requests. As a rule of thumb, you should keep your requests below 32MB in size. We plan to improve this over time, but for now you can learn more about gRPC on Cloud Run with this tutorial.
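The workaround suggested above, a series of unary requests instead of a client stream, can be sketched on the client side like this. The 32 MB figure is Cloud Run's stated rule of thumb, and `send_unary_rpc` is a placeholder for whatever stub method your generated gRPC client exposes.

```python
MAX_REQUEST_BYTES = 32 * 1024 * 1024  # rule-of-thumb Cloud Run request size limit

def chunk_payload(data: bytes, chunk_size: int = MAX_REQUEST_BYTES):
    """Split a large payload into pieces small enough for unary RPCs."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def upload(data: bytes, send_unary_rpc):
    # Send each chunk with a sequence number so the server can reassemble
    # the payload in order; send_unary_rpc is a hypothetical callable.
    for seq, chunk in enumerate(chunk_payload(data)):
        send_unary_rpc(seq, chunk)

chunks = chunk_payload(b"x" * 100, chunk_size=40)
print([len(c) for c in chunks])  # [40, 40, 20]
```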
Source: Google Cloud Platform