This Is How Google Wants To Make The Internet Speak Everyone’s Language

Nurhaida Sirait, a grandmother who speaks the native Batak language and uses Facebook on her smartphone to connect with friends and family, poses for a portrait.

Andri Tambunan for BuzzFeed News

JAKARTA, Indonesia — When Nurhaida Sirait-Go curses, she curses in her mother tongue.

The 60-year-old grandmother does everything emphatically, and Bahasa, the official language of Indonesia, just doesn’t allow for the same fury of swearing as Batak, the language that Sirait-Go grew up speaking on the Indonesian island of Sumatra.

“On Facebook, on WhatsApp, they speak only Bahasa. So I can’t speak the way I want,” said Sirait-Go, who giggles uncontrollably and covers her mouth with both hands when asked to repeat one of her favorite curse words in Batak. “I can’t, I can’t! People don’t use these words anymore. … They aren’t on the internet so they don’t exist.”

Batak is one of over 700 languages spoken in Indonesia. But only one language, Bahasa, is currently taught in public schools and widely used online. For language preservationists, it’s just one more example of how the internet’s growing global influence is leaving some languages in the dust. Linguists warn that 90% of the world’s approximately 7,000 languages will become extinct in the next 100 years. Or, as one prominent group of linguists ominously put it, every 14 days another language goes extinct.

The trend started hundreds of years ago, as the idea of the “nation-state” took hold globally and governments realized that a standardized language would help solidify an identity inside their borders. That process, which sped up as languages like French and English became dominant among traders and then diplomats, went into overdrive as the internet’s sweeping reach encouraged users to communicate in whichever language they share with the most people.

Linne Ha, a program manager at Google who focuses on low-resource languages, estimates that there are at least 30 languages with a million speakers each that are currently not supported online — and many more with fewer than a million speakers. If you were to imagine all those people as one group, it would be a country roughly the size of the United States whose citizens couldn’t type online, let alone use the text-to-speech functions that let Google Maps read you directions as you drive.

“We are biased because all of the equipment is designed for us,” Ha told BuzzFeed News. “The first thing, the default, is an English language keyboard, but what if your language doesn’t use those characters, or what if your language is only spoken, but not written?”

According to the UN, roughly 500 languages are used online, though popular sites like Facebook and Twitter support just 80 and 28 respectively. Those sites also display their domain names, or URLs, in Latin letters — for millions of people around the world, the letters www.facebook.com are nothing more than a string of shapes to be remembered or copy/pasted into an address bar. The internet, largely in English, does not feel as though it was built to speak their language.

Facebook profile page of Nurhaida Sirait, a grandmother who speaks the native Batak language and uses Facebook on her smartphone to connect with friends and family.

Andri Tambunan for BuzzFeed News

Ha worries about whether the internet is harming the world’s diversity of languages. She has been working at Google for ten years, the last two of which she has held the unique job title of “voice hunter” for Google’s Project Unison. Getting a language online means everything from developing a font, which can cost upwards of $30,000 to design and code, to recording and creating the voice capabilities that power programs like Google Maps. It’s the voice part that Ha is focused on. As more parts of the world that rely on spoken, rather than written, languages come online, it has become more important than ever to support speech functions on the internet.

“In much of the world the phone, specifically using your voice commands on the phone, that is the standard way to communicate,” said Ha. “These are places where there is more of an oral tradition than a written one.”

The Wu language, spoken by roughly 80 million people in the Shanghai region of China, is a prime example. Spoken Wu has many sounds and words that cannot be written with standard Chinese characters, and the language is rarely written, since schools only teach students to read and write in Mandarin. For Wu speakers to be fully immersed in using and conversing on the internet, they need a way to speak, and hear, their language online.

Other languages, she said, are simply not easy to adapt to the average keyboard. The Khmer language, which is spoken by 18 million people in Cambodia, includes 33 consonants, 23 vowels, and 12 independent vowels.

“On the type of keyboards you get on your phone they have to click and go through three sets of keyboards to type in one word. It’s cumbersome,” said Ha. The solution, Ha says, is what’s called a “transliteration keyboard,” where spoken words take the place of a traditional keyboard.
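To make the idea concrete, here is a minimal sketch of how a transliteration-style input method can work: the user types (or dictates) romanized syllables and the software maps them to native-script characters, so no multi-layer on-screen keyboard is needed. This is an illustration only, not Google’s actual keyboard; the syllable-to-character mappings are a tiny, assumed sample.

```python
# A minimal sketch of transliteration input: romanized syllables are mapped
# to native-script characters. The mappings below are assumed examples for
# illustration, not a real Khmer transliteration table.
ROMAN_TO_SCRIPT = {
    "ka": "\u1780",   # assumed: KHMER LETTER KA
    "kha": "\u1781",  # assumed: KHMER LETTER KHA
    "ko": "\u1782",   # assumed: KHMER LETTER KO
}

def transliterate(text: str) -> str:
    """Greedily replace known romanized syllables with script characters."""
    out, i = [], 0
    while i < len(text):
        for length in (3, 2):            # try longer syllables first
            chunk = text[i:i + length]
            if chunk in ROMAN_TO_SCRIPT:
                out.append(ROMAN_TO_SCRIPT[chunk])
                i += length
                break
        else:
            out.append(text[i])          # pass through anything unrecognized
            i += 1
    return "".join(out)

print(transliterate("ka kha ko"))
```

A production keyboard would layer dictionary lookups, context, and speech recognition on top of a mapping like this, but the core idea of letting users enter a complex script through a simpler input channel is the same.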

“Previously, in order to create a voice, a speech synthesis voice, you would need to record really good acoustic data, and have all the different sounds of a language,” said Ha. That required bringing in a “voice talent,” or a local with what Ha calls the perfect voice — a voice that any native speaker of that language would find pleasant and easy to understand. They were joined by a project manager and three to four people in a recording studio. The process, Ha said, would take six months or more to record all the necessary sounds that make up a language. “It was really, really expensive.”

Ha, however, helped develop a way to use machine learning, a form of artificial intelligence (AI), to bring a new language online in a matter of days. The new process takes advantage of what’s known as a “neural network,” a type of AI that tries to emulate the way a human brain works. Like a toddler learning which foods it likes and doesn’t like, the system works through trial and error, adjusting itself to the patterns in the data it is given.
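For readers curious what “trial and error” means in practice, the toy sketch below trains a tiny neural network on a simple pattern: it guesses, measures its error, and nudges its internal weights until the pattern is learned. It is an illustration of the general technique only, not Google’s speech system.

```python
# A toy neural network learning by trial and error (illustration only, not
# Google's speech system): guess, measure the error, adjust the weights.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: the XOR pattern, which needs a hidden layer to capture.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 4))  # input -> hidden weights
W2 = rng.normal(size=(4, 1))  # hidden -> output weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(5000):
    hidden = sigmoid(X @ W1)          # forward pass: make a guess
    guess = sigmoid(hidden @ W2)

    error = guess - y                 # how wrong was the guess?
    grad_out = error * guess * (1 - guess)
    grad_hidden = (grad_out @ W2.T) * hidden * (1 - hidden)

    W2 -= 0.5 * hidden.T @ grad_out   # nudge the weights to shrink the error
    W1 -= 0.5 * X.T @ grad_hidden

print(np.round(guess, 2))  # approaches [0, 1, 1, 0] as training proceeds
```

Real text-to-speech models are vastly larger and learn from recorded speech rather than toy numbers, but the learn-from-error loop is the same.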

Ha said she got the idea for how to streamline the process one day while watching Saturday Night Live. “When I was watching SNL I saw all these comedians mimicking politicians. I thought that was interesting, one person pretending to be different people,” said Ha. A handful of voices, she realized, when sent through a system capable of analyzing them and recognizing patterns, could be enough to create a complete language database.

She began with a team of 50 Bengali speakers at Google’s headquarters in Mountain View, California. Ha’s team built a web app that could run on a fanless laptop (fan noise would distort the recordings) and recorded the voices of the Bengali Google employees. She then ran a survey asking the group which voice they liked best; once she had a reference voice, she looked for voices with a similar cadence.

“These volunteers, we didn’t want them to get tired. We had them speak in 45-minute increments, roughly 145 sentences. So in three days we got 2,000 sentences,” said Ha. The system then built patterns out of the words and expanded the vocabulary. “With that we were able to build a model. It took three days to build a book of the Bengali voice.”

“The voice we created is a blend of seven voices. It’s like a choir”

Ha then built a portable recording booth, small enough to fit in a carry-on, which she has taken around the world. So far, she’s used it to help bring three new languages online — Bengali, Khmer, and Sinhala — over the course of the last year.

“The voice we created is a blend of seven voices. It’s like a choir,” said Ha, reflecting on the finished voices they have presented to the public. Earlier this year she visited Indonesia, where she partnered with a local university and is working on bringing two more languages spoken in Indonesia, Javanese and Sundanese, online.

In Jakarta, Sirait-Go was “thrilled” to hear that Google was working to bring more languages online, though she was less impressed to learn that the pilot program in her country had been for Javanese, rather than her native tongue of Batak.

“It would be much better for everyone if they could speak in Batak, they could express themselves better,” said Sirait-Go.

When asked about what she communicates online, she runs to the next room to bring back a pristine Samsung Galaxy phone her daughter bought her in May of this year. She keeps it in a separate room, on a shelf of its own, whenever she’s not using it.

“My kids tell me to use the internet, to not be old fashioned, but I don’t know what to do there,” said Sirait-Go, who recently welcomed her fifth grandchild. She opens her phone to show her 168 friends on Facebook (she has an additional 55 friend requests but isn’t sure how to answer them). Her Facebook page is largely made up of photos of Sumatra, particularly of Lake Toba, where she grew up.

“I have a video of the lake too! Someone is speaking in the video in Batak and that makes me happy to hear,” said Sirait-Go. Her daughters and grandchildren, she said, only use Batak when they are making fun of her.

“I don’t think my grandchildren or great grandchildren will learn Batak and that makes me sad,” she said. “If they cannot speak it on the internet they will not learn it.”

Source: BuzzFeed

2016: The Year We Stopped Listening To Big Tech’s Favorite Excuse

In early December, Facebook published a blog post summing up the company’s breakthroughs and challenges in image and speech recognition. Halfway down the page in a section explaining how Facebook’s computers are “quickly getting better” at identifying the objects in pictures and videos, the company embedded an animated GIF showing off its AI analysis of a photograph taken at a peaceful Black Lives Matter protest.

It was an odd choice of illustration for a blog post touting Facebook’s machine learning advancements. Just days before, rumors had begun circulating that authorities had been using Facebook to identify Dakota Access pipeline protesters in North Dakota. And a few weeks prior to that, the ACLU had released a report revealing that the company’s API had been used in 2014 to track protesters in Ferguson.

The GIF circulated on Twitter as an example of unsettling, tone-deaf PR from one of the world’s most powerful tech companies. A few hours after I tweeted that the image was “unnerving,” a Facebook product manager and business lead to the CTO contacted me, somewhat bewildered. “Curious to know why you think so. It was a frequently shared and meaningful image from this year that AI fails to interpret,” he replied. A few minutes later, after concerned tweets from others piled up, he wrote back again, “based on this feedback I think we didn’t put enough of that context into the post. Appreciate feedback.” The image was removed an hour later.

That Facebook failed to see how such an emotionally charged image might trigger deeply held anxieties about the social network’s power and influence was telling. But that the company’s users objected loudly enough to force a correction highlighted a fundamental shift in how tech’s biggest companies are held to account this year.

For years, Silicon Valley’s biggest platforms have thrown their collective hands in the air amid controversies and declared, “We’re just a technology company.” This excuse, along with “We’re only the platform,” is a handy absolution for the unexpected consequences of their creations. Facebook used the excuse to shrug off fake news concerns. Airbnb invoked it to downplay reports of racial discrimination on its platform. Twitter hid behind platform neutrality for years even as it was overrun with racist and sexist trolls. Uber even used the tech company argument in a European court to avoid having to comply with national transportation laws.

But in 2016, Big Tech’s well-practiced excuse became less effective. The idea that their enormous and deeply influential platforms are merely a morally and politically neutral piece of the internet’s infrastructure — much like an ISP or a set of phone lines — that should remain open, free, and unmediated simply no longer makes ethical or logical sense.

In 2016, more than any year before it, our world was shaped by the internet. It’s where Donald Trump subverted the media and controlled the news cycle. Where minorities, activists, and politicians from both sides of the aisle protested Trump’s candidacy daily. And where emergent, swarming online hate groups (including but not limited to the so-called alt-right) developed a loud counterculture to combat liberalism. Startups like Uber and Airbnb didn’t just help us navigate the physical world, but were revealed as unwitting vectors of bigotry and misogyny. This year, the internet and its attendant controversies and intractable problems weren’t just a sideshow, but a direct reflection of who we are, and so the decisions made by the companies and platforms that rank among the web’s most prominent businesses became harder to ignore.

A leaked internal Facebook post, obtained by Gizmodo, about Facebook’s responsibility to prevent a Trump presidency.

Gizmodo / Via gizmodo.com

This spring, Facebook dismissed the notion that it has any institutional biases when Gizmodo published leaked internal communications that suggested employees were floating ways in which the platform could be used to stop Trump’s bid for the White House. Similarly, when Gizmodo reported that the company’s Trending Topics team suppressed conservative news, the company denounced the actions and fired the team: Such bias, Facebook said, was unacceptable for a pure technology company where engineers build agnostic tools and blind platforms with the simple desire to connect the world.

And post-election, in response to claims that it allowed political misinformation to spread unchecked, Facebook argued that it was not a media company but a technology company. No matter that it pulled in more than $6 billion in advertising revenue in just the second quarter of 2016. Facebook claimed it was a “crazy idea” that the very same platform that has unmatched influence over its billion users’ spending habits also had influence over those same users’ political decisions. (The company has since walked back its excuse and has begun to find ways to partner with fact checkers and even flag demonstrably false news and misinformation on the platform. A week ago, Zuckerberg changed his definitions, calling Facebook “a new kind of platform.” He argued that it was “not a traditional technology company. It’s not a traditional media company. You know, we build technology and we feel responsible for how it’s used.”)

Also in 2016, Facebook rolled out a live video tool that gave nearly 2 billion people the ability to broadcast from their phones in real time. Live gave us an exploding watermelon and Star Wars Mom, but it also gave us the last minutes of Philando Castile’s life and the ensuing protests. Just as the Castile post started to go viral, it vanished from the network. It was restored, but not before raising urgent questions as to how Facebook would or wouldn’t censor newsworthy content (many of which went unanswered). Facebook bet big on building the technology to become the internet’s primary destination for live video but appeared unwilling to reckon with its power to bear witness to the worst that the world has to offer. It blamed the Castile incident on a technical glitch.

Both Twitter and Reddit repeatedly suggested that they are global town squares and open public forums and thus ought not to be moderated except in extreme cases. Like Facebook, they refused to see themselves as media companies or publishing platforms, despite being powerful tools for news, publishing, and politicians (this year Twitter reclassified itself in the Apple App Store as “news” instead of “social networking”). And then they watched as their platforms were overrun with trolls. Tools for free speech were used by nefarious actors to suppress the speech of others while little was done by the companies for fear of creating precedent for aggressive censorship. Again, this isn’t new: For the last decade, the crash of utopianism against the rocks of human reality has arguably been the defining story of the internet.

But in 2016, the consequences of these missteps became more real. Jewish journalists saw their pictures photoshopped into gas chambers and circulated around Twitter and across the internet. A Reddit community (r/The_Donald) dedicated to Donald Trump’s candidacy allegedly harassed other communities and led a campaign to take over the front page of the site — one of the biggest on the internet. Donald Trump rewarded them by appearing on the site for an “Ask Me Anything” Q&A. Trolls waged misinformation campaigns to try to disenfranchise black and Latino voters supporting Hillary Clinton. Twitter was a free megaphone for the now-president-elect to attack the press, disseminate misinformation, and even target private citizens who challenged him, each of his tweets setting off a wave of targeted hate, threats, and abuse toward their subjects.

ADL

But users and observers fought back. The Anti-Defamation League assembled a Twitter harassment task force to combat the rise of anti-Semitism on the platform. Leslie Jones responded to her targeted harassment by very publicly quitting Twitter, which led to the permanent suspension of one of its master trolls. Former employees spoke out against Twitter’s decade-long struggle to protect its users from abuse. CEO Jack Dorsey faced pressure from journalists and advocates for not making abuse prevention a priority. Reddit has begun taking steps to keep r/The_Donald from overwhelming other communities on the site. Twitter rolled out a set of new abuse tools and internal user support practices. It began a series of crackdowns on alt-right trolls, and it publicly vowed to stay vigilant. Enforcement remains inconsistent and opaque, but the company now operates under the watchful scrutiny of journalists and loud and critical users.

It’s not just the online platforms. Startups like Uber and Airbnb, which are powered by tech but operate almost exclusively in the physical world, drew ire for invoking the “tech company” excuse. This year Uber argued in European court that it is a digital platform, not a taxi or transportation company. It argued this despite its very public ambitions to reshape cities and change the nature of car ownership. It argued this despite the fact that it now builds autonomous vehicles that move real people on real city streets and despite the fact that it is arguably the largest dispatch transportation company in the world, with vehicles in over 300 cities on six continents and an estimated valuation of around $68 billion. It argued that it is just a technology company despite the fact that downloading and hailing and stepping into a cab brings with it far more visceral — and potentially serious — risks than that of a simple digital platform.

Uber’s argument largely fell flat in 2016. In Europe, the company faces lawsuits from taxi associations and protests from drivers for undermining transportation companies across the continent. Continuing reports of sexual assault and driver misconduct led to lawsuits, proposed legislation, and transparency measures from governments in places like New York City. Just this month, Uber’s self-driving technology was pulled off streets in San Francisco by the DMV for being deployed too early.

After initial reports of racial discrimination from people using its home rental platform, Airbnb proffered a flaccid defense. “We prohibit content that promotes discrimination, bigotry, racism, hatred, harassment or harm against any individual or group,” the company said in May. But as reports of racial profiling on Airbnb continued to surface, the company was forced to address the issue in earnest. In a moment of candor, co-founder Brian Chesky suggested that the company’s creators hadn’t anticipated the potential for abuse. “We’re also realizing when we designed the platform, Joe, Nate, and I, three white guys, there’s a lot of things we didn’t think about when we designed this platform. And so there’s a lot of steps that we need to re-evaluate,” he said in July.

In some ways, Chesky’s comments about the unintended consequences of platform design speak to the frustration we, the users, feel when we’re faced with the “We’re just a technology company” excuse. The unspoken corollary to this argument seems to be “Hey, we’re just a platform, we’re not responsible, nor could we ever be liable for the design choices that guide and enable our users.”

But as we saw this year, that couldn’t be further from the truth. Facebook’s not just the place where you go to play Farmville and like pictures of your friends’ babies — it’s a filter-bubbled window through which more than a billion people view the world. Twitter isn’t a global town square or park, it’s the world’s most important newswire and, for some, a wildly effective way to quickly communicate with a massive audience. Uber isn’t an app, it’s a global transportation company that can, and in fact intends to, forever reshape the way humans get from point A to point B. Airbnb isn’t a vacation rental site, it’s a new vision of home ownership and travel accommodations.

For years, Silicon Valley’s biggest companies have been telling us they plan to reshape our lives online and off. But 2016 was the year that we really started taking those claims seriously. And now, in a world where Donald Trump can ascend to the highest office buoyed by fake news and 5 a.m. tweetstorms, and platforms like Uber and Airbnb have shown themselves vulnerable to the whims of some prejudiced users, there’s an emerging expectation of accountability for the platforms that are reshaping our world daily.

In other words, trotting out the “But we’re just a digital platform” excuse as a quick and easy abdication of responsibility for the perhaps unforeseen — but maybe also inevitable — consequences of Big Tech’s various creations is fast becoming a nonstarter. Until recently, Facebook’s unofficial engineering motto was “Move fast and break things” — a reference to tech’s once-guiding ethos of being more nimble than the establishment. “Move fast and break things” works great with code and software, but 2016’s enduring lesson is that when it comes to the internet’s most powerful, ubiquitous platforms, this kind of thinking isn’t just logically fraught, it’s dangerous — particularly when real human beings and the public interest are along for the ride.

Source: BuzzFeed

New Year’s Resolution: Learn Docker

Remember last year when I said the market for Docker jobs was blowing up? Well, it’s more than doubled in the last year. And Swarm is also rising quickly, growing 12829%, almost all of that in the last year. We expect that, with our partnership with Microsoft and Windows Docker containers, this will grow even faster in the next year as .NET developers start to containerize their applications and Windows IT professionals start porting their infrastructure to Docker. Take a look at this trendline from indeed.com.

So what are you doing to increase your Docker skills? Want a few suggestions?
Whether you’re a developer or more of an ops person, a great place to start is the Docker Labs repository, which currently has 28 labs for you to choose from. They range from beginner tutorials to orchestration workshops, security and networking tutorials, and guides for using different programming languages and developer tools.
Of course there’s also the Docker Documentation, which has a rich set of resources.
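If you’d rather see Docker doing something right away, you can also drive it from code. The short sketch below uses the Docker SDK for Python; it assumes the Docker Engine is running locally and that you’ve installed the SDK with pip install docker.

```python
# A minimal sketch using the Docker SDK for Python (docker-py).
# Assumes a local Docker Engine is running and `pip install docker` was run.
import docker

client = docker.from_env()  # connect to the local Docker daemon

# Pull (if needed) and run a tiny container, capturing its output.
output = client.containers.run("alpine", ["echo", "hello from a container"],
                               remove=True)
print(output.decode().strip())

# List any containers still running on this host.
for container in client.containers.list():
    print(container.short_id, container.image.tags)
```

The same few lines make a handy starting point for scripting builds and test environments once you’re comfortable with the basics.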
At DockerCon 2017 in April, there will be a rich set of material for beginners and experts alike, and you will get to meet people from all over the world who are using Docker in their daily lives. Here are just a few things attendees can do at DockerCon:

Learn about Docker from Docker Captains, from getting started to deep dives into Docker internals
Take hands-on, self-paced labs that give you practical skills
Learn about the ecosystem of companies that build on Docker in our Expo Hall
And if you are really passionate about Docker, our recruiting team will have a booth there too, so check out our careers page

You can also take a training course. We have instructor-led trainings all over the world, or you can do a self-paced course.
Or connect with the Docker community by attending a Docker event, including meetups and webinars. There’s also a Docker Community list you can join that will give you access to a Docker Slack channel, where you can go for support and discussion.

Looking for a new job? Learning @docker is a good way to get one.

The post New Year’s Resolution: Learn Docker appeared first on Docker Blog.
Source: https://blog.docker.com/feed/