Subscribe: Google Research Blog
http://googleresearch.blogspot.com/atom.xml

Google Research Blog



The latest news on Google Research.



Updated: 2017-10-18T13:44:48.942-07:00

 



Portrait mode on the Pixel 2 and Pixel 2 XL smartphones

2017-10-17T17:41:30.394-07:00

Posted by Marc Levoy, Principal Engineer, and Yael Pritch, Software Engineer

Portrait mode, a major feature of the new Pixel 2 and Pixel 2 XL smartphones, allows anyone to take professional-looking shallow depth-of-field images. This feature helped both devices earn DxO's highest mobile camera ranking, and works with both the rear-facing and front-facing cameras, even though neither is dual-camera (normally required to obtain this effect). Today we discuss the machine learning and computational photography techniques behind this feature.

HDR+ picture without (left) and with (right) portrait mode. Note how portrait mode’s synthetic shallow depth of field helps suppress the cluttered background and focus attention on the main subject. Click on these links in the caption to see full resolution versions. Photo by Matt Jones

What is a shallow depth-of-field image?
A single-lens reflex (SLR) camera with a big lens has a shallow depth of field, meaning that objects at one distance from the camera are sharp, while objects in front of or behind that "in-focus plane" are blurry. Shallow depth of field is a good way to draw the viewer's attention to a subject, or to suppress a cluttered background. Shallow depth of field is what gives portraits captured using SLRs their characteristic artistic look.

The amount of blur in a shallow depth-of-field image depends on depth: the farther objects are from the in-focus plane, the blurrier they appear. The amount of blur also depends on the size of the lens opening. A 50mm lens with an f/2.0 aperture has an opening 50mm/2 = 25mm in diameter. With such a lens, objects that are even a few inches away from the in-focus plane will appear soft.

One other parameter worth knowing about depth of field is the shape taken on by blurred points of light. This shape is called bokeh, and it depends on the physical structure of the lens's aperture. Is the bokeh circular? Or is it a hexagon, due to the six metal leaves that form the aperture inside some lenses? Photographers debate tirelessly about what constitutes good or bad bokeh.

Synthetic shallow depth-of-field images
Unlike SLR cameras, mobile phone cameras have a small, fixed-size aperture, which produces pictures with everything more or less in focus. But if we knew the distance from the camera to points in the scene, we could replace each pixel in the picture with a blur. This blur would be an average of the pixel's color with its neighbors, where the amount of blur depends on the distance of that scene point from the in-focus plane. We could also control the shape of this blur, meaning the bokeh.

How can a cell phone estimate the distance to every point in the scene? The most common method is to place two cameras close to one another – so-called dual-camera phones. Then, for each patch in the left camera's image, we look for a matching patch in the right camera's image. The position in the two images where this match is found gives the depth of that scene feature through a process of triangulation. This search for matching features is called a stereo algorithm, and it works pretty much the same way our two eyes do.

A simpler version of this idea, used by some single-camera smartphone apps, involves separating the image into two layers – pixels that are part of the foreground (typically a person) and pixels that are part of the background. This separation, sometimes called semantic segmentation, lets you blur the background, but it has no notion of depth, so it can't tell you how much to blur it. Also, if there is an object in front of the person, i.e. very close to the camera, it won't be blurred out, even though a real camera would do this.

Whether done using stereo or segmentation, artificially blurring pixels that belong to the background is called synthetic shallow depth of field or synthetic background defocusing. Synthetic defocus is not the same as the optical blur you would get from an SLR, but it looks similar to most people.

How portrait mode works on the Pixel 2
The Google Pixel 2 offers portrait mode on both its rear-facing and front-facing camera[...]
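To make the depth-dependent blur described above concrete, here is a greatly simplified NumPy sketch, not the Pixel 2 pipeline: it assumes a depth map is already available and applies a box blur whose strength grows with each pixel's distance from a chosen in-focus plane (real portrait mode additionally handles occlusions, bokeh shape and segmentation).

```python
# Minimal sketch (not the Pixel 2 pipeline): given an RGB image and a per-pixel
# depth map, blur each pixel by an amount that grows with its distance from a
# chosen in-focus plane. A box blur stands in for a true bokeh-shaped blur.
import numpy as np
from scipy.ndimage import uniform_filter

def synthetic_defocus(image, depth, focus_depth, max_radius=8, n_levels=5):
    """image: HxWx3 float array; depth: HxW array in the same units as focus_depth."""
    # Blur radius grows with distance from the in-focus plane (normalized, clipped).
    radius = np.clip(np.abs(depth - focus_depth) / np.abs(depth).max(), 0, 1) * max_radius
    # Precompute a small stack of progressively blurrier copies of the image.
    radii = np.linspace(0, max_radius, n_levels)
    stack = [image] + [uniform_filter(image, size=(int(2 * r + 1), int(2 * r + 1), 1))
                       for r in radii[1:]]
    # For each pixel, pick the blur level closest to its desired radius.
    level = np.clip(np.round(radius / max_radius * (n_levels - 1)), 0, n_levels - 1).astype(int)
    out = np.zeros_like(image)
    for k in range(n_levels):
        mask = level == k
        out[mask] = stack[k][mask]
    return out
```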



TensorFlow Lattice: Flexibility Empowered by Prior Knowledge

2017-10-11T10:00:00.156-07:00

Posted by Maya Gupta, Research Scientist, Jan Pfeifer, Software Engineer, and Seungil You, Software Engineer

(Cross-posted on the Google Open Source Blog)

Machine learning has made huge advances in many applications including natural language processing, computer vision and recommendation systems by capturing complex input/output relationships using highly flexible models. However, a remaining challenge is problems with semantically meaningful inputs that obey known global relationships, like “the estimated time to drive a road goes up if traffic is heavier, and all else is the same.” Flexible models like DNNs and random forests may not learn these relationships, and then may fail to generalize well to examples drawn from a different sampling distribution than the examples the model was trained on.

Today we present TensorFlow Lattice, a set of prebuilt TensorFlow Estimators that are easy to use, and TensorFlow operators to build your own lattice models. Lattices are multi-dimensional interpolated look-up tables (for more details, see [1-5]), similar to the look-up tables in the back of a geometry textbook that approximate a sine function. We take advantage of the look-up table’s structure, which can be keyed by multiple inputs to approximate an arbitrarily flexible relationship, to satisfy monotonic relationships that you specify in order to generalize better. That is, the look-up table values are trained to minimize the loss on the training examples, but in addition, adjacent values in the look-up table are constrained to increase along given directions of the input space, which makes the model outputs increase in those directions. Importantly, because they interpolate between the look-up table values, the lattice models are smooth and the predictions are bounded, which helps to avoid spurious large or small predictions at test time.

[Video: https://www.youtube.com/watch?v=kaPheQxIsPY]

How Lattice Models Help You
Suppose you are designing a system to recommend nearby coffee shops to a user. You would like the model to learn, “if two cafes are the same, prefer the closer one.” Below we show a flexible model (pink) that accurately fits some training data for users in Tokyo (purple), where there are many coffee shops nearby. The pink flexible model overfits the noisy training examples, and misses the overall trend that a closer cafe is better. If you used this pink model to rank test examples from Texas (blue), where businesses are spread farther out, you would find it acted strangely, sometimes preferring farther cafes!

Slice through a model’s feature space where all the other inputs stay the same and only distance changes. A flexible function (pink) that is accurate on training examples from Tokyo (purple) predicts that a cafe 10km away is better than the same cafe 5km away. This problem becomes more evident at test time if the data distribution has shifted, as shown here with blue examples from Texas where cafes are spread out more.

A monotonic flexible function (green) is both accurate on training examples and can generalize for Texas examples, compared to the non-monotonic flexible function (pink) from the previous figure.

In contrast, a lattice model, trained over the same examples from Tokyo, can be constrained to satisfy such a monotonic relationship, resulting in a monotonic flexible function (green). The green line also accurately fits the Tokyo training examples, but also generalizes well to Texas, never preferring farther cafes.

In general, you might have many inputs about each cafe, e.g., coffee quality, price, etc. Flexible models have a hard time capturing global relationships of the form, “if all other inputs are equal, nearer is better,” especially in parts of the feature space where your training data is sparse and noisy. Machine learning models that capture prior knowled[...]
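The core idea of a trained, monotonically constrained, interpolated lookup table can be illustrated without the library. Below is a hedged NumPy sketch, not the TensorFlow Lattice API: a 1-D lookup table is fit to data by gradient descent, its values are forced to be non-decreasing after each step, and predictions interpolate between table values so outputs stay bounded. The function and parameter names are illustrative only.

```python
# Illustrative sketch of the lattice idea (not the TensorFlow Lattice API):
# a 1-D interpolated lookup table fit by gradient descent, with its values
# forced to be monotonically non-decreasing in the input after every step.
import numpy as np

def fit_monotonic_lookup_table(x, y, num_keypoints=10, lr=0.1, steps=500):
    """x, y: 1-D training arrays with x spanning a nontrivial range."""
    keypoints = np.linspace(x.min(), x.max(), num_keypoints)
    values = np.zeros(num_keypoints)            # the trainable lookup-table values
    for _ in range(steps):
        pred = np.interp(x, keypoints, values)  # linear interpolation between keypoints
        grad_pred = 2 * (pred - y) / len(x)     # d(MSE)/d(pred)
        # Scatter each example's gradient onto the two keypoints it interpolates between.
        idx = np.clip(np.searchsorted(keypoints, x) - 1, 0, num_keypoints - 2)
        t = (x - keypoints[idx]) / (keypoints[idx + 1] - keypoints[idx])
        grad_values = np.zeros(num_keypoints)
        np.add.at(grad_values, idx, grad_pred * (1 - t))
        np.add.at(grad_values, idx + 1, grad_pred * t)
        values -= lr * grad_values
        # Enforce the monotonicity constraint: each value >= the previous one.
        values = np.maximum.accumulate(values)
    return keypoints, values

# Usage sketch: predictions for new inputs are bounded by the learned table values.
# keypoints, values = fit_monotonic_lookup_table(train_distance, train_score)
# score = np.interp(test_distance, keypoints, values)
```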



The Google Brain Team’s Approach to Research

2017-09-13T08:42:22.830-07:00

Posted by Jeff Dean, Google Senior Fellow

About a year ago, the Google Brain team first shared our mission “Make machines intelligent. Improve people’s lives.” In that time, we’ve shared updates on our work to infuse machine learning across Google products that hundreds of millions of users access every day, including Translate, Maps, and more. Today, I’d like to share more about how we approach this mission both through advancement in the fundamental theory and understanding of machine learning, and through research in the service of product.

Five years ago, our colleagues Alfred Spector, Peter Norvig, and Slav Petrov published a blog post and paper explaining Google’s hybrid approach to research, an approach that has always allowed for varied balances between curiosity-driven and application-driven research. The biggest challenges in machine learning that the Brain team is focused on require the broadest exploration of new ideas, which is why our researchers set their own agendas, with much of our team focusing specifically on advancing the state of the art in machine learning. In doing so, we have published hundreds of papers over the last several years in conferences such as NIPS, ICML and ICLR, with acceptance rates significantly above conference averages.

Critical to achieving our mission is contributing new and fundamental research in machine learning. To that end, we’ve built a thriving team that conducts long-term, open research to advance science. In pursuing research across fields such as visual and auditory perception, natural language understanding, art and music generation, and systems architecture and algorithms, we regularly collaborate with researchers at external institutions, with fully one third of our papers in 2017 having one or more cross-institutional authors. Additionally, we host collaborators from academic institutions to enhance our own work and strengthen our connection to the external scientific community.

We also believe in the importance of clear and understandable explanations of the concepts in modern machine learning. Distill.pub is an online technical journal providing a forum for this purpose, launched by Brain team members Chris Olah and Shan Carter. TensorFlow Playground is an in-browser experimental venue created by the Google Brain team’s visualization experts to give people insight into how neural networks behave on simple problems, and PAIR’s deeplearn is an open source WebGL-accelerated JavaScript library for machine learning that runs entirely in your browser, with no installations and no backend.

In addition to working with the best minds in academia and industry, the Brain team, like many other teams at Google, believes in fostering the development of the next generation of scientists. Our team hosts more than 50 interns every year, with the goal of publishing their work in top machine learning venues (roughly 25% of our group’s publications so far in 2017 have intern co-authors, usually as primary authors). Additionally, in 2016, we welcomed the first cohort of the Google Brain Residency Program, a one-year program for people who want to learn to do machine learning research. In its inaugural year, 27 residents conducted research alongside and under the mentorship of Brain team members, and authored more than 40 papers that were accepted in top research conferences. Our second group of 36 residents started their one-year residency in our group in July, and are already involved in a wide variety of projects.

Along with other teams within Google Research, we enjoy the freedom to both contribute fundamental advances in machine learning, and separately conduct product-focused research. Both paths are important in ensuring that advances in machine learning have a significant impact on the world. [...]



Highlights from the Annual Google PhD Fellowship Summit, and Announcing the 2017 Google PhD Fellows

2017-09-12T10:04:02.680-07:00

Posted by Susie Kim, Program Manager, University Relations

In 2009, Google created the PhD Fellowship Program to recognize and support outstanding graduate students doing exceptional research in Computer Science and related disciplines. Now in its ninth year, our Fellowships have helped support over 300 graduate students in Australia, China and East Asia, India, North America, Europe and the Middle East who seek to shape and influence the future of technology.

Recently, Google PhD Fellows from around the globe converged on our Mountain View campus for the second annual Global PhD Fellowship Summit. VP of Education and University Programs Maggie Johnson welcomed the Fellows and went over Google's approach to research and its impact across our products and services. The students heard talks from researchers like Ed Chi, Douglas Eck, Úlfar Erlingsson, Dina Papagiannaki, Viren Jain, Ian Goodfellow, Kevin Murphy and Galen Andrew, and got a glimpse into some of the state-of-the-art research pursued across Google.

Google Fellows attending the 2017 Global PhD Fellowship Summit

The event included a panel discussion with Domagoj Babic, Kathryn McKinley, Nina Taft, Roy Want and Sunny Consolvo about their unique career paths in academia and industry. Fellows also had the chance to connect one-on-one with Googlers to discuss their research, as well as receive feedback from leaders in their fields in smaller deep dives and a poster event.

Fellows share their work with Google researchers during the poster session

Our PhD Fellows represent some of the best and brightest young researchers around the globe in Computer Science and it is our ongoing goal to support them as they make their mark on the world.

We’d additionally like to announce the complete list of our 2017 Google PhD Fellows, including the latest recipients from China and East Asia, India, and Australia.
We look forward to seeing each of them at next year’s summit!

2017 Google PhD Fellows

Algorithms, Optimizations and Markets
Chiu Wai Sam Wong, University of California, Berkeley
Eric Balkanski, Harvard University
Haifeng Xu, University of Southern California

Human-Computer Interaction
Motahhare Eslami, University of Illinois, Urbana-Champaign
Sarah D'Angelo, Northwestern University
Sarah Mcroberts, University of Minnesota - Twin Cities
Sarah Webber, The University of Melbourne

Machine Learning
Aude Genevay, Fondation Sciences Mathématiques de Paris
Dustin Tran, Columbia University
Jamie Hayes, University College London
Jin-Hwa Kim, Seoul National University
Ling Luo, The University of Sydney
Martin Arjovsky, New York University
Sayak Ray Chowdhury, Indian Institute of Science
Song Zuo, Tsinghua University
Taco Cohen, University of Amsterdam
Yuhuai Wu, University of Toronto
Yunhe Wang, Peking University
Yunye Gong, Cornell University

Machine Perception, Speech Technology and Computer Vision
Avijit Dasgupta, International Institute of Information Technology - Hyderabad
Franziska Müller, Saarland University - Saarbrücken GSCS and Max Planck Institute for Informatics
George Trigeorgis, Imperial College London
Iro Armeni, Stanford University
Saining Xie, University of California, San Diego
Yu-Chuan Su, University of Texas, Austin

Mobile Computing
Sangeun Oh, Korea Advanced Institute of Science and Technology
Shuo Yang, Shanghai Jiao Tong University

Natural Language Processing
Bidisha Samanta, Indian Institute of Technology Kharagpur
Ekaterina Vylomova, The University of Melbourne
Jianpeng Cheng, The University of Edinburgh
Kevin Clark, Stanford University
Meng Zhang, Tsinghua University
Preksha Nama, Indian Institute of Technology Madras
Tim Rocktaschel, University College London

Privacy and Security
Romain Gay, ENS - École Normale Supérieure
Xi He, Duke University
Yupeng Zhang, University of Maryland, College Park

Programming Languages, Algorithms and Software Engineering
Christoffer Quist Adamsen, Aarhus University
Muhammad Ali Gulzar, University of California, Los Angeles
Oded Padon, Tel-Aviv University

Structured Data and Database Management
Am[...]



Build your own Machine Learning Visualizations with the new TensorBoard API

2017-09-11T12:21:50.311-07:00

Posted by Chi Zeng and Justine Tunney, Software Engineers, Google Brain Team

When we open-sourced TensorFlow in 2015, it included TensorBoard, a suite of visualizations for inspecting and understanding your TensorFlow models and runs. TensorBoard included a small, predetermined set of visualizations that are generic and applicable to nearly all deep learning applications, such as observing how loss changes over time or exploring clusters in high-dimensional spaces. However, in the absence of reusable APIs, adding new visualizations to TensorBoard was prohibitively difficult for anyone outside of the TensorFlow team, leaving out a long tail of potentially creative, beautiful and useful visualizations that could be built by the research community.

To allow the creation of new and useful visualizations, we announce the release of a consistent set of APIs that allows developers to add custom visualization plugins to TensorBoard. We hope that developers use this API to extend TensorBoard and ensure that it covers a wider variety of use cases.

We have updated the existing dashboards (tabs) in TensorBoard to use the new API, so they serve as examples for plugin creators. For the current listing of plugins included within TensorBoard, you can explore the tensorboard/plugins directory on GitHub. For instance, observe the new plugin that generates precision-recall curves.

The plugin demonstrates the 3 parts of a standard TensorBoard plugin:
A TensorFlow summary op used to collect data for later visualization. [GitHub]
A Python backend that serves custom data. [GitHub]
A dashboard within TensorBoard built with TypeScript and Polymer. [GitHub]

Additionally, like other plugins, the “pr_curves” plugin provides a demo that (1) users can look over in order to learn how to use the plugin and (2) the plugin author can use to generate example data during development. To further clarify how plugins work, we’ve also created a barebones TensorBoard “Greeter” plugin. This simple plugin collects greetings (simple strings preceded by “Hello, ”) during model runs and displays them. We recommend starting by exploring (or forking) the Greeter plugin as well as other existing plugins.

A notable example of how contributors are already using the TensorBoard API is Beholder, which was recently created by Chris Anderson while working on his master’s degree. Beholder shows a live video feed of data (e.g. gradients and convolution filters) as a model trains. You can watch the demo video here. We look forward to seeing what innovations will come out of the community. If you plan to contribute a plugin to TensorBoard’s repository, you should get in touch with us first through the issue tracker with your idea so that we can help out and possibly guide you.

Acknowledgements
Dandelion Mané and William Chargin played crucial roles in building this API. [...]
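To ground the first of the three plugin parts, here is a minimal TensorFlow 1.x sketch of the data-collection side: a standard summary op (not the pr_curves plugin's own op) writes values to an event file that a TensorBoard dashboard, built-in or custom, later reads. The log directory and metric name are placeholders of my own.

```python
# Minimal TF 1.x sketch: a summary op writes values to an event file that a
# TensorBoard dashboard (built-in or a custom plugin) reads and visualizes.
import tensorflow as tf

loss = tf.placeholder(tf.float32, shape=[], name="loss")
tf.summary.scalar("training_loss", loss)
merged = tf.summary.merge_all()

with tf.Session() as sess:
    writer = tf.summary.FileWriter("/tmp/demo_logdir", sess.graph)
    for step in range(100):
        fake_loss = 1.0 / (step + 1)  # stand-in for a real training metric
        summary = sess.run(merged, feed_dict={loss: fake_loss})
        writer.add_summary(summary, global_step=step)
    writer.close()

# Then run: tensorboard --logdir=/tmp/demo_logdir
```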



Seminal Ideas from 2007

2017-09-06T10:00:21.684-07:00

Posted by Anna Ukhanova, Technical Program Manager, Google Research Europe

It is not every day we have the chance to pause and think about how previous work has led to current successes, how it influenced other advances, and how to reinterpret it in today’s context. That’s what the ICML Test-of-Time Award is meant to achieve, and this year it was given to the work of Sylvain Gelly, now a researcher on the Google Brain team in our Zurich office, and David Silver, now at DeepMind and lead researcher on AlphaGo, for their 2007 paper Combining Online and Offline Knowledge in UCT. This paper presented new approaches to incorporate knowledge, learned offline or created online on the fly, into a search algorithm to augment its effectiveness.

The game of Go is an ancient Chinese board game which enjoys tremendous popularity with millions of players worldwide. Since the success of Deep Blue in the game of Chess in the late 90’s, Go has been considered as the next benchmark for machine learning and games. Indeed, it has simple rules, can be efficiently simulated, and progress can be measured objectively. However, due to the vast search space of possible moves, making an ML system capable of playing Go well represented a considerable challenge. Over the last two years, DeepMind’s AlphaGo has pushed the limit of what is possible with machine learning in games, bringing many innovations and technological advances in order to successfully defeat some of the best players in the world [1], [2], [3].

A little more than 10 years before the success of AlphaGo, the classical tree search techniques that were so successful in Chess were reigning in computer Go programs, but only reaching weak amateur level for human Go players. Thanks to Monte-Carlo Tree Search — a (then) new type of search algorithm based on sampling possible outcomes of the game from a position, and incrementally improving the search tree from the results of those simulations — computers were able to search much deeper in the game. This is important because it made it possible to incorporate less human knowledge in the programs — a task which is very hard to do right. Indeed, any missing knowledge that a human expert either cannot express or did not think about may create errors in the computer evaluation of the game position, and lead to blunders*.

In 2007, Sylvain and David augmented the Monte Carlo Tree Search techniques by exploring two types of knowledge incorporation: (i) online, where the decision for the next move is taken from the current position, using compute resources at the time when the next move is needed, and (ii) offline, where the learning process happens entirely before the game starts, and is summarized into a model that can be applied to all possible positions of a game (even though not all possible positions have been seen during the learning process). This ultimately led to the computer program MoGo, which showed an improvement in performance over previous Go algorithms.

[Video: https://www.youtube.com/watch?v=Bm7zah_LrmE]

For the online part, they adapted the simple idea that some actions don’t necessarily depend on each other. For example, if you need to book a vacation, the choice of the hotel, flight and car rental is obviously dependent on the choice of your destination. However, once given a destination, these things can be chosen (mostly) independently of each other. The same idea can be applied to Go, where some moves can be estimated partially independently of each other to get a very quick, albeit imprecise, estimate. Of course, when time is available, the exact dependencies are also analysed.

For offline knowledge incorporation, they explored the impact of learning an approximation of the position value with the computer playin[...]
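For readers unfamiliar with the search algorithm the award-winning work built on, here is a bare-bones sketch of Monte-Carlo Tree Search with UCT selection. It is not MoGo's implementation and assumes a hypothetical game-state object with legal_moves(), play(), is_terminal() and winner() methods; it simply illustrates the select / expand / simulate / backpropagate cycle described above.

```python
# Bare-bones Monte-Carlo Tree Search with UCT selection (illustrative only).
# Assumes a hypothetical `state` object exposing .legal_moves(), .play(move),
# .is_terminal() and .winner() (1 or 0 from the root player's perspective).
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.wins, self.visits = {}, 0, 0

def uct_search(root_state, num_simulations=1000, c=1.4):
    root = Node(root_state)
    for _ in range(num_simulations):
        node = root
        # 1. Selection: descend using the UCT score until reaching an unexpanded node.
        while node.children and not node.state.is_terminal():
            node = max(node.children.values(),
                       key=lambda n: (n.wins / n.visits +
                                      c * math.sqrt(math.log(node.visits) / n.visits))
                       if n.visits else float("inf"))
        # 2. Expansion: add children for the legal moves of this position.
        if not node.state.is_terminal() and not node.children:
            for move in node.state.legal_moves():
                node.children[move] = Node(node.state.play(move), parent=node)
            node = random.choice(list(node.children.values()))
        # 3. Simulation: play random moves to the end of the game.
        state = node.state
        while not state.is_terminal():
            state = state.play(random.choice(state.legal_moves()))
        result = state.winner()
        # 4. Backpropagation: update win/visit counts along the path to the root.
        while node is not None:
            node.visits += 1
            node.wins += result
            node = node.parent
    # Recommend the most-visited move at the root.
    return max(root.children, key=lambda m: root.children[m].visits)
```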



Transformer: A Novel Neural Network Architecture for Language Understanding

2017-09-01T09:37:11.137-07:00

Posted by Jakob Uszkoreit, Software Engineer, Natural Language Understanding

Neural networks, in particular recurrent neural networks (RNNs), are now at the core of the leading approaches to language understanding tasks such as language modeling, machine translation and question answering. In Attention Is All You Need we introduce the Transformer, a novel neural network architecture based on a self-attention mechanism that we believe to be particularly well-suited for language understanding.

In our paper, we show that the Transformer outperforms both recurrent and convolutional models on academic English to German and English to French translation benchmarks. On top of higher translation quality, the Transformer requires less computation to train and is a much better fit for modern machine learning hardware, speeding up training by up to an order of magnitude.

BLEU scores (higher is better) of single models on the standard WMT newstest2014 English to German translation benchmark.

BLEU scores (higher is better) of single models on the standard WMT newstest2014 English to French translation benchmark.

Accuracy and Efficiency in Language Understanding
Neural networks usually process language by generating fixed- or variable-length vector-space representations. After starting with representations of individual words or even pieces of words, they aggregate information from surrounding words to determine the meaning of a given bit of language in context. For example, deciding on the most likely meaning and appropriate representation of the word “bank” in the sentence “I arrived at the bank after crossing the…” requires knowing if the sentence ends in “... road.” or “... river.”

RNNs have in recent years become the typical network architecture for translation, processing language sequentially in a left-to-right or right-to-left fashion. Reading one word at a time, this forces RNNs to perform multiple steps to make decisions that depend on words far away from each other. Processing the example above, an RNN could only determine that “bank” is likely to refer to the bank of a river after reading each word between “bank” and “river” step by step. Prior research has shown that, roughly speaking, the more such steps decisions require, the harder it is for a recurrent network to learn how to make those decisions.

The sequential nature of RNNs also makes it more difficult to fully take advantage of modern fast computing devices such as TPUs and GPUs, which excel at parallel and not sequential processing. Convolutional neural networks (CNNs) are much less sequential than RNNs, but in CNN architectures like ByteNet or ConvS2S the number of steps required to combine information from distant parts of the input still grows with increasing distance.

The Transformer
In contrast, the Transformer only performs a small, constant number of steps (chosen empirically). In each step, it applies a self-attention mechanism which directly models relationships between all words in a sentence, regardless of their respective position. In the earlier example “I arrived at the bank after crossing the river”, to determine that the word “bank” refers to the shore of a river and not a financial institution, the Transformer can learn to immediately attend to the word “river” and make this decision in a single step. In fact, in our English-French translation model we observe exactly this behavior.

More specifically, to compute the next representation for a given word - “bank” for example - the Transformer compares it to every other word in the sentence. The result of these comparisons is an attention score for every other word in the sentence. These attention scores determine how much each of the other words should contribute to the next representation of “bank”. In the example, the disambiguating “river” could receive a high attention score when compu[...]
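The comparison-and-weighted-average step described above can be written in a few lines. The following NumPy sketch shows a single self-attention head in isolation; it is illustrative only, since the actual Transformer adds learned multi-head projections, positional encodings, layer normalization and feed-forward layers.

```python
# Minimal NumPy sketch of one self-attention step: every word is compared to
# every other word, the comparisons become attention scores, and each word's
# next representation is a score-weighted average of all the value vectors.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (num_words, d_model); Wq/Wk/Wv: learned (d_model, d_k) projection matrices."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # compare every word with every other word
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax -> attention scores
    return weights @ V                              # weighted average of value vectors

# e.g. for "I arrived at the bank after crossing the river" (9 words), the row of
# `weights` corresponding to "bank" can place high weight on "river".
rng = np.random.default_rng(0)
X = rng.normal(size=(9, 16))                        # 9 words, 16-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(16, 16)) * 0.1 for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)          # (9, 16)
```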



Exploring and Visualizing an Open Global Dataset

2017-08-25T10:00:30.473-07:00

Posted by Reena Jana, Creative Lead, Business Inclusion, and Josh Lovejoy, UX Designer, Google Research

Machine learning systems are increasingly influencing many aspects of everyday life, and are used by both the hardware and software products that serve people globally. As such, researchers and designers seeking to create products that are useful and accessible for everyone often face the challenge of finding data sets that reflect the variety and backgrounds of users around the world. In order to train these machine learning systems, open, global — and growing — datasets are needed.

Over the last six months, we’ve seen such a dataset emerge from users of Quick, Draw!, Google’s latest approach to helping wide, international audiences understand how neural networks work. A group of Googlers designed Quick, Draw! as a way for anyone to interact with a machine learning system in a fun way, drawing everyday objects like trees and mugs. The system will try to guess what their drawing depicts, within 20 seconds. While the goal of Quick, Draw! was simply to create a fun game that runs on machine learning, it has resulted in 800 million drawings from twenty million people in 100 nations, from Brazil to Japan to the U.S. to South Africa.

And now we are releasing an open dataset based on these drawings so that people around the world can contribute to, analyze, and inform product design with this data. The dataset currently includes 50 million drawings Quick, Draw! players have generated (we will continue to release more of the 800 million drawings over time).

It’s a considerable amount of data, and it’s also a fascinating lens into how to engage a wide variety of people to participate in (1) training machine learning systems, no matter what their technical background, and (2) the creation of open data sets that reflect a wide spectrum of cultures and points of view.

Seeing national — and global — patterns in one glance
To understand visual patterns within the dataset quickly and efficiently, we worked with artist Kyle McDonald to overlay thousands of drawings from around the world. This helped us create composite images and identify trends in each nation, as well as across all nations. We made animations of 1,000 layered international drawings of cats and chairs, below, to share how we searched for visual trends with this data.

Cats, made from 1,000 drawings from around the world.

Chairs, made from 1,000 drawings from around the world.

Doodles of naturally recurring objects, like cats (or trees, rainbows, or skulls) often look alike across cultures. However, for objects that might be familiar to some cultures, but not others, we saw notable differences. Sandwiches took defined forms or were a jumbled set of lines; mug handles pointed in opposite directions; and chairs were drawn facing forward or sideways, depending on the nation or region of the world.

One size doesn’t fit all
These composite drawings, we realized, could reveal how perspectives and preferences differ between audiences from different regions, from the type of bread used in sandwiches to the shape of a coffee cup, to the aesthetic of how to depict objects so they are visually appealing. For example, a more straightforward, head-on view was more consistent in some nations; side angles in others.

Overlaying the images also revealed how to improve how we train neural networks when we lack a variety of data — even within a large, open, and international data set. For example, when we analyzed 115,000+ drawings of shoes in the Quick, Draw! dataset, we discovered that a single style of shoe, which resembles a sneaker, was overwhelmingly represented. Because it was so frequently drawn, the neural network learned to recognize only this style as a “shoe.”

But just as in the physical world, in the realm of training data, one size does not fit all. We asked, how can we consistently [...]
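For readers who want to explore the released drawings themselves, below is a small sketch of reading the data. It assumes the newline-delimited JSON ("ndjson") release in which each line is a record with "word", "countrycode" and "drawing" fields (the drawing being a list of strokes, each stroke a pair of x and y coordinate lists); the field names and the file name are assumptions to verify against the dataset's documentation.

```python
# Sketch of exploring the open Quick, Draw! data, assuming the ndjson release
# format with "word", "countrycode" and "drawing" fields per line. Field and
# file names are assumptions; check the dataset documentation for the schema.
import json
from collections import Counter

def load_drawings(path, max_records=10000):
    with open(path) as f:
        for i, line in enumerate(f):
            if i >= max_records:
                break
            yield json.loads(line)

# Example question: how does the average number of strokes per cat drawing
# vary by country?
stroke_totals = Counter()
drawing_counts = Counter()
for rec in load_drawings("cat.ndjson"):          # hypothetical local file name
    country = rec.get("countrycode", "??")
    stroke_totals[country] += len(rec["drawing"])
    drawing_counts[country] += 1

for country, total in stroke_totals.most_common(10):
    print(country, total / drawing_counts[country])
```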



Launching the Speech Commands Dataset

2017-08-24T10:06:05.326-07:00

Posted by Pete Warden, Software Engineer, Google Brain Team

At Google, we’re often asked how to get started using deep learning for speech and other audio recognition problems, like detecting keywords or commands. And while there are some great open source speech recognition systems like Kaldi that can use neural networks as a component, their sophistication makes them tough to use as a guide to simpler tasks. Perhaps more importantly, there aren’t many free and openly available datasets ready to be used for a beginner’s tutorial (many require preprocessing before a neural network model can be built on them) or that are well suited for simple keyword detection.

To solve these problems, the TensorFlow and AIY teams have created the Speech Commands Dataset, and used it to add training* and inference sample code to TensorFlow. The dataset has 65,000 one-second long utterances of 30 short words, by thousands of different people, contributed by members of the public through the AIY website. It’s released under a Creative Commons BY 4.0 license, and will continue to grow in future releases as more contributions are received. The dataset is designed to let you build basic but useful voice interfaces for applications, with common words like “Yes”, “No”, digits, and directions included. The infrastructure we used to create the data has been open sourced too, and we hope to see it used by the wider community to create their own versions, especially to cover underserved languages and applications.

To try it out for yourself, download the prebuilt set of the TensorFlow Android demo applications and open up “TF Speech”. You’ll be asked for permission to access your microphone, and then see a list of ten words, each of which should light up as you say them.

The results will depend on whether your speech patterns are covered by the dataset, so it may not be perfect — commercial speech recognition systems are a lot more complex than this teaching example. But we’re hoping that as more accents and variations are added to the dataset, and as the community contributes improved models to TensorFlow, we’ll continue to see improvements and extensions.

You can also learn how to train your own version of this model through the new audio recognition tutorial on TensorFlow.org. With the latest development version of the framework and a modern desktop machine, you can download the dataset and train the model in just a few hours. You’ll also see a wide variety of options to customize the neural network for different problems, and to make different latency, size, and accuracy tradeoffs to run on different platforms.

We are excited to see what new applications people are able to build with the help of this dataset and tutorial, so I hope you get a chance to dive in and start recognizing!

* The architecture this network is based on is described in Convolutional Neural Networks for Small-footprint Keyword Spotting, presented at Interspeech 2015.↩ [...]
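As a hint of the kind of preprocessing a keyword-spotting model typically needs, here is an illustrative sketch (not the TensorFlow tutorial code) that turns one of the dataset's one-second utterances into a log spectrogram. It assumes the archive is unpacked into one folder per word, and the example file path is hypothetical.

```python
# Illustrative preprocessing sketch (not the TensorFlow tutorial code): convert
# a one-second utterance from the dataset into a log spectrogram, a common
# input feature for small keyword-spotting models.
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram

def log_spectrogram(wav_path, window_ms=30, step_ms=10):
    rate, samples = wavfile.read(wav_path)            # 16 kHz, roughly 1 s clips
    nperseg = int(rate * window_ms / 1000)
    noverlap = nperseg - int(rate * step_ms / 1000)
    freqs, times, spec = spectrogram(samples.astype(np.float32),
                                     fs=rate, nperseg=nperseg, noverlap=noverlap)
    return np.log(spec + 1e-10)                       # (freq_bins, time_frames)

# Hypothetical path inside the unpacked archive (one folder per word):
# features = log_spectrogram("speech_commands/yes/example_utterance.wav")
```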



Google at KDD’17: Graph Mining and Beyond

2017-09-12T09:29:03.923-07:00

Posted by Bryan Perozzi, Research Scientist, NYC Algorithms and Optimization Team

The 23rd ACM conference on Knowledge Discovery and Data Mining (KDD’17), a main venue for academic and industry research in data science, information retrieval, data mining and machine learning, was held last week in Halifax, Canada. Google has historically been an active participant in KDD, and this year was no exception, with Googlers contributing numerous papers and participating in workshops.

In addition to our overall participation, we are happy to congratulate fellow Googler Bryan Perozzi for receiving the SIGKDD 2017 Doctoral Dissertation Award, which serves to recognize excellent research by doctoral candidates in the field of data mining and knowledge discovery. This award was given in recognition of his thesis on the topic of machine learning on graphs, performed at Stony Brook University under the advisorship of Steven Skiena. Part of his thesis was developed during his internships at Google. The thesis dealt with using a restricted set of local graph primitives (such as ego-networks and truncated random walks) to effectively exploit the information around each vertex for classification, clustering, and anomaly detection. Most notably, the work introduced the random-walk paradigm for graph embedding with neural networks in DeepWalk.

DeepWalk: Online Learning of Social Representations, originally presented at KDD'14, outlines a method for using a series of local information obtained from truncated random walks to learn latent representations of nodes in a graph (e.g. users in a social network). The core idea was to treat each segment of a random walk as a sentence “in the language of the graph.” These segments could then be used as input for neural network models to learn representations of the graph’s nodes, using sequence modeling methods like word2vec (which had just been developed at the time). A minimal sketch of the random-walk step appears at the end of this post. This research continues at Google, most recently with Learning Edge Representations via Low-Rank Asymmetric Projections.

The full list of Google contributions at KDD’17 is listed below (Googlers highlighted in blue).

Organizing Committee
Panel Chair: Andrew Tomkins
Research Track Program Chair: Ravi Kumar
Applied Data Science Track Program Chair: Roberto J. Bayardo
Research Track Program Committee: Sergei Vassilvitskii, Alex Beutel, Abhimanyu Das, Nan Du, Alessandro Epasto, Alex Fabrikant, Silvio Lattanzi, Kristen Lefevre, Bryan Perozzi, Karthik Raman, Steffen Rendle, Xiao Yu
Applied Data Science Program Track Committee: Edith Cohen, Ariel Fuxman, D. Sculley, Isabelle Stanton, Martin Zinkevich, Amr Ahmed, Azin Ashkan, Michael Bendersky, James Cook, Nan Du, Balaji Gopalan, Samuel Huston, Konstantinos Kollias, James Kunz, Liang Tang, Morteza Zadimoghaddam

Awards
Doctoral Dissertation Award: Bryan Perozzi, for Local Modeling of Attributed Graphs: Algorithms and Applications.
Doctoral Dissertation Runner-up Award: Alex Beutel, for User Behavior Modeling with Large-Scale Graph Analysis.

Papers
Ego-Splitting Framework: from Non-Overlapping to Overlapping Clusters
Alessandro Epasto, Silvio Lattanzi, Renato Paes Leme
HyperLogLog Hyperextended: Sketches for Concave Sublinear Frequency Statistics
Edith Cohen
Google Vizier: A Service for Black-Box Optimization
Daniel Golovin, Benjamin Solnik, Subhodeep Moitra, Greg Kochanski, John Karro, D. Sculley
Quick Access: Building a Smart Experience for Google Drive
Sandeep Tata, Alexandrin Popescul, Marc Najork, Mike Colagrosso, Julian Gibbons, Alan Green, Alexandre Mah, Michael Smith, Divanshu Garg, Cayden Meyer, Reuben Kan

Papers
TFX: A TensorFlow Based[...]
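As referenced above, the DeepWalk idea can be sketched in a few lines: truncated random walks over a graph are treated as "sentences" and handed to any word2vec-style model. The adjacency-list representation and the commented-out gensim call are illustrative only (gensim's parameter names vary between versions).

```python
# Minimal sketch of the DeepWalk random-walk step: generate truncated random
# walks over a graph and treat each walk as a "sentence" for a word2vec-style
# sequence model.
import random

def random_walks(adjacency, num_walks=10, walk_length=40, seed=0):
    """adjacency: dict mapping node -> list of neighbor nodes."""
    rng = random.Random(seed)
    walks = []
    nodes = list(adjacency)
    for _ in range(num_walks):
        rng.shuffle(nodes)                      # start one walk from every node
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:
                    break
                walk.append(rng.choice(neighbors))
            walks.append([str(n) for n in walk])  # word2vec models expect string tokens
    return walks

# Illustrative usage (parameter names differ across gensim versions):
# walks = random_walks(graph_adjacency)
# from gensim.models import Word2Vec
# model = Word2Vec(walks, vector_size=128, window=5, min_count=0, sg=1)
```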



Announcing the NYC Algorithms and Optimization Site

2017-08-21T10:11:26.652-07:00

Posted by Vahab Mirrokni, Principal Research Scientist, and Xerxes Dotiwalla, Product Manager, NYC Algorithms and Optimization Team

New York City is home to several Google algorithms research groups. We collaborate closely with the teams behind many Google products and work on a wide variety of algorithmic challenges, like optimizing infrastructure, protecting privacy, improving friend suggestions and much more.

Today, we’re excited to provide more insights into the research done in the Big Apple with the launch of the NYC Algorithms and Optimization Team page. The NYC Algorithms and Optimization Team comprises multiple overlapping research groups working on large-scale graph mining, large-scale optimization and market algorithms.

Large-scale Graph Mining
The Large-scale Graph Mining Group is tasked with building the most scalable library for graph algorithms and analysis and applying it to a multitude of Google products. We formalize data mining and machine learning challenges as graph algorithms problems and perform fundamental research in those fields, leading to publications in top venues.

Our projects include:
Large-scale Similarity Ranking: Our research in pairwise similarity ranking has produced a number of innovative methods, which we have published in top venues such as WWW, ICML, and VLDB, e.g., improving friend suggestion using ego-networks and computing similarity rankings in large-scale multi-categorical bipartite graphs.
Balanced Partitioning: Balanced partitioning is often a crucial first step in solving large-scale graph optimization problems. As our paper shows, we are able to achieve a 15-25% reduction in cut size compared to state-of-the-art algorithms in the literature.
Clustering and Connected Components: We have state-of-the-art implementations of many different algorithms including hierarchical clustering, overlapping clustering, local clustering, spectral clustering, and connected components. Our methods are 10-30x faster than the best previously studied algorithms and can scale to graphs with trillions of edges.
Public-private Graph Computation: Our research on novel models of graph computation based on a personal view of private data preserves the privacy of each user.

Large-scale Optimization
The Large-scale Optimization Group’s mission is to develop large-scale optimization techniques and use them to improve the efficiency and robustness of infrastructure at Google. We apply techniques from areas such as combinatorial optimization, online algorithms, and control theory to make Google’s massive computational infrastructure do more with less. We combine online and offline optimizations to achieve such goals as increasing throughput, decreasing latency, minimizing resource contention, maximizing the efficacy of caches, and eliminating unnecessary work in distributed systems.

Our research is used in critical infrastructure that supports core products:
Consistent Hashing: We designed memoryless balanced allocation algorithms to assign a dynamic set of clients to a dynamic set of servers such that the load on each server is bounded, and the allocation does not change by much for every update operation. This technique is currently implemented in Google Cloud Pub/Sub and externally in the open-source haproxy (a simple sketch of the basic consistent-hashing idea appears after this list).
Distributed Optimization Based on Core-sets: Composable core-sets provide an effective method for solving optimization problems on massive datasets. This technique can be used for several problems including distributed balanced clustering and distributed submodular maximization.
Google Search Infrastructure Optimization: We partnered with the Google Search infrastructure team to build a distributed feedback control loop to govern the way queries are fanned out to machines. We also improved the efficacy of caching by increasing the homog[...]
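As referenced in the Consistent Hashing item above, here is a sketch of a basic consistent-hashing ring. It is not the bounded-loads algorithm from the team's paper (that variant additionally caps each server's load and spills overflow to the next server on the ring); it only illustrates why adding or removing a server changes the allocation of few clients. All names are illustrative.

```python
# Sketch of a plain consistent-hashing ring (not the exact bounded-load
# algorithm): clients map to the nearest server clockwise on a hash ring, so
# adding or removing a server only moves the keys adjacent to it.
import bisect
import hashlib

def _hash(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, servers, replicas=100):
        self.replicas = replicas
        self._ring = []                      # sorted list of (hash, server) points
        for server in servers:
            self.add_server(server)

    def add_server(self, server):
        for i in range(self.replicas):       # virtual nodes smooth the distribution
            bisect.insort(self._ring, (_hash(f"{server}#{i}"), server))

    def remove_server(self, server):
        self._ring = [(h, s) for h, s in self._ring if s != server]

    def get_server(self, client_key):
        h = _hash(client_key)
        i = bisect.bisect(self._ring, (h, "")) % len(self._ring)
        return self._ring[i][1]              # first ring point clockwise from h

ring = ConsistentHashRing(["server-a", "server-b", "server-c"])
print(ring.get_server("client-42"))
```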



Making Visible Watermarks More Effective

2017-08-18T11:00:07.131-07:00

Posted by Tali Dekel and Michael Rubinstein, Research Scientists

Whether you are a photographer, a marketing manager, or a regular Internet user, chances are you have encountered visible watermarks many times. Visible watermarks are those logos and patterns that are often overlaid on digital images provided by stock photography websites, marking the image owners while allowing viewers to perceive the underlying content so that they can license the images that fit their needs. It is the most common mechanism for protecting the copyrights of the hundreds of millions of photographs and stock images that are offered online daily.

It’s standard practice to use watermarks on the assumption that they prevent consumers from accessing the clean images, ensuring there will be no unauthorized or unlicensed use. However, in “On The Effectiveness Of Visible Watermarks”, recently presented at the 2017 Computer Vision and Pattern Recognition Conference (CVPR 2017), we show that a computer algorithm can get past this protection and remove watermarks automatically, giving users unobstructed access to the clean images the watermarks are intended to protect.

Left: example watermarked images from popular stock photography websites. Right: watermark-free version of the images on the left, produced automatically by a computer algorithm. More results are available below and on our project page. Image sources: Adobe Stock, 123RF.

As often done with vulnerabilities discovered in operating systems, applications or protocols, we want to disclose this vulnerability and propose solutions in order to help the photography and stock image communities adapt and better protect their copyrighted content and creations. From our experiments, much of the world’s stock imagery is currently susceptible to this circumvention. As such, in our paper we also propose ways to make visible watermarks more robust to such manipulations.

[Video: https://www.youtube.com/watch?v=02Ywt87OpS4]

The Vulnerability of Visible Watermarks
Visible watermarks are often designed to contain complex structures such as thin lines and shadows in order to make them harder to remove. Indeed, given a single image, it is extremely difficult for a computer to detect automatically which visual structures belong to the watermark and which structures belong to the underlying image. Manually, the task of removing a watermark from an image is tedious, and even with state-of-the-art editing tools it may take a Photoshop expert several minutes to remove a watermark from one image.

However, a fact that has been overlooked so far is that watermarks are typically added in a consistent manner to many images. We show that this consistency can be used to invert the watermarking process — that is, estimate the watermark image and its opacity, and recover the original, watermark-free image underneath. This can all be done automatically, without any user intervention or prior information about the watermark, and by only observing watermarked image collections publicly available online.

The consistency of a watermark over many images makes it possible to remove it automatically at scale. Left: input collection marked by the same watermark; middle: computed watermark and its opacity; right: recovered, watermark-free images. Image sources: COCO dataset, Copyright logo.

The first step of this process is identifying which image structures are repeating in the collection. If a similar watermark is embedded in many images, the watermark becomes the signal in the collection and the images become the noise, and simple image operations can be used to pull o[...]
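To give a feel for "watermark as signal, images as noise", here is a greatly simplified sketch: a per-pixel median over many images marked with the same watermark suppresses the varying image content and leaves a rough estimate of the repeated overlay. This is only an illustration of the first step; the method in the paper operates on image gradients and jointly estimates the watermark, its opacity, and the clean images. It assumes all images share the same resolution and watermark placement.

```python
# Greatly simplified illustration (not the paper's method): a per-pixel median
# over many images marked with the same watermark suppresses the varying image
# content, leaving a rough estimate of the repeated overlay.
import glob
import numpy as np
from PIL import Image

def estimate_repeated_overlay(image_paths):
    """Assumes all images have the same resolution and watermark placement."""
    stack = np.stack([np.asarray(Image.open(p), dtype=np.float32) for p in image_paths])
    return np.median(stack, axis=0)           # (H, W, C) estimate of the repeated pattern

# pattern = estimate_repeated_overlay(glob.glob("watermarked/*.jpg"))
```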



Harness the Power of Machine Learning in Your Browser with Deeplearn

2017-09-07T17:23:57.228-07:00

Posted by Nikhil Thorat and Daniel Smilkov, Software Engineers, Google Big Picture Team

Machine learning (ML) has become an increasingly powerful tool, one that can be applied to a wide variety of areas spanning object recognition, language translation, health and more. However, the development of ML systems is often restricted to those with computational resources and the technical expertise to work with commonly available ML libraries. With PAIR — an initiative to study and redesign human interactions with ML — we want to open machine learning up to as many people as possible. In pursuit of that goal, we are excited to announce deeplearn 0.1.0, an open source WebGL-accelerated JavaScript library for machine learning that runs entirely in your browser, with no installations and no backend.

There are many reasons to bring machine learning into the browser. A client-side ML library can be a platform for interactive explanations, for rapid prototyping and visualization, and even for offline computation. And if nothing else, the browser is one of the world's most popular programming platforms.

While web machine learning libraries have existed for years (e.g., Andrej Karpathy's convnetjs) they have been limited by the speed of JavaScript, or have been restricted to inference rather than training (e.g., TensorFire). By contrast, deeplearn offers a significant speedup by exploiting WebGL to perform computations on the GPU, along with the ability to do full backpropagation.

The API mimics the structure of TensorFlow and NumPy, with a delayed execution model for training (like TensorFlow), and an immediate execution model for inference (like NumPy). We have also implemented versions of some of the most commonly used TensorFlow operations. With the release of deeplearn, we will be providing tools to export weights from TensorFlow checkpoints, which will allow authors to import them into web pages for deeplearn inference.

You can explore the potential of this library by training a convolutional neural network to recognize photos and handwritten digits — all in your browser without writing a single line of code.

We're releasing a series of demos that show deeplearn in action. Play with an image classifier that uses your webcam in real-time and watch the network’s internal representations of what it sees. Or generate abstract art videos at a smooth 60 frames per second. The deeplearn homepage contains these and other demos.

Our vision is that this library will significantly increase visibility and engagement with machine learning, giving developers access to powerful tools while simultaneously providing the everyday user with a way to interact with them. We’re looking forward to collaborating with the open source community to drive this vision forward. [...]



Google at ICML 2017

2017-08-10T01:18:11.343-07:00

Posted by Christian Howard, Editor-in-Chief, Research Communications

Machine learning (ML) is a key strategic focus at Google, with highly active groups pursuing research in virtually all aspects of the field, including deep learning and more classical algorithms, exploring theory as well as application. We utilize scalable tools and architectures to build machine learning systems that enable us to solve deep scientific and engineering challenges in areas of language, speech, translation, music, visual processing and more.

As a leader in ML research, Google is proud to be a Platinum Sponsor of the thirty-fourth International Conference on Machine Learning (ICML 2017), a premier annual event supported by the International Machine Learning Society taking place this week in Sydney, Australia. With over 130 Googlers attending the conference to present publications and host workshops, we look forward to our continued collaboration with the larger ML research community.

If you're attending ICML 2017, we hope you'll visit the Google booth and talk with our researchers to learn more about the exciting work, creativity and fun that goes into solving some of the field's most interesting challenges. Our researchers will also be available to talk about and demo several recent efforts, including the technology behind Facets, neural audio synthesis with NSynth, a Q&A session on the Google Brain Residency program and much more. You can also learn more about our research being presented at ICML 2017 in the list below (Googlers highlighted in blue).

ICML 2017 Committees
Senior Program Committee includes: Alex Kulesza, Amr Ahmed, Andrew Dai, Corinna Cortes, George Dahl, Hugo Larochelle, Matthew Hoffman, Maya Gupta, Moritz Hardt, Quoc Le
Sponsorship Co-Chair: Ryan Adams

Publications
Robust Adversarial Reinforcement Learning
Lerrel Pinto, James Davidson, Rahul Sukthankar, Abhinav Gupta
Tight Bounds for Approximate Carathéodory and Beyond
Vahab Mirrokni, Renato Leme, Adrian Vladu, Sam Wong
Sharp Minima Can Generalize For Deep Nets
Laurent Dinh, Razvan Pascanu, Samy Bengio, Yoshua Bengio
Geometry of Neural Network Loss Surfaces via Random Matrix Theory
Jeffrey Pennington, Yasaman Bahri
Conditional Image Synthesis with Auxiliary Classifier GANs
Augustus Odena, Christopher Olah, Jon Shlens
Learning Deep Latent Gaussian Models with Markov Chain Monte Carlo
Matthew D. Hoffman
On the Expressive Power of Deep Neural Networks
Maithra Raghu, Ben Poole, Surya Ganguli, Jon Kleinberg, Jascha Sohl-Dickstein
AdaNet: Adaptive Structural Learning of Artificial Neural Networks
Corinna Cortes, Xavi Gonzalvo, Vitaly Kuznetsov, Mehryar Mohri, Scott Yang
Learned Optimizers that Scale and Generalize
Olga Wichrowska, Niru Maheswaranathan, Matthew Hoffman, Sergio Gomez, Misha Denil, Nando de Freitas, Jascha Sohl-Dickstein
Adaptive Feature Selection: Computationally Efficient Online Sparse Linear Regression under RIP
Satyen Kale, Zohar Karnin, Tengyuan Liang, David Pal
Algorithms for ℓp Low-Rank Approximation
Flavio Chierichetti, Sreenivas Gollapudi, Ravi Kumar, Silvio Lattanzi, Rina Panigrahy, David Woodruff
Consistent k-Clustering
Silvio Lattanzi, Sergei Vassilvitskii
Input Switched Affine Networks: An RNN Architecture Designed for Interpretability
Jakob Foerster, Justin Gilmer, Jan Chorowski, Jascha Sohl-Dickstein, David Sussillo
Online and Linear-Time Attention by Enforcing Monotonic Alignments
Colin Raffel, Thang Luong, Peter Liu, Ron Weiss, Douglas Eck
Gradient Boosted Decision Trees for High Dimensional Sparse Output
Si Si, Huan Zhang, Sathiya Keerthi, Dhruv Mahajan, Inderjit Dhillon, Cho-Jui Hsieh[...]



Google at ACL 2017

2017-08-01T09:52:41.027-07:00

Posted by Christian Howard, Editor-in-Chief, Research Communications

This week, Vancouver, Canada hosts the 2017 Annual Meeting of the Association for Computational Linguistics (ACL 2017), the premier conference in the field of natural language understanding, covering a broad spectrum of diverse research areas that are concerned with computational approaches to natural language. As a leader in natural language processing & understanding and a Platinum sponsor of ACL 2017, Google will be on hand to showcase research interests that include syntax, semantics, discourse, conversation, multilingual modeling, sentiment analysis, question answering, summarization, and generally building better systems using labeled and unlabeled data, state-of-the-art modeling and learning from indirect supervision.

If you’re attending ACL 2017, we hope that you’ll stop by the Google booth to check out some demos, meet our researchers and discuss projects and opportunities at Google that go into solving interesting problems for billions of people. Learn more about the Google research being presented at ACL 2017 below (Googlers highlighted in blue).

Organizing Committee
Area Chairs include: Sujith Ravi (Machine Learning), Thang Luong (Machine Translation)
Publication Chairs include: Margaret Mitchell (Advisory)

Accepted Papers
A Polynomial-Time Dynamic Programming Algorithm for Phrase-Based Decoding with a Fixed Distortion Limit
Yin-Wen Chang, Michael Collins
Cross-Sentence N-ary Relation Extraction with Graph LSTMs
Nanyun Peng, Hoifung Poon, Chris Quirk, Kristina Toutanova, Wen-Tau Yih
Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision
Chen Liang, Jonathan Berant, Quoc Le, Kenneth D. Forbus, Ni Lao
Coarse-to-Fine Question Answering for Long Documents
Eunsol Choi, Daniel Hewlett, Jakob Uszkoreit, Illia Polosukhin, Alexandre Lacoste, Jonathan Berant
Automatic Compositor Attribution in the First Folio of Shakespeare
Maria Ryskina, Hannah Alpert-Abrams, Dan Garrette, Taylor Berg-Kirkpatrick
A Nested Attention Neural Hybrid Model for Grammatical Error Correction
Jianshu Ji, Qinlong Wang, Kristina Toutanova, Yongen Gong, Steven Truong, Jianfeng Gao
Get To The Point: Summarization with Pointer-Generator Networks
Abigail See, Peter J. Liu, Christopher D. Manning
Identifying 1950s American Jazz Composers: Fine-Grained IsA Extraction via Modifier Composition
Ellie Pavlick*, Marius Pasca
Learning to Skim Text
Adams Wei Yu, Hongrae Lee, Quoc Le

Workshops
2017 ACL Student Research Workshop
Program Committee includes: Emily Pitler, Brian Roark, Richard Sproat
WiNLP: Women and Underrepresented Minorities in Natural Language Processing
Organizers include: Margaret Mitchell
Gold Sponsor
BUCC: 10th Workshop on Building and Using Comparable Corpora
Scientific Committee includes: Richard Sproat
CLPsych: Computational Linguistics and Clinical Psychology – From Linguistic Signal to Clinical Reality
Program Committee includes: Brian Roark, Richard Sproat
Repl4NLP: 2nd Workshop on Representation Learning for NLP
Program Committee includes: Ankur Parikh, John Platt
RoboNLP: Language Grounding for Robotics
Program Committee includes: Ankur Parikh, Tom Kwiatkowski
CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
Management Group includes: Slav Petrov
CoNLL-SIGMORPHON-2017 Shared Task: Universal Morphological Reinflection
Organizing Committee includes: Manaal Faruqui
Invited Speaker: Chris Dyer
SemEval: 11th International Workshop on Semantic Evaluation
Organizers include: Daniel Cer
ALW1: 1st Workshop on Abusive Language Online
Panelists include: Margaret Mitchell
EventStory: Events and Stories in the News
Program Committee includes: Silvia Pareti
NMT: 1st Wo[...]



Expressions in Virtual Reality

2017-08-22T16:35:56.699-07:00

Posted by Steven Hickson, Software Engineering Intern, and Nick Dufour and Avneesh Sud, Software Engineers, Machine Perception

Recently Google Machine Perception researchers, in collaboration with Daydream Labs and YouTube Spaces, presented a solution for virtual headset ‘removal’ for mixed reality in order to create a richer and more engaging VR experience. While that work could infer eye-gaze directions and blinks, enabled by a headset modified with eye-tracking technology, a richer set of facial expressions — which are key to understanding a person's experience in VR, as well as conveying important social engagement cues — were missing.

Today we present an approach to infer select facial action units and expressions entirely by analyzing a small part of the face while the user is engaged in a virtual reality experience. Specifically, we show that images of the user’s eyes captured from an infrared (IR) gaze-tracking camera within a VR headset are sufficient to infer at least a subset of facial expressions without the use of any external cameras or additional sensors.

Left: A user wearing a VR HMD modified with eye-tracking used for expression classification (note that no external camera is used in our method; this is just for visualization). Right: inferred expression from eye images using our model. A video demonstrating the work can be seen here.

We use supervised deep learning to classify facial expressions from images of the eyes and surrounding areas, which typically contain the iris, sclera, eyelids and may include parts of the eyebrows and top of cheeks. Obtaining large scale annotated data from such novel sensors is a challenging task, hence we collected training data by capturing 46 subjects while performing a set of facial expressions.

To perform expression classification, we fine-tuned a variant of the widespread Inception architecture with TensorFlow using weights from a model trained to convergence on ImageNet. We attempted to partially remove variance due to differences in participant appearance (i.e., individual differences that do not depend on expression), inspired by the standard practice of mean image subtraction. Since this variance removal occurs within-subject, it is effectively personalization. Further details, along with examples of eye-images, and results are presented in our accompanying paper.

Results and Extensions
We demonstrate that the information required to classify a variety of facial expressions is reliably present in IR eye images captured by a commercial HMD sensor, and that this information can be decoded using a CNN-based method, even though classifying facial expressions from eye-images alone is non-trivial even for humans. Our model inference can be performed in real-time, and we show this can be used to generate expressive avatars in real-time, which can function as an expressive surrogate for users engaged in VR. This interaction mechanism also yields a more intuitive interface for sharing expression in VR as opposed to gestures or keyboard inputs.

The ability to capture a user’s facial expressions using existing eye-tracking cameras enables a fully mobile solution to facial performance capture in VR, without additional external cameras. This technology extends beyond animating cartoon avatars; it could be used to provide a richer headset removal experience, enhancing communication and social interaction in VR by transmitting far more authentic and emotionally textured information.
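The within-subject mean subtraction ("personalization") step mentioned above is simple to state in code. The sketch below is a minimal illustration under assumed array shapes, not the paper's exact preprocessing: each eye image has that subject's own mean eye image subtracted before being fed to the expression classifier.

```python
# Minimal sketch of within-subject mean subtraction (personalization): subtract
# each subject's own mean eye image from that subject's images before
# classification. Array shapes are assumptions for illustration.
import numpy as np

def personalize(eye_images, subject_ids):
    """eye_images: (N, H, W) array; subject_ids: length-N array of subject labels."""
    out = np.empty_like(eye_images, dtype=np.float32)
    for subject in np.unique(subject_ids):
        mask = subject_ids == subject
        subject_mean = eye_images[mask].mean(axis=0)   # that subject's mean eye image
        out[mask] = eye_images[mask] - subject_mean
    return out
```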
Acknowledgements
The research described in this post was performed by Steven Hickson (as an intern), Nick Dufour, Avneesh Sud, Vivek Kwatra and Irfan Essa. We also thank Hayes Raffle and Alex Wong[...]



So there I was, firing a megawatt plasma collider at work...

2017-07-25T02:14:42.802-07:00

Posted by Ted Baltz, Senior Staff Software Engineer, Google Accelerated Science Team

Wait, what? Why is Google interested in plasma physics?
Google is always interested in solving complex engineering problems, and few are more complex than fusion. Physicists have been trying since the 1950s to control the fusion of hydrogen atoms into helium, which is the same process that powers the Sun. The key to harnessing this power is to confine hydrogen plasmas for long enough to get more energy out from fusion reactions than was put in. This point is called “breakeven.” If it works, it would represent a technological breakthrough and could provide an abundant source of zero-carbon energy.

There are currently several large academic and government research efforts in fusion. Just to rattle off a few, in plasma fusion there are tokamak machines like ITER and stellarator machines like Wendelstein 7-X. The stellarator design actually goes back to 1951, so physicists have been working on this for a while. Oh yeah, and if you like giant lasers, there’s the National Ignition Facility, which uses lasers to generate X-rays that drive fusion reactions. So far, none of these has reached the economic breakeven point.

All of these efforts involve complex experiments with many variables, providing an opportunity for Google to help with our strength in computing and machine learning. Today, we’re publishing “Achievement of Sustained Net Plasma Heating in a Fusion Experiment with the Optometrist Algorithm” in Scientific Reports. This paper describes the first results of Google’s collaboration with the physicists and engineers at Tri Alpha Energy, taking a step towards the breakeven goal.

Did you really just say that you got to fire a plasma collider?
Yeah. Tri Alpha Energy has a unique scheme for plasma confinement called a field-reversed configuration that’s predicted to get more stable as the energy goes up, in contrast to other methods where plasmas get harder to control as you heat them. Tri Alpha built a giant ionized plasma machine, C-2U, that fills an entire warehouse in an otherwise unassuming office park. The plasma that this machine generates and confines exhibits all kinds of highly nonlinear behavior. The machine itself pushes the envelope of how much electrical power can be applied to generate and confine the plasma in such a small space over such a short time. It’s a complex machine with more than 1000 knobs and switches, an investment (not ours!) of north of $100 million in exploring clean energy. This is a high-stakes optimization problem, dealing with both plasma performance and equipment constraints. This is where Google comes in.

End-on view of C-2U

Wait, why not just simulate what will happen? Isn’t this simple physics?
The “simple” simulations using magnetohydrodynamics don’t really apply, and even if these machines operated in that limit (which they very much don’t), those simulations make fluid dynamics simulations look easy. The reality is much more complicated: the ion temperature is three times larger than the electron temperature, so the plasma is far out of thermal equilibrium, and the fluid approximation is totally invalid, so you have to track at least some of the trillion-plus individual particles. The whole thing is beyond what we know how to do even with Google-scale compute resources.

So why are we doing this? Real experiments! With atoms, not bits! At Google we love to run experiments and optimize things. We thought it would be a great challenge to see if we could help Tri Alpha.
They run a plasma “shot” on the C-2U machine every 8 minutes. Each shot consists of[...]
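The paper's Optometrist Algorithm pairs machine-proposed settings with human judgment: the computer suggests a perturbation of the current machine settings, the physicist compares the resulting shot against the reference shot ("better like this, or like this?"), and the preferred settings become the new reference. The sketch below is a heavily simplified illustration of that loop; the knob names, ranges, and stand-in scoring function are invented for the example, and the real system relies on physical shots and expert judgment, not a toy objective.

```python
import random

# Hypothetical knob names and ranges; the real C-2U machine has 1000+ settings.
SETTINGS = {"mirror_coil_current": (0.5, 1.5), "gas_puff_timing": (0.0, 1.0)}

def propose(reference, step=0.05):
    """Suggest a small random perturbation of the reference settings."""
    candidate = dict(reference)
    knob = random.choice(list(candidate))
    lo, hi = SETTINGS[knob]
    candidate[knob] = min(hi, max(lo, candidate[knob] + random.uniform(-step, step)))
    return candidate

def human_prefers(candidate, reference):
    """Stand-in for the physicist's 'better like this, or like this?' judgment.
    Faked here with a toy objective so the loop runs end to end."""
    toy_score = lambda s: -sum((v - 1.0) ** 2 for v in s.values())
    return toy_score(candidate) > toy_score(reference)

reference = {"mirror_coil_current": 0.7, "gas_puff_timing": 0.3}
for shot in range(50):                  # one "shot" every 8 minutes on C-2U
    candidate = propose(reference)
    if human_prefers(candidate, reference):
        reference = candidate           # preferred settings become the new reference
print("final settings:", reference)
```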



Teaching Robots to Understand Semantic Concepts

2017-08-16T12:28:20.197-07:00

Posted by Sergey Levine, Faculty Advisor, and Pierre Sermanet, Research Scientist, Google Brain Team

Machine learning can allow robots to acquire complex skills, such as grasping and opening doors. However, learning these skills requires us to manually program reward functions that the robots then attempt to optimize. In contrast, people can understand the goal of a task just from watching someone else do it, or simply by being told what the goal is. We can do this because we draw on our own prior knowledge about the world: when we see someone cut an apple, we understand that the goal is to produce two slices, regardless of what type of apple it is, or what kind of tool is used to cut it. Similarly, if we are told to pick up the apple, we understand which object we are to grab because we can ground the word “apple” in the environment: we know what it means.

These are semantic concepts: salient events like producing two slices, and object categories denoted by words such as “apple.” Can we teach robots to understand semantic concepts, to get them to follow simple commands specified through categorical labels or user-provided examples? In this post, we discuss some of our recent work on robotic learning that combines experience that is autonomously gathered by the robot, which is plentiful but lacks human-provided labels, with human-labeled data that allows a robot to understand semantics. We will describe how robots can use their experience to understand the salient events in a human-provided demonstration, mimic human movements despite the differences between human and robot bodies, and understand semantic categories, like “toy” and “pen”, to pick up objects based on user commands.

Understanding human demonstrations with deep visual features
In the first set of experiments, which appear in our paper Unsupervised Perceptual Rewards for Imitation Learning, our aim is to enable a robot to understand a task, such as opening a door, from seeing only a small number of unlabeled human demonstrations. By analyzing these demonstrations, the robot must identify the semantically salient event that constitutes task success, and then use reinforcement learning to perform it.

Examples of human demonstrations (left) and the corresponding robotic imitation (right).

Unsupervised learning on very small datasets is one of the most challenging scenarios in machine learning. To make this feasible, we use deep visual features from a large network trained for image recognition on ImageNet. Such features are known to be sensitive to semantic concepts, while maintaining invariance to nuisance variables such as appearance and lighting. We use these features to interpret user-provided demonstrations, and show that it is indeed possible to learn reward functions in an unsupervised fashion from a few demonstrations and without retraining.

Example of reward functions learned solely from observation for the door opening task. Rewards progressively increase from zero to the maximum reward as the task is completed.

After learning a reward function from observation only, we use it to guide a robot to learn a door opening task, using only the images to evaluate the reward function.
With the help of an initial kinesthetic demonstration that succeeds about 10% of the time, the robot learns to improve to 100% accuracy using the learned reward function.

Learning progression.

Emulating human movements with self-supervision and imitation
In Time-Contrastive Networks: Self-Supervised Learning from Multi-View Observation, we propose a novel approach to learn about the world from observation an[...]
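A minimal sketch of the perceptual-reward idea described above, under assumptions the post does not spell out: the pretrained-network activations of the final frames of the human demonstrations are treated as a "goal" feature vector, and the robot is rewarded according to how close the features of its current camera frame are to that goal. The choice of backbone (MobileNetV2 here), the distance measure, and the frame shapes are illustrative stand-ins, not the paper's exact formulation.

```python
import numpy as np
import tensorflow as tf

# ImageNet-pretrained backbone used only as a fixed feature extractor.
features = tf.keras.applications.MobileNetV2(
    include_top=False, weights="imagenet", pooling="avg")

def embed(frames):
    """frames: uint8 array (N, 224, 224, 3) -> deep feature vectors (N, D)."""
    x = tf.keras.applications.mobilenet_v2.preprocess_input(frames.astype("float32"))
    return features(x, training=False).numpy()

def goal_embedding(demo_videos, last_k=5):
    """Average features of the last few frames of each unlabeled demonstration,
    taken as a proxy for the semantically salient 'task success' state."""
    goals = [embed(video[-last_k:]).mean(axis=0) for video in demo_videos]
    return np.mean(goals, axis=0)

def perceptual_reward(frame, goal, scale=1.0):
    """Reward grows as the current frame's features approach the goal features."""
    f = embed(frame[None])[0]
    return float(np.exp(-scale * np.linalg.norm(f - goal) ** 2))

# Usage with fake data: two 'demonstrations' and one current robot camera frame.
demos = [np.random.randint(0, 255, (30, 224, 224, 3), dtype=np.uint8) for _ in range(2)]
goal = goal_embedding(demos)
print(perceptual_reward(np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8), goal))
```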



Google at CVPR 2017

2017-07-21T08:00:06.778-07:00

Posted by Christian Howard, Editor-in-Chief, Research Communications

From July 21-26, Honolulu, Hawaii hosts the 2017 Conference on Computer Vision and Pattern Recognition (CVPR 2017), the premier annual computer vision event comprising the main conference and several co-located workshops and tutorials. As a leader in computer vision research and a Platinum Sponsor, Google will have a strong presence at CVPR 2017 — over 250 Googlers will be in attendance to present papers and invited talks at the conference, and to organize and participate in multiple workshops.

If you are attending CVPR this year, please stop by our booth and chat with our researchers who are actively pursuing the next generation of intelligent systems that utilize the latest machine learning techniques applied to various areas of machine perception. Our researchers will also be available to talk about and demo several recent efforts, including the technology behind Headset Removal for Virtual and Mixed Reality, Image Compression with Neural Networks, Jump, the TensorFlow Object Detection API and much more.

You can learn more about our research being presented at CVPR 2017 in the list below (Googlers highlighted in blue).

Organizing Committee
Corporate Relations Chair - Mei Han
Area Chairs include - Alexander Toshev, Ce Liu, Vittorio Ferrari, David Lowe

Papers
Training object class detectors with click supervision
Dim Papadopoulos, Jasper Uijlings, Frank Keller, Vittorio Ferrari
Unsupervised Pixel-Level Domain Adaptation With Generative Adversarial Networks
Konstantinos Bousmalis, Nathan Silberman, David Dohan, Dumitru Erhan, Dilip Krishnan
BranchOut: Regularization for Online Ensemble Tracking With Convolutional Neural Networks
Bohyung Han, Jack Sim, Hartwig Adam
Enhancing Video Summarization via Vision-Language Embedding
Bryan A. Plummer, Matthew Brown, Svetlana Lazebnik
Learning by Association — A Versatile Semi-Supervised Training Method for Neural Networks
Philip Haeusser, Alexander Mordvintsev, Daniel Cremers
Context-Aware Captions From Context-Agnostic Supervision
Ramakrishna Vedantam, Samy Bengio, Kevin Murphy, Devi Parikh, Gal Chechik
Spatially Adaptive Computation Time for Residual Networks
Michael Figurnov, Maxwell D. Collins, Yukun Zhu, Li Zhang, Jonathan Huang, Dmitry Vetrov, Ruslan Salakhutdinov
Xception: Deep Learning With Depthwise Separable Convolutions
François Chollet
Deep Metric Learning via Facility Location
Hyun Oh Song, Stefanie Jegelka, Vivek Rathod, Kevin Murphy
Speed/Accuracy Trade-Offs for Modern Convolutional Object Detectors
Jonathan Huang, Vivek Rathod, Chen Sun, Menglong Zhu, Anoop Korattikara, Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama, Kevin Murphy
Synthesizing Normalized Faces From Facial Identity Features
Forrester Cole, David Belanger, Dilip Krishnan, Aaron Sarna, Inbar Mosseri, William T. Freeman
Towards Accurate Multi-Person Pose Estimation in the Wild
George Papandreou, Tyler Zhu, Nori Kanazawa, Alexander Toshev, Jonathan Tompson, Chris Bregler, Kevin Murphy
GuessWhat?! Visual Object Discovery Through Multi-Modal Dialogue
Harm de Vries, Florian Strub, Sarath Chandar, Olivier Pietquin, Hugo Larochelle, Aaron Courville
Learning discriminative and transformation covariant local feature detectors
Xu Zhang, Felix X. Yu, Svebor Karaman, Shih-Fu Chang
Full Resolution Image Compression With Recurrent Neural Networks
George Toderici, Damien Vincent, Nick Johnston, Sung J[...]



An Update to Open Images - Now with Bounding-Boxes

2017-07-20T14:41:08.669-07:00

Posted by Vittorio Ferrari, Research Scientist, Machine Perception

Last year we introduced Open Images, a collaborative release of ~9 million images annotated with labels spanning over 6000 object categories, designed to be a useful dataset for machine learning research. The initial release featured image-level labels automatically produced by a computer vision model similar to Google Cloud Vision API, for all 9M images in the training set, and a validation set of 167K images with 1.2M human-verified image-level labels.

Today, we introduce an update to Open Images, which adds a total of ~2M bounding-boxes to the existing dataset, along with several million additional image-level labels. Details include:

1.2M bounding-boxes around objects for 600 categories on the training set. These have been produced semi-automatically by an enhanced version of the technique outlined in [1], and are all human-verified.

Complete bounding-box annotation for all object instances of the 600 categories on the validation set, all manually drawn (830K boxes). The bounding-box annotations in the training and validation sets will enable research on object detection on this dataset. The 600 categories offer a broader range than those in the ILSVRC and COCO detection challenges, and include new objects such as fedora hat and snowman.

4.3M human-verified image-level labels on the training set (over all categories). This will enable large-scale experiments on object classification, based on a clean training set with reliable labels.

Annotated images from the Open Images dataset. Left: FAMILY MAKING A SNOWMAN by mwvchamber. Right: STANZA STUDENTI.S.S. ANNUNZIATA by ersupalermo. Both images used under CC BY 2.0 license. See more examples here.

We hope that this update to Open Images will stimulate the broader research community to experiment with object classification and detection models, and facilitate the development and evaluation of new techniques.

References
[1] We don't need no bounding-boxes: Training object class detectors using only human verification, Papadopoulos, Uijlings, Keller, and Ferrari, CVPR 2016 [...]
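For readers who want to poke at the new annotations, here is a small sketch of how the bounding-box data might be explored once downloaded. The file name and column names (ImageID, LabelName, XMin, XMax, YMin, YMax) are assumptions about the CSV layout rather than something stated in this post, so check them against the actual Open Images release before relying on this.

```python
import pandas as pd

# Assumed file and column names; verify against the actual Open Images download.
boxes = pd.read_csv("train-annotations-bbox.csv")

# Coordinates are assumed to be normalized to [0, 1] relative to image size.
boxes["width"] = boxes["XMax"] - boxes["XMin"]
boxes["height"] = boxes["YMax"] - boxes["YMin"]

# Example: keep only reasonably large boxes and count annotations per category.
large = boxes[(boxes["width"] > 0.1) & (boxes["height"] > 0.1)]
per_category = large.groupby("LabelName")["ImageID"].count().sort_values(ascending=False)
print(per_category.head(20))
```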



Motion Stills — Now on Android

2017-07-20T08:31:09.981-07:00

Posted by Karthik Raveendran and Suril Shah, Software Engineers, Google Research

Last year, we launched Motion Stills, an iOS app that stabilizes your Live Photos and lets you view and share them as looping GIFs and videos. Since then, Motion Stills has been well received, being listed as one of the top apps of 2016 by The Verge and Mashable. However, from its initial release, the community has been asking us to also make Motion Stills available for Android. We listened to your feedback and today, we're excited to announce that we’re bringing this technology, and more, to devices running Android 5.1 and later!

Motion Stills on Android: Instant stabilization on your device.

With Motion Stills on Android we built a new recording experience where everything you capture is instantly transformed into delightful short clips that are easy to watch and share. You can capture a short Motion Still with a single tap like a photo, or condense a longer recording into a new feature we call Fast Forward. In addition to stabilizing your recordings, Motion Stills on Android comes with an improved trimming algorithm that guards against pocket shots and accidental camera shakes. All of this is done during capture on your Android device, no internet connection required!

New streaming pipeline
For this release, we redesigned our existing iOS video processing pipeline to use a streaming approach that processes each frame of a video as it is being recorded. By computing intermediate motion metadata, we are able to immediately stabilize the recording while still performing loop optimization over the full sequence. All this leads to instant results after recording — no waiting required to share your new GIF.

Capture using our streaming pipeline gives you instant results.

In order to display your Motion Stills stream immediately, our algorithm computes and stores the necessary stabilizing transformation as a low resolution texture map. We leverage this texture to apply the stabilization transform using the GPU in real-time during playback, instead of writing a new, stabilized video that would tax your mobile hardware and battery.

Fast Forward
Fast Forward allows you to speed up and condense a longer recording into a short, easy to share clip. The same pipeline described above allows Fast Forward to process up to a full minute of video, right on your phone. You can even change the speed of playback (from 1x to 8x) after recording. To make this possible, we encode videos with a denser I-frame spacing to enable efficient seeking and playback. We also employ additional optimizations in the Fast Forward mode. For instance, we apply adaptive temporal downsampling in the linear solver and long-range stabilization for smooth results over the whole sequence.

Fast Forward condenses your recordings into easy to share clips.

Try out Motion Stills
Motion Stills is an app for us to experiment and iterate quickly with short-form video technology, gathering valuable feedback along the way. The tools our users find most fun and useful may be integrated later on into existing products like Google Photos. Download Motion Stills for Android from the Google Play store—available for mobile phones running Android 5.1 and later—and share your favorite clips on social media with hashtag #motionstills.

Acknowledgements
Motion Stills would not have been possible without the help of many Googlers. We want to especially acknowledge the work of Matthias Grundmann in advancing our stabilization technology, as well as our UX and interaction des[...]
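The streaming idea above (compute lightweight per-frame motion metadata at capture time, then apply a smoothed stabilizing transform at playback) can be illustrated with a very reduced sketch. This version only estimates a per-frame translation with OpenCV optical flow and smooths it with a moving average; the real pipeline solves for richer transforms and applies them on the GPU via a low-resolution texture map, which is not reproduced here.

```python
import cv2
import numpy as np

def per_frame_motion(prev_gray, gray):
    """Estimate a rough translation between consecutive frames (the 'motion metadata')."""
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200, qualityLevel=0.01, minDistance=10)
    if pts is None:
        return np.zeros(2)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    good = status.ravel() == 1
    if not good.any():
        return np.zeros(2)
    return np.median((nxt[good] - pts[good]).reshape(-1, 2), axis=0)

def stabilize(path, window=15):
    cap = cv2.VideoCapture(path)
    ok, prev = cap.read()
    frames, motions = [prev], [np.zeros(2)]
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        motions.append(per_frame_motion(cv2.cvtColor(frames[-1], cv2.COLOR_BGR2GRAY),
                                        cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)))
        frames.append(frame)
    cap.release()

    path_xy = np.cumsum(motions, axis=0)              # raw camera path over time
    kernel = np.ones(window) / window
    smooth = np.stack([np.convolve(path_xy[:, i], kernel, mode="same") for i in range(2)], axis=1)
    corrections = smooth - path_xy                    # per-frame correction (the texture-map analogue)

    out = []
    h, w = frames[0].shape[:2]
    for frame, (dx, dy) in zip(frames, corrections):
        M = np.float32([[1, 0, dx], [0, 1, dy]])      # apply the stabilizing translation
        out.append(cv2.warpAffine(frame, M, (w, h)))
    return out
```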



Facets: An Open Source Visualization Tool for Machine Learning Training Data

2017-07-17T11:00:23.204-07:00

Posted by James Wexler, Senior Software Engineer, Google Big Picture Team

(Cross-posted on the Google Open Source Blog)

Getting the best results out of a machine learning (ML) model requires that you truly understand your data. However, ML datasets can contain hundreds of millions of data points, each consisting of hundreds (or even thousands) of features, making it nearly impossible to understand an entire dataset in an intuitive fashion. Visualization can help unlock nuances and insights in large datasets. A picture may be worth a thousand words, but an interactive visualization can be worth even more.

Working with the PAIR initiative, we’ve released Facets, an open source visualization tool to aid in understanding and analyzing ML datasets. Facets consists of two visualizations that allow users to see a holistic picture of their data at different granularities. Get a sense of the shape of each feature of the data using Facets Overview, or explore a set of individual observations using Facets Dive. These visualizations allow you to debug your data which, in machine learning, is as important as debugging your model. They can easily be used inside of Jupyter notebooks or embedded into webpages. In addition to the open source code, we've also created a Facets demo website. This website allows anyone to visualize their own datasets directly in the browser without the need for any software installation or setup, and without the data ever leaving your computer.

Facets Overview
Facets Overview automatically gives users a quick understanding of the distribution of values across the features of their datasets. Multiple datasets, such as a training set and a test set, can be compared on the same visualization. Common data issues that can hamper machine learning are pushed to the forefront, such as: unexpected feature values, features with high percentages of missing values, features with unbalanced distributions, and feature distribution skew between datasets.

Facets Overview visualization of the six numeric features of the UCI Census datasets[1]. The features are sorted by non-uniformity, with the feature with the most non-uniform distribution at the top. Numbers in red indicate possible trouble spots, in this case numeric features with a high percentage of values set to 0. The histograms at right allow you to compare the distributions between the training data (blue) and test data (orange).

Facets Overview visualization showing two of the nine categorical features of the UCI Census datasets[1]. The features are sorted by distribution distance, with the feature with the biggest skew between the training (blue) and test (orange) datasets at the top. Notice in the “Target” feature that the label values differ between the training and test datasets, due to a trailing period in the test set (“<=50K” vs “<=50K.”). This can be seen in the chart for the feature and also in the entries in the “top” column of the table. This label mismatch would cause a model trained and tested on this data to not be evaluated correctly.

Facets Dive
Facets Dive provides an easy-to-customize, intuitive interface for exploring the relationship between the data points across the different features of a dataset. With Facets Dive, you control the position, color and visual representation of each data point based on its feature values. If the data points have images associated with them, the images can be used as the visual representations.

Facets Dive visualization showing all 16281 d[...]
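To make concrete the kind of per-feature summary Facets Overview surfaces, here is a small pandas sketch that computes missing-value percentages, zero-value percentages, and a rough train/test distribution distance for each feature. This illustrates the statistics being described, not the Facets API itself; the use of generic train/test DataFrames is an assumption for the example.

```python
import numpy as np
import pandas as pd

def feature_overview(train: pd.DataFrame, test: pd.DataFrame) -> pd.DataFrame:
    """Per-feature statistics of the kind Facets Overview highlights."""
    rows = []
    for col in train.columns:
        missing_pct = 100.0 * train[col].isna().mean()
        if np.issubdtype(train[col].dtype, np.number):
            # Rough train/test skew for numeric features: mean shift in std units.
            std = train[col].std() or 1.0
            dist = abs(train[col].mean() - test[col].mean()) / std
            zeros_pct = 100.0 * (train[col] == 0).mean()
        else:
            # Categorical features: total variation distance between value frequencies.
            p = train[col].value_counts(normalize=True)
            q = test[col].value_counts(normalize=True)
            dist = 0.5 * p.subtract(q, fill_value=0).abs().sum()
            zeros_pct = np.nan
        rows.append({"feature": col, "missing_%": missing_pct,
                     "zeros_%": zeros_pct, "train_test_distance": dist})
    return pd.DataFrame(rows).sort_values("train_test_distance", ascending=False)

# Hypothetical usage with train/test splits loaded elsewhere:
# print(feature_overview(train_df, test_df))
```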



Using Deep Learning to Create Professional-Level Photographs

2017-07-13T18:22:54.420-07:00

Posted by Hui Fang, Software Engineer, Machine Perception

Machine learning (ML) excels in many areas with well defined goals. Tasks where there exists a right or wrong answer help with the training process and allow the algorithm to achieve its desired goal, whether it be correctly identifying objects in images or providing a suitable translation from one language to another. However, there are areas where objective evaluations are not available. For example, whether a photograph is beautiful is measured by its aesthetic value, which is a highly subjective concept.

A professional(?) photograph of Jasper National Park, Canada.

To explore how ML can learn subjective concepts, we introduce an experimental deep-learning system for artistic content creation. It mimics the workflow of a professional photographer, roaming landscape panoramas from Google Street View and searching for the best composition, then carrying out various postprocessing operations to create an aesthetically pleasing image. Our virtual photographer “travelled” ~40,000 panoramas in areas like the Alps, Banff and Jasper National Parks in Canada, Big Sur in California and Yellowstone National Park, and returned with creations that are quite impressive, some even approaching professional quality — as judged by professional photographers.

Training the Model
While aesthetics can be modelled using datasets like AVA, using it naively to enhance photos can miss certain aspects of aesthetics, for example by making a photo over-saturated. Using supervised learning to learn multiple aspects of aesthetics properly, however, may require a labelled dataset that is intractable to collect. Our approach relies only on a collection of professional quality photos, without before/after image pairs, or any additional labels. It breaks down aesthetics into multiple aspects automatically, each of which is learned individually with negative examples generated by a coupled image operation. By keeping these image operations semi-”orthogonal”, we can enhance a photo on its composition, saturation/HDR level and dramatic lighting with fast and separable optimizations:

A panorama (a) is cropped into (b), with saturation and HDR strength enhanced in (c), and with dramatic mask applied in (d). Each step is guided by one learned aspect of aesthetics.

A traditional image filter was used to generate negative training examples for saturation, HDR detail and composition. We also introduce a special operation named dramatic mask, which was created jointly while learning the concept of dramatic lighting. The negative examples were generated by applying a combination of image filters that modify brightness randomly on professional photos, degrading their appearance. For the training we use a generative adversarial network (GAN), where a generative model creates a mask to fix lighting for negative examples, while a discriminative model tries to distinguish enhanced results from the real professional ones. Unlike shape-fixed filters such as vignette, dramatic mask adds content-aware brightness adjustment to a photo. The competitive nature of GAN training leads to good variations of such suggestions. You can read more about the training details in our paper.

Results
Some creations of our system from Google Street View are shown below. As you can see, the application of the trained aesthetic filters creates some dramatic results (including the image we started this post with!):

Jasper National Park, Canada.

Interlaken,[...]
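A tiny sketch of the negative-example idea described above, under assumptions the post does not spell out: each professional photo is degraded by a random, coupled image operation (here saturation and brightness perturbations via Pillow) so that a per-aspect model can learn to separate "professional" from "degraded". The perturbation ranges and the choice of Pillow are illustrative, not the paper's actual operations.

```python
import random
from PIL import Image, ImageEnhance

def degrade_saturation(img: Image.Image) -> Image.Image:
    """Negative example for the saturation aspect: push saturation too low or too high."""
    factor = random.choice([random.uniform(0.2, 0.6), random.uniform(1.8, 3.0)])
    return ImageEnhance.Color(img).enhance(factor)

def degrade_lighting(img: Image.Image) -> Image.Image:
    """Negative example for the lighting aspect: random global brightness shift."""
    factor = random.uniform(0.4, 1.8)
    return ImageEnhance.Brightness(img).enhance(factor)

def make_training_pair(path: str):
    """Return a (positive, negative) image pair for one aesthetic aspect."""
    professional = Image.open(path).convert("RGB")
    degrade = random.choice([degrade_saturation, degrade_lighting])
    return professional, degrade(professional)

# Hypothetical usage:
# pos, neg = make_training_pair("professional_photo.jpg")
# neg.save("negative_example.jpg")
```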



Building Your Own Neural Machine Translation System in TensorFlow

2017-07-12T11:30:11.004-07:00

Posted by Thang Luong, Research Scientist, and Eugene Brevdo, Staff Software Engineer, Google Brain Team

Machine translation – the task of automatically translating between languages – is one of the most active research areas in the machine learning community. Among the many approaches to machine translation, sequence-to-sequence ("seq2seq") models [1, 2] have recently enjoyed great success and have become the de facto standard in most commercial translation systems, such as Google Translate, thanks to their ability to use deep neural networks to capture sentence meanings. However, while there is an abundance of material on seq2seq models such as OpenNMT or tf-seq2seq, there is a lack of material that teaches people both the knowledge and the skills to easily build high-quality translation systems.

Today we are happy to announce a new Neural Machine Translation (NMT) tutorial for TensorFlow that gives readers a full understanding of seq2seq models and shows how to build a competitive translation model from scratch. The tutorial is aimed at making the process as simple as possible, starting with some background knowledge on NMT and walking through code details to build a vanilla system. It then dives into the attention mechanism [3, 4], a key ingredient that allows NMT systems to handle long sentences. Finally, the tutorial provides details on how to replicate key features in Google’s NMT (GNMT) system [5] to train on multiple GPUs.

The tutorial also contains detailed benchmark results, which users can replicate on their own. Our models provide a strong open-source baseline with performance on par with GNMT results [5]. We achieve 24.4 BLEU points on the popular WMT’14 English-German translation task. Other benchmark results (English-Vietnamese, German-English) can be found in the tutorial.

In addition, this tutorial showcases the fully dynamic seq2seq API (released with TensorFlow 1.2) aimed at making building seq2seq models clean and easy (a brief sketch of this API appears after the references below):

Easily read and preprocess dynamically sized input sequences using the new input pipeline in tf.contrib.data.
Use padded batching and sequence length bucketing to improve training and inference speeds.
Train seq2seq models using popular architectures and training schedules, including several types of attention and scheduled sampling.
Perform inference in seq2seq models using in-graph beam search.
Optimize seq2seq models for multi-GPU settings.

We hope this will help spur the creation of, and experimentation with, many new NMT models by the research community. To get started on your own research, check out the tutorial on GitHub!

Core contributors
Thang Luong, Eugene Brevdo, and Rui Zhao.

Acknowledgements
We would like to especially thank our collaborator on the NMT project, Rui Zhao. Without his tireless effort, this tutorial would not have been possible. Additional thanks go to Denny Britz, Anna Goldie, Derek Murray, and Cinjon Resnick for their work bringing new features to TensorFlow and the seq2seq library. Lastly, we thank Lukasz Kaiser for the initial help on the seq2seq codebase; Quoc Le for the suggestion to replicate GNMT; Yonghui Wu and Zhifeng Chen for details on the GNMT systems; as well as the Google Brain team for their support and feedback!

References
[1] Sequence to sequence learning with neural networks, Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. NIPS, 2014.
[2] Learning phrase representations using RNN encoder-decoder for statistical machine translati[...]
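As noted above, here is a compressed sketch of the attention-based training decoder that the tutorial builds with the TensorFlow 1.2-era tf.contrib.seq2seq API. It is paraphrased rather than copied from the tutorial, requires TensorFlow 1.x, and uses toy vocabulary and layer sizes as stand-ins; the loss, inference-time beam search, and multi-GPU pieces are omitted.

```python
import tensorflow as tf  # requires TensorFlow 1.x (tf.contrib)

# Toy sizes; the tutorial's real configurations are much larger.
src_vocab, tgt_vocab, num_units, batch = 100, 100, 64, 8

src = tf.placeholder(tf.int32, [batch, None])      # source token ids
tgt_in = tf.placeholder(tf.int32, [batch, None])   # target ids, shifted right
src_len = tf.placeholder(tf.int32, [batch])
tgt_len = tf.placeholder(tf.int32, [batch])

src_emb = tf.get_variable("src_emb", [src_vocab, num_units])
tgt_emb = tf.get_variable("tgt_emb", [tgt_vocab, num_units])

# Encoder: a single LSTM over the embedded source sentence.
enc_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
enc_out, enc_state = tf.nn.dynamic_rnn(
    enc_cell, tf.nn.embedding_lookup(src_emb, src),
    sequence_length=src_len, dtype=tf.float32)

# Decoder cell wrapped with Luong attention over the encoder outputs.
attention = tf.contrib.seq2seq.LuongAttention(
    num_units, enc_out, memory_sequence_length=src_len)
dec_cell = tf.contrib.seq2seq.AttentionWrapper(
    tf.nn.rnn_cell.BasicLSTMCell(num_units), attention,
    attention_layer_size=num_units)

# Training-time decoding with teacher forcing and a vocabulary projection.
helper = tf.contrib.seq2seq.TrainingHelper(
    tf.nn.embedding_lookup(tgt_emb, tgt_in), tgt_len)
decoder = tf.contrib.seq2seq.BasicDecoder(
    dec_cell, helper,
    dec_cell.zero_state(batch, tf.float32).clone(cell_state=enc_state),
    output_layer=tf.layers.Dense(tgt_vocab, use_bias=False))
outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder)
logits = outputs.rnn_output  # fed into a cross-entropy loss in the tutorial
```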



Revisiting the Unreasonable Effectiveness of Data

2017-07-27T15:40:57.139-07:00

Posted by Abhinav Gupta, Faculty Advisor, Machine Perception

There has been remarkable success in the field of computer vision over the past decade, much of which can be directly attributed to the application of deep learning models to this machine perception task. Furthermore, since 2012 there have been significant advances in representation capabilities of these systems due to (a) deeper models with high complexity, (b) increased computational power and (c) availability of large-scale labeled data, much of which is publicly available. And while every year we get further increases in computational power and model complexity (from the 7-layer AlexNet to the 101-layer ResNet), available datasets have not scaled accordingly. A 101-layer ResNet with significantly more capacity than AlexNet is still trained with the same 1M images from ImageNet circa 2011. As researchers, we have always wondered: if we scale up the amount of training data 10x, will the accuracy double? How about 100x or maybe even 300x? Will the accuracy plateau, or will we continue to see increasing gains with more and more data?

While GPU computation power and model sizes have continued to increase over the last five years, the size of the largest training dataset has surprisingly remained constant.

In our paper, “Revisiting Unreasonable Effectiveness of Data in Deep Learning Era”, we take the first steps towards clearing the clouds of mystery surrounding the relationship between 'enormous visual data' and deep learning for computer vision. Our goal was to explore: (a) whether visual representations can still be improved by feeding more and more images with noisy labels to currently existing algorithms; (b) the nature of the relationship between data and performance on standard vision tasks such as classification, object detection and image segmentation; and (c) state-of-the-art models for all the tasks in computer vision using large-scale learning.

Of course, the elephant in the room is where we can obtain a dataset that is 300x larger than ImageNet. At Google, we have been continuously working on building such datasets automatically to improve computer vision algorithms. Specifically, we have built an internal dataset of 300M images that are labeled with 18291 categories, which we call JFT-300M. The images are labeled using an algorithm that uses a complex mixture of raw web signals, connections between web pages and user feedback. This results in over one billion labels for the 300M images (a single image can have multiple labels). Of the billion image labels, approximately 375M are selected via an algorithm that aims to maximize label precision of selected images. However, there is still considerable noise in the labels: approximately 20% of the labels for selected images are noisy. Since there is no exhaustive annotation, we have no way to estimate the recall of the labels.

Our experimental results validate some of the hypotheses but also generate some unexpected surprises:

Better Representation Learning Helps. Our first observation is that large-scale data helps in representation learning, which in turn improves the performance on each vision task we study. Our findings suggest that a collective effort to build a large-scale dataset for visual pretraining is important. It also suggests a bright future for unsupervised and semi-supervised representation learning approaches. It seems the scale of data continues to overpowe[...]