Subscribe: Google Research Blog
Added By: Feedage Forager Feedage Grade A rated
Language: English
app  data  google brain  google  images  learning  machine learning  machine  neural  new  research  system  tensorflow 
Rate this Feed
Rate this feedRate this feedRate this feedRate this feedRate this feed
Rate this feed 1 starRate this feed 2 starRate this feed 3 starRate this feed 4 starRate this feed 5 star

Comments (0)

Feed Details and Statistics Feed Statistics
Preview: Google Research Blog

Google Research Blog

The latest news on Google Research.

Updated: 2017-01-19T14:09:06.222-08:00


A Large Corpus for Supervised Word-Sense Disambiguation


Posted by Colin Evans and Dayu Yuan, Software EngineersUnderstanding the various meanings of a particular word in text is key to understanding language. For example, in the sentence “he will receive stock in the reorganized company”, we know that “stock” refers to “the capital raised by a business or corporation through the issue and subscription of shares” as defined in the New Oxford American Dictionary (NOAD), based on the context. However, there are more than 10 other definitions for “stock” in NOAD, ranging from “goods in a store”to “a medieval device for punishment”. For a computer algorithm, distinguishing between these meanings is so difficult that it has been described as “AI-complete” in the past (Navigli, 2009; Ide and Veronis 1998; Mallery 1988).In order to help further progress on this challenge, we’re happy to announce the release of word-sense annotations on the popular MASC and SemCor datasets, manually annotated with senses from the NOAD. We’re also releasing mappings from the NOAD senses to English Wordnet, which is more commonly used by the research community. This is one of the largest releases of fully sense-annotated English corpora. Supervised Word-Sense DisambiguationHumans distinguish between meanings of words in text easily because we have access to an enormous amount of common-sense knowledge about how the world works, and how this connects to language. For an example of the difficulty, “[stock] in a business” implies the financial sense, but “[stock] in a bodega” is more likely to refer to goods on the shelves of a store, even though a bodega is a kind of business. Acquiring sufficient knowledge in a form that a machine can use, and then applying it to understanding the words in text, is a challenge.Supervised word-sense disambiguation (WSD) is the problem of building a machine-learned system using human-labeled data that can assign a dictionary sense to all words used in text (in contrast to entity disambiguation, which focuses on nouns, mostly proper). Building a supervised model that performs better than just assigning the most frequent sense of a word without considering the surrounding text is difficult, but supervised models can perform well when supplied with significant amounts of training data. (Navigli, 2009)By releasing this dataset, it is our hope that the research community will be able to further the advance of algorithms that allow machines to understand language better, allowing applications such as:Facilitating the automatic construction of databases from text in order to answer questions and connect knowledge in documents. For example, understanding that a “hemi engine” is a kind of automotive machinery, and a “locomotive engine” is a kind of train, or that “Kanye West is a star” implies that he is a celebrity, but “Sirius is a star” implies that it is an astronomical object.Disambiguating words in queries, so that results for “date palm” and “date night” or “web spam” and “spam recipe” can have distinct interpretations for different senses, and documents returned from a query have the same meaning that is implied by the query.Manual AnnotationIn the manually labeled data sets that we are releasing, each sense annotation is labeled by five raters. To ensure high quality of the sense annotation, raters are first trained with gold annotations, which were labeled by experienced linguists in a separate pilot study before the annotation task. The figure below shows an example of a rater’s work page in our annotation tool. The left side of the page lists all candidate dictionary senses (in this case, the word “general”). Example sentences from the dictionary are also provided. The to-be-annotated words, highlighted within a sentence, are shown on the right side of the work page. Besides linking a dictionary sense to a word, raters could also label one of the three exceptions: (1) The word is a typo (2) None of the above and (3) I can’t decide. Raters could also check whether the word usage is a m[...]

The Google Brain team — Looking Back on 2016


Posted by Jeff Dean, Google Senior Fellow, on behalf of the entire Google Brain teamThe Google Brain team's long-term goal is to create more intelligent software and systems that improve people's lives, which we pursue through both pure and applied research in a variety of different domains. And while this is obviously a long-term goal, we would like to take a step back and look at some of the progress our team has made over the past year, and share what we feel may be in store for 2017. Research PublicationsOne important way in which we assess the quality of our research is through publications in top tier international machine learning venues like ICML, NIPS, and ICLR. Last year our team had a total of 27 accepted papers at these venues, covering a wide ranging set of topics including program synthesis, knowledge transfer from one network to another, distributed training of machine learning models, generative models for language, unsupervised learning for robotics, automated theorem proving, better theoretical understanding of neural networks, algorithms for improved reinforcement learning, and many others. We also had numerous other papers accepted at conferences in fields such as natural language processing (ACL, CoNNL), speech (ICASSP), vision (CVPR), robotics (ISER), and computer systems (OSDI). Our group has also submitted 34 papers to the upcoming ICLR 2017, a top venue for cutting-edge deep learning research. You can learn more about our work in our list of papers, here.Natural Language UnderstandingAllowing computers to better understand human language is one key area for our research. In late 2014, three Brain team researchers published a paper on Sequence to Sequence Learning with Neural Networks, and demonstrated that the approach could be used for machine translation. In 2015, we showed that this this approach could also be used for generating captions for images, parsing sentences, and solving computational geometry problems. In 2016, this previous research (plus many enhancements) culminated in Brain team members worked closely with members of the Google Translate team to wholly replace the translation algorithms powering Google Translate with a completely end-to-end learned system (research paper). This new system closed the gap between the old system and human quality translations by up to 85% for some language pairs. A few weeks later, we showed how the system could do “zero-shot translation”, learning to translate between languages for which it had never seen example sentence pairs (research paper). This system is now deployed on the production Google Translate service for a growing number of language pairs, giving our users higher quality translations and allowing people to communicate more effectively across language barriers. Gideon Lewis-Kraus documented this translation effort (along with the history of deep learning and the history of the Google Brain team) in “The Great A.I. Awakening”, an in-depth article that appeared in The NY Times Magazine in December, 2016.RoboticsTraditional robotics control algorithms are carefully and painstakingly hand-programmed, and therefore embodying robots with new capabilities is often a very laborious process. We believe that having robots automatically learn to acquire new skills through machine learning is a better approach. Last year, we collaborated with researchers at [X] to demonstrate how robotic arms could learn hand-eye coordination, pooling their experiences to teach themselves more quickly (research paper). Our robots made about 800,000 grasping attempts during this research. Later in the year, we explored three possible ways for robots to learn new skills, through reinforcement learning, through their own interaction with objects, and through human demonstrations. We’re continuing to build on this work in our goals for making robots that are able to flexibly and readily learn new tasks and operate in messy, real-world environments. To help other robotics researchers, we have made multiple ro[...]

Google Brain Residency Program - 7 months in and looking ahead


Posted by Jeff Dean, Google Senior Fellow and Leslie Phillips, Google Brain Residency Program Manager“Beyond being incredibly instructive, the Google Brain Residency program has been a truly affirming experience. Working alongside people who truly love what they do--and are eager to help you develop your own passion--has vastly increased my confidence in my interests, my ability to explore them, and my plans for the near future.”-Akosua Busia, B.S. Mathematical and Computational Science, Stanford University ‘162016 Google Brain ResidentIn October 2015 we launched the Google Brain Residency, a 12-month program focused on jumpstarting a career for those interested in machine learning and deep learning research. This program is an opportunity to get hands on experience using the state-of-the-art infrastructure available at Google, and offers the chance to work alongside top researchers within the Google Brain team. Our first group of residents arrived in June 2016, working with researchers on problems at the forefront of machine learning. The wide array of topics studied by residents reflects the diversity of the residents themselves — some come to the program as new graduates with degrees ranging from BAs to Ph.Ds in computer science to physics and mathematics to biology and neuroscience, while other residents come with years of industry experience under their belts. They all have come with a passion for learning how to conduct machine learning research.The breadth of research being done by the Google Brain Team along with resident-mentorship pairing flexibility ensures that residents with interests in machine learning algorithms and reinforcement learning, natural language understanding, robotics, neuroscience, genetics and more, are able to find good mentors to help them pursue their ideas and publish interesting work. And just seven months into the program, the Residents are already making an impact in the research field. To date, Google Brain Residents have submitted a total of 21 papers to leading machine learning conferences, spanning topics from enhancing low resolution images to building neural networks that in turn design novel, task specific neural network architectures. Of those 21 papers, 5 were accepted in the recent BayLearn Conference (two of which, “Mean Field Neural Networks” and “Regularizing Neural Networks by Penalizing Their Output Distribution’’, were presented in oral sessions), 2 were accepted in the NIPS 2016 Adversarial Training workshop, and another in ISMIR 2016 (see the full list of papers, including the 14 submissions to ICLR 2017, after the figures below).An LSTM Cell (Left) and a state of the art RNN Cell found using a neural network (Right). This is an example of a novel architecture found using the approach presented in “Neural Architecture Search with Reinforcement Learning” (B. Zoph and Q. V. Le, submitted to ICLR 2017). This paper uses a neural network to generate novel RNN cell architectures that outperform the widely used LSTM on a variety of different tasks. The training accuracy for neural networks, colored from black (random chance) to red (high accuracy). Overlaid in white dashed lines are the theoretical predictions showing the boundary between trainable and untrainable networks. (a) Networks with no dropout. (b)-(d) Networks with dropout rates of 0.01, 0.02, 0.06 respectively. This research explores whether theoretical calculations can replace large hyperparameter searches. For more details, read “Deep Information Propagation” (S. S. Schoenholz, J. Gilmer, S. Ganguli, J. Sohl-Dickstein, submitted to ICLR 2017).Accepted conference papers (Google Brain Residents marked with asterisks)Unrolled Generative Adversarial NetworksLuke Metz*, Ben Poole, David Pfau, Jascha Sohl-DicksteinNIPS 2016 Adversarial Training Workshop (oral presentation)Conditional Image Synthesis with Auxiliary Classifier GANsAugustus Odena*, Chris Olah, Jon ShlensNIPS 2016 Adversarial Training Workshop (oral presentation)Regula[...]

Get moving with the new Motion Stills


Posted by Matthias Grundmann and Ken Conley, Machine PerceptionLast June, we released Motion Stills, an iOS app that uses our video stabilization technology to create easily shareable GIFs from Apple Live Photos. Since then, we integrated Motion Stills into Google Photos for iOS and thought of ways to improve it, taking into account your ideas for new features.Today, we are happy to announce a major new update to the Motion Stills app that will help you create even more beautiful videos and fun GIFs using motion-tracked text overlays, super-resolution videos, and automatic cinemagraphs.Motion TextWe’ve added motion text so you can create moving text effects, similar to what you might see in movies and TV shows, directly on your phone. With Motion Text, you can easily position text anywhere over your video to get the exact result you want. It only takes a second to initialize while you type, and a tracks at 1000 FPS throughout the whole Live Photo, so the process feels instantaneous.To make this possible, we took the motion tracking technology that we run on YouTube servers for “Privacy Blur”, and made it run even faster on your device. How? We first create motion metadata for your video by leveraging machine learning to classify foreground/background features as well as to model temporally coherent camera motion. We then take this metadata, and use it as input to an algorithm that can track individual objects while discriminating it from others. The algorithm models each object’s state that includes its motion in space, an implicit appearance model (described as a set of its moving parts), and its centroid and extent, as shown in the figure below.Enhance! your videos with better detail and loopsLast month, we published the details of our state-of-the-art RAISR technology, which employs machine learning to create super-resolution detail in images. This technology is now available in Motion Stills, automatically sharpening every video you export. We are also going beyond stabilization to bring you fully automatic cinemagraphs. After freezing the background into a still photo, we analyze our result to optimize for the perfect loop transition. By considering a range of start and end frames, we build a matrix of transition scores between frame pairs. A significant minimum in this matrix reflects the perfect transition, resulting in an endless loop of motion stillness.Continuing improve the experienceThanks to your feedback, we’ve additionally rebuilt our navigation and added more tutorials. We’ve also added Apple’s 3D touch to let you “peek and pop” clips in your stream and movie tray. Lots more is coming to address your top requests, so please download the new release of Motion Stills and keep sending us feedback with #motionstills on your favorite social media. [...]

App Discovery With Google Play, Part 2: Personalized Recommendations with Related Apps


Posted by Ananth Balashankar & Levent Koc, Software Engineers, and Norberto Guimaraes, Product ManagerIn Part 1 of this series on app discovery, we discussed using machine learning to gain a deeper understanding of the topics associated with an app, in order to provide a better search and discovery experience on the Google Play Apps Store. In this post, we discuss a deep learning framework to provide personalized recommendations to users based on their previous app downloads and the context in which they are used. Providing useful and relevant app recommendations to visitors of the Google Play Apps Store is a key goal of our apps discovery team. An understanding of the topics associated with an app, however, is only one part of creating a system that best serves the user. In order to create a better overall experience, one must also take into account the tastes of the user and provide personalized recommendations. If one didn’t, the “You might also like” recommendation would look the same for everyone! Discovering these nuances requires both an understanding what an app does, and also the context of the app with respect to the user. For example, to an avid sci-fi gamer, similar game recommendations may be of interest, but if a user installs a fitness app, recommending a health recipe app may be more relevant than five more fitness apps. As users may be more interested in downloading an app or game that complements one they already have installed, we provide recommendations based on app relatedness with each other (“You might also like”), in addition to providing recommendations based on the topic associated with an app (“Similar apps”). Suggestions of similar apps and apps that you also might like shown both before making an install decision (left) and while the current install is in progress (right).One particularly strong contextual signal is app relatedness, based on previous installs and search query clicks. As an example, a user who has searched for and plays a lot of graphics-heavy games likely has a preference for apps which are also graphically intense rather than apps with simpler graphics. So, when this user installs a car racing game, the “You might also like” suggestions includes apps which relate to the “seed” app (because they are graphically intense racing games) ranked higher than racing apps with simpler graphics. This allows for a finer level of personalization where the characteristics of the apps are matched with the preferences of the user.To incorporate this app relatedness in our recommendations, we take a two pronged approach: (a) offline candidate generation i.e. the generation of the potential related apps that other users have downloaded, in addition to the app in question, and (b) online personalized re-ranking, where we re-rank these candidates using a personalized ML model.Offline Candidate GenerationThe problem of finding related apps can be formulated as a nearest neighbor search problem. Given an app X, we want to find the k nearest apps. In the case of “you might also like”, a naive approach would be one based on counting, where if many people installed apps X and Y, then the app Y would be used as candidate for seed app X. However, this approach is intractable as it is difficult to learn and generalize effectively in the huge problem space. Given that there are over a million apps on Google Play, the total number of possible app pairs is over ~1012. To solve this, we trained a deep neural network to predict the next app installed by the user given their previous installs. Output embeddings at the final layer of this deep neural network generally represents the types of apps a given user has installed. We then apply the nearest neighbor algorithm to find related apps for a given seed app in the trained embedding space. Thus, we perform dimensionality reduction by representing apps using embeddings to help prune the space of potential candidates.Onl[...]

Open sourcing the Embedding Projector: a tool for visualizing high dimensional data


Posted by Daniel Smilkov and the Big Picture group Recent advances in Machine Learning (ML) have shown impressive results, with applications ranging from image recognition, language translation, medical diagnosis and more. With the widespread adoption of ML systems, it is increasingly important for research scientists to be able to explore how the data is being interpreted by the models. However, one of the main challenges in exploring this data is that it often has hundreds or even thousands of dimensions, requiring special tools to investigate the space. To enable a more intuitive exploration process, we are open-sourcing the Embedding Projector, a web application for interactive visualization and analysis of high-dimensional data recently shown as an A.I. Experiment, as part of TensorFlow. We are also releasing a standalone version at, where users can visualize their high-dimensional data without the need to install and run TensorFlow.Exploring EmbeddingsThe data needed to train machine learning systems comes in a form that computers don't immediately understand. To translate the things we understand naturally (e.g. words, sounds, or videos) to a form that the algorithms can process, we use embeddings, a mathematical vector representation that captures different facets (dimensions) of the data. For example, in this language embedding, similar words are mapped to points that are close to each other.With the Embedding Projector, you can navigate through views of data in either a 2D or a 3D mode, zooming, rotating, and panning using natural click-and-drag gestures. Below is a figure showing the nearest points to the embedding for the word “important” after training a TensorFlow model using the word2vec tutorial. Clicking on any point (which represents the learned embedding for a given word) in this visualization, brings up a list of nearest points and distances, which shows which words the algorithm has learned to be semantically related. This type of interaction represents an important way in which one can explore how an algorithm is performing. Methods of Dimensionality ReductionThe Embedding Projector offers three commonly used methods of data dimensionality reduction, which allow easier visualization of complex data: PCA, t-SNE and custom linear projections. PCA is often effective at exploring the internal structure of the embeddings, revealing the most influential dimensions in the data. t-SNE, on the other hand, is useful for exploring local neighborhoods and finding clusters, allowing developers to make sure that an embedding preserves the meaning in the data (e.g. in the MNIST dataset, seeing that the same digits are clustered together). Finally, custom linear projections can help discover meaningful "directions" in data sets - such as the distinction between a formal and casual tone in a language generation model - which would allow the design of more adaptable ML systems.A custom linear projection of the 100 nearest points of "See attachments." onto the "yes" - "yeah" vector (“yes” is right, “yeah” is left) of a corpus of 35k frequently used phrases in emailsThe Embedding Projector website includes a few datasets to play with. We’ve also made it easy for users to publish and share their embeddings with others (just click on the “Publish” button on the left pane). It is our hope that the Embedding Projector will be a useful tool to help the research community explore and refine their ML applications, as well as enable anyone to better understand how ML algorithms interpret data. If you'd like to get the full details on the Embedding Projector, you can read the paper here. Have fun exploring the world of embeddings! [...]

NIPS 2016 & Research at Google


Posted by Doug Eck, Research Scientist, Google Brain TeamThis week, Barcelona hosts the 30th Annual Conference on Neural Information Processing Systems (NIPS 2016), a machine learning and computational neuroscience conference that includes invited talks, demonstrations and oral and poster presentations of some of the latest in machine learning research. Google will have a strong presence at NIPS 2016, with over 280 Googlers attending in order to contribute to and learn from the broader academic research community by presenting technical talks and posters, in addition to hosting workshops and tutorials.Research at Google is at the forefront of innovation in Machine Intelligence, actively exploring virtually all aspects of machine learning including classical algorithms as well as cutting-edge techniques such as deep learning. Focusing on both theory as well as application, much of our work on language understanding, speech, translation, visual processing, ranking, and prediction relies on Machine Intelligence. In all of those tasks and many others, we gather large volumes of direct or indirect evidence of relationships of interest, and develop learning approaches to understand and generalize. If you are attending NIPS 2016, we hope you’ll stop by our booth and chat with our researchers about the projects and opportunities at Google that go into solving interesting problems for billions of people, and to see demonstrations of some of the exciting research we pursue. You can also learn more about our work being presented at NIPS 2016 in the list below (Googlers highlighted in blue).Google is a Platinum Sponsor of NIPS 2016.Organizing CommitteeExecutive Board includes: Corinna Cortes, Fernando PereiraAdvisory Board includes: John C. PlattArea Chairs include: John Shlens, Moritz Hardt, Navdeep Jaitly, Hugo Larochelle, Honglak Lee, Sanjiv Kumar, Gal ChechikInvited TalkDynamic Legged RobotsMarc RaibertAccepted Papers:Boosting with AbstentionCorinna Cortes, Giulia DeSalvo, Mehryar MohriCommunity Detection on Evolving GraphsStefano Leonardi, Aris Anagnostopoulos, Jakub Łącki, Silvio Lattanzi, Mohammad MahdianLinear Relaxations for Finding Diverse Elements in Metric SpacesAditya Bhaskara, Mehrdad Ghadiri, Vahab Mirrokni, Ola SvenssonNearly Isometric Embedding by RelaxationJames McQueen, Marina Meila, Dominique Joncas Optimistic Bandit Convex OptimizationMehryar Mohri, Scott YangReward Augmented Maximum Likelihood for Neural Structured PredictionMohammad Norouzi, Samy Bengio, Zhifeng Chen, Navdeep Jaitly, Mike Schuster, Yonghui Wu, Dale Schuurmans Stochastic Gradient MCMC with Stale GradientsChangyou Chen, Nan Ding, Chunyuan Li, Yizhe Zhang, Lawrence CarinUnsupervised Learning for Physical Interaction through Video PredictionChelsea Finn*, Ian Goodfellow, Sergey LevineUsing Fast Weights to Attend to the Recent PastJimmy Ba, Geoffrey Hinton, Volodymyr Mnih, Joel Leibo, Catalin IonescuA Credit Assignment Compiler for Joint PredictionKai-Wei Chang, He He, Stephane Ross, Hal IIIA Neural TransducerNavdeep Jaitly, Quoc Le, Oriol Vinyals, Ilya Sutskever, David Sussillo, Samy BengioAttend, Infer, Repeat: Fast Scene Understanding with Generative ModelsS. M. Ali Eslami, Nicolas Heess, Theophane Weber, Yuval Tassa, David Szepesvari, Koray Kavukcuoglu, Geoffrey HintonBi-Objective Online Matching and Submodular AllocationsHossein Esfandiari, Nitish Korula, Vahab MirrokniCombinatorial Energy Learning for Image SegmentationJeremy Maitin-Shepard, Viren Jain, Michal Januszewski, Peter Li, Pieter AbbeelDeep Learning GamesDale Schuurmans, Martin ZinkevichDeepMath - Deep Sequence Models for Premise SelectionGeoffrey Irving, Christian Szegedy, Niklas Een, Alexander Alemi, François Chollet, Josef UrbanDensity Estimation via Discrepancy Based Adaptive Sequential PartitionDangna Li, Kun Yang, Win[...]

Deep Learning for Detection of Diabetic Eye Disease


Posted by Lily Peng MD PhD, Product Manager and Varun Gulshan PhD, Research EngineerDiabetic retinopathy (DR) is the fastest growing cause of blindness, with nearly 415 million diabetic patients at risk worldwide. If caught early, the disease can be treated; if not, it can lead to irreversible blindness. Unfortunately, medical specialists capable of detecting the disease are not available in many parts of the world where diabetes is prevalent. We believe that Machine Learning can help doctors identify patients in need, particularly among underserved populations.A few years ago, several of us began wondering if there was a way Google technologies could improve the DR screening process, specifically by taking advantage of recent advances in Machine Learning and Computer Vision. In "Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs", published today in JAMA, we present a deep learning algorithm capable of interpreting signs of DR in retinal photographs, potentially helping doctors screen more patients in settings with limited resources. One of the most common ways to detect diabetic eye disease is to have a specialist examine pictures of the back of the eye (Figure 1) and rate them for disease presence and severity. Severity is determined by the type of lesions present (e.g. microaneurysms, hemorrhages, hard exudates, etc), which are indicative of bleeding and fluid leakage in the eye. Interpreting these photographs requires specialized training, and in many regions of the world there aren’t enough qualified graders to screen everyone who is at risk. Figure 1. Examples of retinal fundus photographs that are taken to screen for DR. The image on the left is of a healthy retina (A), whereas the image on the right is a retina with referable diabetic retinopathy (B) due a number of hemorrhages (red spots) present.Working closely with doctors both in India and the US, we created a development dataset of 128,000 images which were each evaluated by 3-7 ophthalmologists from a panel of 54 ophthalmologists. This dataset was used to train a deep neural network to detect referable diabetic retinopathy. We then tested the algorithm’s performance on two separate clinical validation sets totalling ~12,000 images, with the majority decision of a panel 7 or 8 U.S. board-certified ophthalmologists serving as the reference standard. The ophthalmologists selected for the validation sets were the ones that showed high consistency from the original group of 54 doctors.Performance of both the algorithm and the ophthalmologists on a 9,963-image validation set are shown in Figure 2. Figure 2. Performance of the algorithm (black curve) and eight ophthalmologists (colored dots) for the presence of referable diabetic retinopathy (moderate or worse diabetic retinopathy or referable diabetic macular edema) on a validation set consisting of 9963 images. The black diamonds on the graph correspond to the sensitivity and specificity of the algorithm at the high sensitivity and high specificity operating points. The results show that our algorithm’s performance is on-par with that of ophthalmologists. For example, on the validation set described in Figure 2, the algorithm has a F-score (combined sensitivity and specificity metric, with max=1) of 0.95, which is slightly better than the median F-score of the 8 ophthalmologists we consulted (measured at 0.91).These are exciting results, but there is still a lot of work to do. First, while the conventional quality measures we used to assess our algorithm are encouraging, we are working with retinal specialists to define even more robust reference standards that can be used to quantify performance. Furthermore, interpretation of a 2D fundus photograph, which we demonstrate in this paper, is only one part in a multi-step process that leads to a d[...]

Zero-Shot Translation with Google’s Multilingual Neural Machine Translation System


Posted by Mike Schuster (Google Brain Team), Melvin Johnson (Google Translate) and Nikhil Thorat (Google Brain Team)In the last 10 years, Google Translate has grown from supporting just a few languages to 103, translating over 140 billion words every day. To make this possible, we needed to build and maintain many different systems in order to translate between any two languages, incurring significant computational cost. With neural networks reforming many fields, we were convinced we could raise the translation quality further, but doing so would mean rethinking the technology behind Google Translate.In September, we announced that Google Translate is switching to a new system called Google Neural Machine Translation (GNMT), an end-to-end learning framework that learns from millions of examples, and provided significant improvements in translation quality. However, while switching to GNMT improved the quality for the languages we tested it on, scaling up to all the 103 supported languages presented a significant challenge.In “Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation”, we address this challenge by extending our previous GNMT system, allowing for a single system to translate between multiple languages. Our proposed architecture requires no change in the base GNMT system, but instead uses an additional “token” at the beginning of the input sentence to specify the required target language to translate to. In addition to improving translation quality, our method also enables “Zero-Shot Translation” — translation between language pairs never seen explicitly by the system.Here’s how it works. Let’s say we train a multilingual system with Japanese⇄English and Korean⇄English examples, shown by the solid blue lines in the animation. Our multilingual system, with the same size as a single GNMT system, shares its parameters to translate between these four different language pairs. This sharing enables the system to transfer the “translation knowledge” from one language pair to the others. This transfer learning and the need to translate between multiple languages forces the system to better use its modeling power.This inspired us to ask the following question: Can we translate between a language pair which the system has never seen before? An example of this would be translations between Korean and Japanese where Korean⇄Japanese examples were not shown to the system. Impressively, the answer is yes — it can generate reasonable Korean⇄Japanese translations, even though it has never been taught to do so. We call this “zero-shot” translation, shown by the yellow dotted lines in the animation. To the best of our knowledge, this is the first time this type of transfer learning has worked in Machine Translation. The success of the zero-shot translation raises another important question: Is the system learning a common representation in which sentences with the same meaning are represented in similar ways regardless of language — i.e. an “interlingua”? Using a 3-dimensional representation of internal network data, we were able to take a peek into the system as it translates a set of sentences between all possible pairs of the Japanese, Korean, and English languages.Part (a) from the figure above shows an overall geometry of these translations. The points in this view are colored by the meaning; a sentence translated from English to Korean with the same meaning as a sentence translated from Japanese to English share the same color. From this view we can see distinct groupings of points, each with their own color. Part (b) zooms in to one of the groups, and part (c) colors by the source language. Within a single group, we see a sentence with the same meaning but from three different languages. This means the network must be encoding som[...]

Enhance! RAISR Sharp Images with Machine Learning


Posted by Peyman Milanfar, Research ScientistEveryday the web is used to share and store millions of pictures, enabling one to explore the world, research new topics of interest, or even share a vacation with friends and family. However, many of these images are either limited by the resolution of the device used to take the picture, or purposely degraded in order to accommodate the constraints of cell phones, tablets, or the networks to which they are connected. With the ubiquity of high-resolution displays for home and mobile devices, the demand for high-quality versions of low-resolution images, quickly viewable and shareable from a wide variety of devices, has never been greater. With “RAISR: Rapid and Accurate Image Super-Resolution”, we introduce a technique that incorporates machine learning in order to produce high-quality versions of low-resolution images. RAISR produces results that are comparable to or better than the currently available super-resolution methods, and does so roughly 10 to 100 times faster, allowing it to be run on a typical mobile device in real-time. Furthermore, our technique is able to avoid recreating the aliasing artifacts that may exist in the lower resolution image. Upsampling, the process of producing an image of larger size with significantly more pixels and higher image quality from a low quality image, has been around for quite a while. Well-known approaches to upsampling are linear methods which fill in new pixel values using simple, and fixed, combinations of the nearby existing pixel values. These methods are fast because they are fixed linear filters (a constant convolution kernel applied uniformly across the image). But what makes these upsampling methods fast, also makes them ineffective in bringing out vivid details in the higher resolution results. As you can see in the example below, the upsampled image looks blurry – one would hesitate to call it enhanced. Left: Low-res original, Right: simple (bicubic) upsampled version (2x). Image Credit: Masa Ushioda/Seapics/Solent NewsWith RAISR, we instead use machine learning and train on pairs of images, one low quality, one high, to find filters that, when applied to selectively to each pixel of the low-res image, will recreate details that are of comparable quality to the original. RAISR can be trained in two ways. The first is the "direct" method, where filters are learned directly from low and high-resolution image pairs. The other method involves first applying a computationally cheap upsampler to the low resolution image (as in the figure above) and then learning the filters from the upsampled and high resolution image pairs. While the direct method is computationally faster, the 2nd method allows for non-integer scale factors and better leveraging of hardware-based upsampling.For either method, RAISR filters are trained according to edge features found in small patches of images, - brightness/color gradients, flat/textured regions, etc. - characterized by direction (the angle of an edge), strength (sharp edges have a greater strength) and coherence (a measure of how directional the edge is). Below is a set of RAISR filters, learned from a database of 10,000 high and low resolution image pairs (where the low-res images were first upsampled). The training process takes about an hour. Collection of learned 11x11 filters for 3x super-resolution. Filters can be learned for a range of super-resolution factors, including fractional ones. Note that as the angle of the edge changes, we see the angle of the filter rotate as well. Similarly, as the strength increases, the sharpness of the filters increases, and the anisotropy of the filter increases with rising coherence.From left to right, we see that the learned filters correspond selectively to the direction of the underlying edge tha[...]

Open Source Visualization of GPS Displacements for Earthquake Cycle Physics


Posted by Jimbo Wilson, Software Engineer, Google Big Picture Team and Brendan Meade, Professor, Harvard Department of Earth and Planetary SciencesThe Earth’s surface is moving, ever so slightly, all the time. This slow, small, but persistent movement of the Earth's crust is responsible for the formation of mountain ranges, sudden earthquakes, and even the positions of the continents. Scientists around the world measure these almost imperceptible movements using arrays of Global Navigation Satellite System (GNSS) receivers to better understand all phases of an earthquake cycle—both how the surface responds after an earthquake, and the storage of strain energy between earthquakes.To help researchers explore this data and better understand the Earthquake cycle, we are releasing a new, interactive data visualization which draws geodetic velocity lines on top of a relief map by amplifying position estimates relative to their true positions. Unlike existing approaches, which focus on small time slices or individual stations, our visualization can show all the data for a whole array of stations at once. Open sourced under an Apache 2 license, and available on GitHub, this visualization technique is a collaboration between Harvard’s Department of Earth and Planetary Sciences and Google's Machine Perception and Big Picture teams.Our approach helps scientists quickly assess deformations across all phases of the earthquake cycle—both during earthquakes (coseismic) and the time between (interseismic). For example, we can see azimuth (direction) reversals of stations as they relate to topographic structures and active faults. Digging into these movements will help scientists vet their models and their data, both of which are crucial for developing accurate computer representations that may help predict future earthquakes.Classical approaches to visualizing these data have fallen into two general categories: 1) a map view of velocity/displacement vectors over a fixed time interval and 2) time versus position plots of each GNSS component (longitude, latitude and altitude). Examples of classical approaches. On the left is a map view showing average velocity vectors over the period from 1997 to 2001[1]. On the right you can see a time versus eastward (longitudinal) position plot for a single station.Each of these approaches have proved to be informative ways to understand the spatial distribution of crustal movements and the time evolution of solid earth deformation. However, because geodetic shifts happen in almost imperceptible distances (mm) and over long timescales, both approaches can only show a small subset of the data at any time—a condensed average velocity per station, or a detailed view of a single station, respectively. Our visualization enables a scientist to see all the data at once, then interactively drill down to a specific subset of interest.Our visualization approach is straightforward; by magnifying the daily longitude and latitude position changes, we show tracks of the evolution of the position of each station. These magnified position tracks are shown as trails on top of a shaded relief topography to provide a sense of position evolution in geographic context.To see how it works in practice, let’s step through an an example. Consider this tiny set of longitude/latitude pairs for a single GNSS station, with the differing digits shown in bold: Day Index Longitude Latitude 0 139.06990407 34.949757897 1 139.06990400 34.949757882 2 139.06990413 34.949757941 3 139.06990409 34.949757921 4 139.06990413 34.949757904 If we were to draw line segments between these points directly on a map, they’d be much too small to see at any reasonable scale. So we take these minute differences and multiply them by a user-controlled scaling factor. [...]

Celebrating TensorFlow’s First Year


Posted by Zak Stone, Product Manager for TensorFlow, on behalf of the TensorFlow team (Cross-posted on the Google Open Source Blog & Google Developers Blog)It has been an eventful year since the Google Brain Team open-sourced TensorFlow to accelerate machine learning research and make technology work better for everyone. There has been an amazing amount of activity around the project: more than 480 people have contributed directly to TensorFlow, including Googlers, external researchers, independent programmers, students, and senior developers at other large companies. TensorFlow is now the most popular machine learning project on GitHub.With more than 10,000 commits in just twelve months, we’ve made numerous performance improvements, added support for distributed training, brought TensorFlow to iOS and Raspberry Pi, and integrated TensorFlow with widely-used big data infrastructure. We’ve also made TensorFlow accessible from Go, Rust and Haskell, released state-of-the-art image classification models, and answered thousands of questions on GitHub, StackOverflow and the TensorFlow mailing list along the way.At Google, TensorFlow supports everything from large-scale product features to exploratory research. We recently launched major improvements to Google Translate using TensorFlow (and Tensor Processing Units, which are special hardware accelerators for TensorFlow). Project Magenta is working on new reinforcement learning-based models that can produce melodies, and a visiting PhD student recently worked with the Google Brain team to build a TensorFlow model that can automatically interpolate between artistic styles. DeepMind has also decided to use TensorFlow to power all of their research – for example, they recently produced fascinating generative models of speech and music based on raw audio.We’re especially excited to see how people all over the world are using TensorFlow. For example:Australian marine biologists are using TensorFlow to find sea cows in tens of thousands of hi-res photos to better understand their populations, which are under threat of extinction.An enterprising Japanese cucumber farmer trained a model with TensorFlow to sort cucumbers by size, shape, and other characteristics.Radiologists have adapted TensorFlow to identify signs of Parkinson’s disease in medical scans.Data scientists in the Bay Area have rigged up TensorFlow and the Raspberry Pi to keep track of the Caltrain.We’re committed to making sure TensorFlow scales all the way from research to production and from the tiniest Raspberry Pi all the way up to server farms filled with GPUs or TPUs. But TensorFlow is more than a single open-source project – we’re doing our best to foster an open-source ecosystem of related software and machine learning models around it:The TensorFlow Serving project simplifies the process of serving TensorFlow models in production.TensorFlow “Wide and Deep” models combine the strengths of traditional linear models and modern deep neural networks.For those who are interested in working with TensorFlow in the cloud, Google Cloud Platform recently launched Cloud Machine Learning, which offers TensorFlow as a managed service.Furthermore, TensorFlow’s repository of models continues to grow with contributions from the community, with more than 3000 TensorFlow-related repositories listed on GitHub alone! To participate in the TensorFlow community, you can follow our new Twitter account (@tensorflow), find us on GitHub, ask and answer questions on StackOverflow, and join the community discussion list.Thanks very much to all of you who have already adopted TensorFlow in your cutting-edge products, your ambitious research, your fast-growing startups, and your school projects; special thanks to ever[...]

App Discovery With Google Play, Part 1: Understanding Topics


Posted by Malay Haldar, Matt MacMahon, Neha Jha and Raj Arasu, Software EngineersEvery month, more than a billion users come to Google Play to download apps for their mobile devices. While some are looking for specific apps, like Snapchat, others come with only a broad notion of what they are interested in, like “horror games” or “selfie apps”. These broad searches by topic represent nearly half of the queries in Play Store, so it’s critical to find the most relevant apps. Searches by topic require more than simply indexing apps by query terms; they require an understanding of the topics associated with an app. Machine learning approaches have been applied to similar problems, but success heavily depends on the number of training examples to learn about a topic. While for some popular topics such as “social networking” we had many labeled apps to learn from, the majority of topics had only a handful of examples. Our challenge was to learn from a very limited number of training examples and scale to millions of apps across thousands of topics, forcing us to adapt our machine learning techniques. Our initial attempt was to build a deep neural network (DNN) trained to predict topics for an app based on words and phrases from the app title and description. For example, if the app description mentioned “frightening”, “very scary”, and “fear” then associate the “horror game” topic with it. However, given the learning capacity of DNNs, it completely “memorized” the topics for the apps in our small training data and failed to generalize to new apps it hadn’t seen before.To generalize effectively, we needed a much larger dataset to train on, so we turned to how people learn as inspiration. In contrast to DNNs, human beings need much less training data. For example, you would likely need to see very few “horror game” app descriptions before learning how to generalize and associate new apps to that genre. Just by knowing the language describing the apps, people can correctly infer topics from even a few examples.To emulate this, we tried a very rough approximation of this language-centric learning. We trained a neural network to learn how language was used to describe apps. We built a Skip-gram model, where the neural network attempts to predict the words around a given word, for example “share” given “photo”. The neural network encodes its knowledge as vectors of floating point numbers, referred to as embeddings. These embeddings were used to train another model called a classifier, capable of distinguishing which topics applied to an app. We now needed much less training data to learn about app topics, due to the large amount of learning already done with Skip-gram.While this architecture generalized well for popular topics like “social networking”, we ran into a new problem for more niche topics like “selfie”. The single classifier built to predict all the topics together focused most of its learning on the popular topics, ignoring the errors it made on the less common ones. To solve this problem we built a separate classifier for each topic and tuned them in isolation.This architecture produced reasonable results, but would still sometimes overgeneralize. For instance, it might associate Facebook with “dating” or Plants vs Zombies with “educational games”. To produce more precise classifiers, we needed higher volume and quality of training data. We treated the system described above as a coarse classifier that pruned down every possible {app, topic} pair, numbering in billions, to a more manageable list of {app, topic} pairs of interest. We built a pipeline to have human raters evaluate the classifier output and fed consensus results back as [...]

Research suggestions at your fingertips with Explore in Docs


Posted by Kishore Papineni, Research Scientist, Google Research NYEnabling easy access to vast amounts of information across multiple languages and modalities (from text to images to video), computers have become highly influential tools for learning, allowing you to use the world’s information to aid you with your research. However, when researching a topic or writing a term paper, gathering all the information you need from a variety of sources on the Internet can be time-consuming, and at times, a distraction from the writing process. That’s why we developed algorithms for Explore in Docs, a collaboration between the Coauthor and Apps teams that uses powerful Google infrastructure, best-in-class information retrieval, machine learning, and machine translation technologies to assemble the relevant information and sources for a research paper, all within the document. Explore in Docs suggests relevant content—in the form of topics, images, and snippets —based on the content of the document, allowing the user to focus on critical thinking and idea development.More than just a SearchSuggesting material that is relevant to the content in a Google Doc is a difficult problem. A naive approach would be to consider the content of a document as a Search query. However, search engines are not designed to accept large blocks of text as queries, so they might truncate the query or focus on the wrong words. So the challenge becomes not only identifying relevant search terms based on the overall content of the document, but additionally providing related topics that may be useful. To tackle this, the Coauthor team built algorithms that are able to associate external content with topics - entities, abstract concepts - in a document and assign relative importance to each of them. This is accomplished by creating a “target” in a topic vector space that incorporates not only the topics you are writing about but also related topics, creating a variety of search terms that include both. Then, each returned search result (piece of text, image, etc) is embedded in the same vector space and the closest items in that vector space are suggested to the user. For example, if you’re writing about monarch butterflies, our algorithms find that monarch butterfly and milkweed plant are related to each other. This is done by analyzing the statistics of discourse on the web, collected from hundreds of billions of sentences from billions of webpages across dozens of languages. Note that these two are not semantically close (an insect versus a plant). An example of a set of learned relations is below:The connection between concepts related to "monarch butterfly", with the thickness of the lines representing the strength of connection, as determined by analysis of discourse on the web. Because this is a discourse graph and not a concept/classification hierarchy, this analysis indicates that "Butterflies & moth" and "Monarch butterfly" are not discussed together as often as "monarch butterfly" and "milkweed".And because we take the entire document into account while constructing the search request and scoring each candidate piece of text, the resulting suggestions are typically different and more varied than the search snippets users would see if they search the web for each topic individually. By eliminating the need to switch tabs to search, and additionally suggesting new, related topics based on discourse on the web, Explore provides opportunities for learning that users might not discover otherwise - all from the Doc that they’re currently working in! The information you need, in multiple languagesCross-lingual predictive search is another key aspect of what we have designed and bu[...]

Supercharging Style Transfer


Posted by Vincent Dumoulin*, Jonathon Shlens and Manjunath Kudlur, Google Brain TeamPastiche. A French word, it designates a work of art that imitates the style of another one (not to be confused with its more humorous Greek cousin, parody). Although it has been used for a long time in visual art, music and literature, pastiche has been getting mass attention lately with online forums dedicated to images that have been modified to be in the style of famous paintings. Using a technique known as style transfer, these images are generated by phone or web apps that allow a user to render their favorite picture in the style of a well known work of art.Although users have already produced gorgeous pastiches using the current technology, we feel that it could be made even more engaging. Right now, each painting is its own island, so to speak: the user provides a content image, selects an artistic style and gets a pastiche back. But what if one could combine many different styles, exploring unique mixtures of well known artists to create an entirely unique pastiche?Learning a representation for artistic styleIn our recent paper titled “A Learned Representation for Artistic Style”, we introduce a simple method to allow a single deep convolutional style transfer network to learn multiple styles at the same time. The network, having learned multiple styles, is able to do style interpolation, where the pastiche varies smoothly from one style to another. Our method enables style interpolation in real-time as well, allowing this to be applied not only to static images, but also videos. allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="" frameborder="0" height="360" src="" width="640">Credit: awesome dog role played by Google Brain team office dog Picabo.In the video above, multiple styles are combined in real-time and the resulting style is applied using a single style transfer network. The user is provided with a set of 13 different painting styles and adjusts their relative strengths in the final style via sliders. In this demonstration, the user is an active participant in producing the pastiche.A Quick History of Style TransferWhile transferring the style of one image to another has existed for nearly 15 years [1] [2], leveraging neural networks to accomplish it is both very recent and very fascinating. In “A Neural Algorithm of Artistic Style” [3], researchers Gatys, Ecker & Bethge introduced a method that uses deep convolutional neural network (CNN) classifiers. The pastiche image is found via optimization: the algorithm looks for an image which elicits the same kind of activations in the CNN’s lower layers - which capture the overall rough aesthetic of the style input (broad brushstrokes, cubist patterns, etc.) - yet produces activations in the higher layers - which capture the things that make the subject recognizable - that are close to those produced by the content image. From some starting point (e.g. random noise, or the content image itself), the pastiche image is progressively refined until these requirements are met.Content image: The Tübingen Neckarfront by Andreas Praefcke, Style painting: “Head of a Clown”, by Georges Rouault.The pastiches produced via this algorithm look spectacular:Figure adapted from L. Gatys et al. "A Neural Algorithm of Artistic Style" (2015). This work is considered a breakthrough in the field of deep learning research because it provided the first proof of concept for neural network-based style transfer. Unfortunately this method for stylizing an individual image[...]

Course Builder now supports scheduling, easier customization and more


Posted by Adam Feldman, Product Manager and Pavel Simakov, Technical Lead, Course Builder TeamOver the years, we've learned that there are as many ways to run an online course as there are instructors to run them. Today's release of Course Builder v1.11 has a focus on improved student access controls, easier visual customization and a new course explorer. Additionally, we've added better support for deploying from Windows!Improved student access controlsA course's availability is often dynamic - sometimes you want to make a course available to everyone all at once, while other times may call for the course to be available to some students before others. Perhaps registration will be available for a while and then the course later becomes read-only. To support these use cases, we've added Student Groups and Calendar Triggers.Student Groups allow you to define which students can see which parts of a course. Want your morning class to see unit 5 and your afternoon class to see unit 6 -- while letting random Internet visitors only see unit 1? Student groups have you covered.Calendar Triggers can be used to update course or content availability automatically at a specific time. For instance, if your course goes live at midnight on Sunday night, you don't need to be at a computer to make it happen. Or, if you want to unlock a new unit every week, you can set up a trigger to automate the process. Read more about calendar triggers and availability.You can even use these features together. Say you want to start a new group of students through the course every month, giving each access to one new unit per week. Using Student Groups and Calendar Triggers together, you can achieve this cohort-like functionality.Easier visual customizationIn the past, if you wanted to customize Course Builder's student experience beyond a certain point, you needed to be a Python developer. We heard from many web developers that they would like to be able to create their own student-facing pages, too. With this release, Course Builder includes a GraphQL server that allows you to create your own frontend experience, while still letting Course Builder take care of things like user sessions and statefulness.New course explorerLarge Course Builder partners such as Google's Digital Workshop and NPTEL have many courses and students with diverse needs. To help them, we've completely revamped the Course Explorer page, giving it richer information and interactivity, so your students can find which of your courses they're looking for. You can provide categories and start/end dates, in addition to the course title, abstract and instructor information.In v1.11, we've added several new highly requested features. Together, they help make Course Builder easier to use and customize, giving you the flexibility to schedule things in advance.We've come a long way since releasing our first experimental code over 4 years ago, turning Course Builder into a large open-source Google App Engine application with over 5 million student registrations across all Course Builder users. With these latest additions, we consider Course Builder feature complete and fully capable of delivering online learning at any scale. We will continue to provide support and bug fixes for those using the platform.We hope you’ll enjoy these new features and share how you’re using them in the forum. Keep on learning! [...]

Equality of Opportunity in Machine Learning


Posted by Moritz Hardt, Research Scientist, Google Brain TeamAs machine learning technology progresses rapidly, there is much interest in understanding its societal impact. A particularly successful branch of machine learning is supervised learning. With enough past data and computational resources, learning algorithms often produce surprisingly effective predictors of future events. To take one hypothetical example: an algorithm could, for example, be used to predict with high accuracy who will pay back their loan. Lenders might then use such a predictor as an aid in deciding who should receive a loan in the first place. Decisions based on machine learning can be both incredibly useful and have a profound impact on our lives.Even the best predictors make mistakes. Although machine learning aims to minimize the chance of a mistake, how do we prevent certain groups from experiencing a disproportionate share of these mistakes? Consider the case of a group that we have relatively little data on and whose characteristics differ from those of the general population in ways that are relevant to the prediction task. As prediction accuracy is generally correlated with the amount of data available for training, it is likely that incorrect predictions will be more common in this group. A predictor might, for example, end up flagging too many individuals in this group as ‘high risk of default’ even though they pay back their loan. When group membership coincides with a sensitive attribute, such as race, gender, disability, or religion, this situation can lead to unjust or prejudicial outcomes.Despite the need, a vetted methodology in machine learning for preventing this kind of discrimination based on sensitive attributes has been lacking. A naive approach might require a set of sensitive attributes to be removed from the data before doing anything else with it. This idea of “fairness through unawareness,” however, fails due to the existence of “redundant encodings.” Even if a particular attribute is not present in the data, combinations of other attributes can act as a proxy.Another common approach, called demographic parity, asks that the prediction must be uncorrelated with the sensitive attribute. This might sound intuitively desirable, but the outcome itself is often correlated with the sensitive attribute. For example, the incidence of heart failure is substantially more common in men than in women. When predicting such a medical condition, it is therefore neither realistic nor desirable to prevent all correlation between the predicted outcome and group membership.Equal Opportunity Taking these conceptual difficulties into account, we’ve proposed a methodology for measuring and preventing discrimination based on a set of sensitive attributes. Our framework not only helps to scrutinize predictors to discover possible concerns. We also show how to adjust a given predictor so as to strike a better tradeoff between classification accuracy and non-discrimination if need be.At the heart of our approach is the idea that individuals who qualify for a desirable outcome should have an equal chance of being correctly classified for this outcome. In our fictional loan example, it means the rate of ‘low risk’ predictions among people who actually pay back their loan should not depend on a sensitive attribute like race or gender. We call this principle equality of opportunity in supervised learning.When implemented, our framework also improves incentives by shifting the cost of poor predictions from the individual to the decision maker, who can respond by investing in improved prediction accu[...]

Graph-powered Machine Learning at Google


Posted by Sujith Ravi, Staff Research Scientist, Google ResearchRecently, there have been significant advances in Machine Learning that enable computer systems to solve complex real-world problems. One of those advances is Google’s large scale, graph-based machine learning platform, built by the Expander team in Google Research. A technology that is behind many of the Google products and features you may use everyday, graph-based machine learning is a powerful tool that can be used to power useful features such as reminders in Inbox and smart messaging in Allo, or used in conjunction with deep neural networks to power the latest image recognition system in Google Photos. Learning with Minimal SupervisionMuch of the recent success in deep learning, and machine learning in general, can be attributed to models that demonstrate high predictive capacity when trained on large amounts of labeled data -- often millions of training examples. This is commonly referred to as “supervised learning” since it requires supervision, in the form of labeled data, to train the machine learning systems. (Conversely, some machine learning methods operate directly on raw data without any supervision, a paradigm referred to as unsupervised learning.)However, the more difficult the task, the harder it is to get sufficient high-quality labeled data. It is often prohibitively labor intensive and time-consuming to collect labeled data for every new problem. This motivated the Expander research team to build new technology for powering machine learning applications at scale and with minimal supervision. Expander’s technology draws inspiration from how humans learn to generalize and bridge the gap between what they already know (labeled information) and novel, unfamiliar observations (unlabeled information). Known as “semi-supervised” learning, this powerful technique enables us to build systems that can work in situations where training data may be sparse. One of the key advantages to a graph-based semi-supervised machine learning approach is the fact that (a) one models labeled and unlabeled data jointly during learning, leveraging the underlying structure in the data, (b) one can easily combine multiple types of signals (for example, relational information from Knowledge Graph along with raw features) into a single graph representation and learn over them. This is in contrast to other machine learning approaches, such as neural network methods, in which it is typical to first train a system using labeled data with features and then apply the trained system to unlabeled data.Graph Learning: How It WorksAt its core, Expander’s platform combines semi-supervised machine learning with large-scale graph-based learning by building a multi-graph representation of the data with nodes corresponding to objects or concepts and edges connecting concepts that share similarities. The graph typically contains both labeled data (nodes associated with a known output category or label) and unlabeled data (nodes for which no labels were provided). Expander’s framework then performs semi-supervised learning to label all nodes jointly by propagating label information across the graph. However, this is easier said than done! We have to (1) learn efficiently at scale with minimal supervision (i.e., tiny amount of labeled data), (2) operate over multi-modal data (i.e., heterogeneous representations and various sources of data), and (3) solve challenging prediction tasks (i.e., large, complex output spaces) involving high dimensional data that might be noisy.One of the primary ingredients in the en[...]

How Robots Can Acquire New Skills from Their Shared Experience


Posted by Sergey Levine (Google Brain Team), Timothy Lillicrap (DeepMind), Mrinal Kalakrishnan (X)The ability to learn from experience will likely be a key in enabling robots to help with complex real-world tasks, from assisting the elderly with chores and daily activities, to helping us in offices and hospitals, to performing jobs that are too dangerous or unpleasant for people. However, if each robot must learn its full repertoire of skills for these tasks only from its own experience, it could take far too long to acquire a rich enough range of behaviors to be useful. Could we bridge this gap by making it possible for robots to collectively learn from each other’s experiences?While machine learning algorithms have made great strides in natural language understanding and speech recognition, the kind of symbolic high-level reasoning that allows people to communicate complex concepts in words remains out of reach for machines. However, robots can instantaneously transmit their experience to other robots over the network - sometimes known as "cloud robotics" - and it is this ability that can let them learn from each other.This is true even for seemingly simple low-level skills. Humans and animals excel at adaptive motor control that integrates their senses, reflexes, and muscles in a closely coordinated feedback loop. Robots still struggle with these basic skills in the real world, where the variability and complexity of the environment demands well-honed behaviors that are not easily fooled by distractors. If we enable robots to transmit their experiences to each other, could they learn to perform motion skills in close coordination with sensing in realistic environments?We previously wrote about how multiple robots could pool their experiences to learn a grasping task. Here, we will discuss new experiments that we conducted to investigate three possible approaches for general-purpose skill learning across multiple robots: learning motion skills directly from experience, learning internal models of physics, and learning skills with human assistance. In all three cases, multiple robots shared their experiences to build a common model of the skill. The skills learned by the robots are still relatively simple -- pushing objects and opening doors -- but by learning such skills more quickly and efficiently through collective learning, robots might in the future acquire richer behavioral repertoires that could eventually make it possible for them to assist us in our daily lives.Learning from raw experience with model-free reinforcement learning.Perhaps one of the simplest ways for robots to teach each other is to pool information about their successes and failures in the world. Humans and animals acquire many skills by direct trial-and-error learning. During this kind of ‘model-free’ learning -- so called because there is no explicit model of the environment formed -- they explore variations on their existing behavior and then reinforce and exploit the variations that give bigger rewards. In combination with deep neural networks, model-free algorithms have recently proved to be surprisingly effective and have been key to successes with the Atari video game system and playing Go. Having multiple robots allows us to experiment with sharing experiences to speed up this kind of direct learning in the real world.In these experiments we tasked robots with trying to move their arms to goal locations, or reaching to and opening a door. Each robot has a copy of a neural network that allows it to estimate the value of taking a given action in a [...]

Introducing the Open Images Dataset


Posted by Ivan Krasin and Tom Duerig, Software EngineersIn the last few years, advances in machine learning have enabled Computer Vision to progress rapidly, allowing for systems that can automatically caption images to apps that can create natural language replies in response to shared photos. Much of this progress can be attributed to publicly available image datasets, such as ImageNet and COCO for supervised learning, and YFCC100M for unsupervised learning.Today, we introduce Open Images, a dataset consisting of ~9 million URLs to images that have been annotated with labels spanning over 6000 categories. We tried to make the dataset as practical as possible: the labels cover more real-life entities than the 1000 ImageNet classes, there are enough images to train a deep neural network from scratch and the images are listed as having a Creative Commons Attribution license*. The image-level annotations have been populated automatically with a vision model similar to Google Cloud Vision API. For the validation set, we had human raters verify these automated labels to find and remove false positives. On average, each image has about 8 labels assigned. Here are some examples:Annotated images form the Open Images dataset. Left: Ghost Arches by Kevin Krejci. Right: Some Silverware by J B. Both images used under CC BY 2.0 licenseWe have trained an Inception v3 model based on Open Images annotations alone, and the model is good enough to be used for fine-tuning applications as well as for other things, like DeepDream or artistic style transfer which require a well developed hierarchy of filters. We hope to improve the quality of the annotations in Open Images the coming months, and therefore the quality of models which can be trained.The dataset is a product of a collaboration between Google, CMU and Cornell universities, and there are a number of research papers built on top of the Open Images dataset in the works. It is our hope that datasets like Open Images and the recently released YouTube-8M will be useful tools for the machine learning community.* While we tried to identify images that are licensed under a Creative Commons Attribution license, we make no representations or warranties regarding the license status of each image and you should verify the license for each image yourself.↩ [...]

Image Compression with Neural Networks


Posted by Nick Johnston and David Minnen, Software EngineersData compression is used nearly everywhere on the internet - the videos you watch online, the images you share, the music you listen to, even the blog you're reading right now. Compression techniques make sharing the content you want quick and efficient. Without data compression, the time and bandwidth costs for getting the information you need, when you need it, would be exorbitant!In "Full Resolution Image Compression with Recurrent Neural Networks", we expand on our previous research on data compression using neural networks, exploring whether machine learning can provide better results for image compression like it has for image recognition and text summarization. Furthermore, we are releasing our compression model via TensorFlow so you can experiment with compressing your own images with our network. We introduce an architecture that uses a new variant of the Gated Recurrent Unit (a type of RNN that allows units to save activations and process sequences) called Residual Gated Recurrent Unit (Residual GRU). Our Residual GRU combines existing GRUs with the residual connections introduced in "Deep Residual Learning for Image Recognition" to achieve significant image quality gains for a given compression rate. Instead of using a DCT to generate a new bit representation like many compression schemes in use today, we train two sets of neural networks - one to create the codes from the image (encoder) and another to create the image from the codes (decoder). Our system works by iteratively refining a reconstruction of the original image, with both the encoder and decoder using Residual GRU layers so that additional information can pass from one iteration to the next. Each iteration adds more bits to the encoding, which allows for a higher quality reconstruction. Conceptually, the network operates as follows:The initial residual, R[0], corresponds to the original image I: R[0] = I.Set i=1 for to the first iteration.Iteration[i] takes R[i-1] as input and runs the encoder and binarizer to compress the image into B[i].Iteration[i] runs the decoder on B[i] to generate a reconstructed image P[i].The residual for Iteration[i] is calculated: R[i] = I - P[i].Set i=i+1 and go to Step 3 (up to the desired number of iterations).The residual image represents how different the current version of the compressed image is from the original. This image is then given as input to the network with the goal of removing the compression errors from the next version of the compressed image. The compressed image is now represented by the concatenation of B[1] through B[N]. For larger values of N, the decoder gets more information on how to reduce the errors and generate a higher quality reconstruction of the original image.To understand how this works, consider the following example of the first two iterations of the image compression network, shown in the figures below. We start with an image of a lighthouse. On the first pass through the network, the original image is given as an input (R[0] = I). P[1] is the reconstructed image. The difference between the original image and encoded image is the residual, R[1], which represents the error in the compression. Left: Original image, I = R[0]. Center: Reconstructed image, P[1]. Right: the residual, R[1], which represents the error introduced by compression.On the second pass through the network, R[1] is given as the network’s input (see figure below). A higher quality image P[2] is then created. So ho[...]

Announcing YouTube-8M: A Large and Diverse Labeled Video Dataset for Video Understanding Research


Posted by Sudheendra Vijayanarasimhan and Paul Natsev, Software EngineersMany recent breakthroughs in machine learning and machine perception have come from the availability of large labeled datasets, such as ImageNet, which has millions of images labeled with thousands of classes. Their availability has significantly accelerated research in image understanding, for example on detecting and classifying objects in static images.Video analysis provides even more information for detecting and recognizing objects, and understanding human actions and interactions with the world. Improving video understanding can lead to better video search and discovery, similarly to how image understanding helped re-imagine the photos experience. However, one of the key bottlenecks for further advancements in this area has been the lack of real-world video datasets with the same scale and diversity as image datasets. Today, we are excited to announce the release of YouTube-8M, a dataset of 8 million YouTube video URLs (representing over 500,000 hours of video), along with video-level labels from a diverse set of 4800 Knowledge Graph entities. This represents a significant increase in scale and diversity compared to existing video datasets. For example, Sports-1M, the largest existing labeled video dataset we are aware of, has around 1 million YouTube videos and 500 sports-specific classes--YouTube-8M represents nearly an order of magnitude increase in both number of videos and classes.In order to construct a labeled video dataset of this scale, we needed to address two key challenges: (1) video is much more time-consuming to annotate manually than images, and (2) video is very computationally expensive to process and store. To overcome (1), we turned to YouTube and its video annotation system, which identifies relevant Knowledge Graph topics for all public YouTube videos. While these annotations are machine-generated, they incorporate powerful user engagement signals from millions of users as well as video metadata and content analysis. As a result, the quality of these annotations is sufficiently high to be useful for video understanding research and benchmarking purposes. height="400px" src="" width="100%"> To ensure the stability and quality of the labeled video dataset, we used only public videos with more than 1000 views, and we constructed a diverse vocabulary of entities, which are visually observable and sufficiently frequent. The vocabulary construction was a combination of frequency analysis, automated filtering, verification by human raters that the entities are visually observable, and grouping into 24 top-level verticals (more details in our technical report). The figures below depict the dataset browser and the distribution of videos along the top-level verticals, and illustrate the dataset’s scale and diversity.A dataset explorer allows browsing and searching the full vocabulary of Knowledge Graph entities, grouped in 24 top-level verticals, along with corresponding videos. This screenshot depicts a subset of dataset videos annotated with the entity “Guitar”.The distribution of videos in the top-level verticals illustrates the scope and diversity of the dataset and reflects the natural distribution of popular YouTube videos.To address (2), we had to overcome the storage and computational resource bottlenecks that researchers face when working with videos. Pursuing video understanding at YouTube-8M’s sc[...]

A Neural Network for Machine Translation, at Production Scale


Posted by Quoc V. Le & Mike Schuster, Research Scientists, Google Brain TeamTen years ago, we announced the launch of Google Translate, together with the use of Phrase-Based Machine Translation as the key algorithm behind this service. Since then, rapid advances in machine intelligence have improved our speech recognition and image recognition capabilities, but improving machine translation remains a challenging goal.Today we announce the Google Neural Machine Translation system (GNMT), which utilizes state-of-the-art training techniques to achieve the largest improvements to date for machine translation quality. Our full research results are described in a new technical report we are releasing today: “Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation” [1]. A few years ago we started using Recurrent Neural Networks (RNNs) to directly learn the mapping between an input sequence (e.g. a sentence in one language) to an output sequence (that same sentence in another language) [2]. Whereas Phrase-Based Machine Translation (PBMT) breaks an input sentence into words and phrases to be translated largely independently, Neural Machine Translation (NMT) considers the entire input sentence as a unit for translation.The advantage of this approach is that it requires fewer engineering design choices than previous Phrase-Based translation systems. When it first came out, NMT showed equivalent accuracy with existing Phrase-Based translation systems on modest-sized public benchmark data sets.Since then, researchers have proposed many techniques to improve NMT, including work on handling rare words by mimicking an external alignment model [3], using attention to align input words and output words [4] and breaking words into smaller units to cope with rare words [5,6]. Despite these improvements, NMT wasn't fast or accurate enough to be used in a production system, such as Google Translate. Our new paper [1] describes how we overcame the many challenges to make NMT work on very large data sets and built a system that is sufficiently fast and accurate enough to provide better translations for Google’s users and services.Data from side-by-side evaluations, where human raters compare the quality of translations for a given source sentence. Scores range from 0 to 6, with 0 meaning “completely nonsense translation”, and 6 meaning “perfect translation."The following visualization shows the progression of GNMT as it translates a Chinese sentence to English. First, the network encodes the Chinese words as a list of vectors, where each vector represents the meaning of all words read so far (“Encoder”). Once the entire sentence is read, the decoder begins, generating the English sentence one word at a time (“Decoder”). To generate the translated word at each step, the decoder pays attention to a weighted distribution over the encoded Chinese vectors most relevant to generate the English word (“Attention”; the blue link transparency represents how much the decoder pays attention to an encoded word).Using human-rated side-by-side comparison as a metric, the GNMT system produces translations that are vastly improved compared to the previous phrase-based production system. GNMT reduces translation errors by more than 55%-85% on several major language pairs measured on sampled sentences from Wikipedia and news websites with the help of bilingual human raters.An example of a translation p[...]

Show and Tell: image captioning open sourced in TensorFlow


Posted by Chris Shallue, Software Engineer, Google Brain TeamIn 2014, research scientists on the Google Brain team trained a machine learning system to automatically produce captions that accurately describe images. Further development of that system led to its success in the Microsoft COCO 2015 image captioning challenge, a competition to compare the best algorithms for computing accurate image captions, where it tied for first place.Today, we’re making the latest version of our image captioning system available as an open source model in TensorFlow. This release contains significant improvements to the computer vision component of the captioning system, is much faster to train, and produces more detailed and accurate descriptions compared to the original system. These improvements are outlined and analyzed in the paper Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge, published in IEEE Transactions on Pattern Analysis and Machine Intelligence.Automatically captioned by our system.So what’s new?Our 2014 system used the Inception V1 image classification model to initialize the image encoder, which produces the encodings that are useful for recognizing different objects in the images. This was the best image model available at the time, achieving 89.6% top-5 accuracy on the benchmark ImageNet 2012 image classification task. We replaced this in 2015 with the newer Inception V2 image classification model, which achieves 91.8% accuracy on the same task. The improved vision component gave our captioning system an accuracy boost of 2 points in the BLEU-4 metric (which is commonly used in machine translation to evaluate the quality of generated sentences) and was an important factor of its success in the captioning challenge.Today’s code release initializes the image encoder using the Inception V3 model, which achieves 93.9% accuracy on the ImageNet classification task. Initializing the image encoder with a better vision model gives the image captioning system a better ability to recognize different objects in the images, allowing it to generate more detailed and accurate descriptions. This gives an additional 2 points of improvement in the BLEU-4 metric over the system used in the captioning challenge.Another key improvement to the vision component comes from fine-tuning the image model. This step addresses the problem that the image encoder is initialized by a model trained to classify objects in images, whereas the goal of the captioning system is to describe the objects in images using the encodings produced by the image model. For example, an image classification model will tell you that a dog, grass and a frisbee are in the image, but a natural description should also tell you the color of the grass and how the dog relates to the frisbee.In the fine-tuning phase, the captioning system is improved by jointly training its vision and language components on human generated captions. This allows the captioning system to transfer information from the image that is specifically useful for generating descriptive captions, but which was not necessary for classifying objects. In particular, after fine-tuning it becomes better at correctly describing the colors of objects. Importantly, the fine-tuning phase must occur after the language component has already learned to generate captions - otherwise, the noisiness of the randomly initialized language component causes irreversi[...]

The 280-Year-Old Algorithm Inside Google Trips


Posted by Bogdan Arsintescu, Software Engineer & Sreenivas Gollapudi, Kostas Kollias, Tamas Sarlos and Andrew Tomkins, Research ScientistsAlgorithms Engineering is a lot of fun because algorithms do not go out of fashion: one never knows when an oldie-but-goodie might come in handy. Case in point: Yesterday, Google announced Google Trips, a new app to assist you in your travels by helping you create your own “perfect day” in a city. Surprisingly, deep inside Google Trips, there is an algorithm that was invented 280 years ago. In 1736, Leonhard Euler authored a brief but beautiful mathematical paper regarding the town of Königsberg and its 7 bridges, shown here:Image from WikipediaIn the paper, Euler studied the following question: is it possible to walk through the city crossing each bridge exactly once? As it turns out, for the city of Königsberg, the answer is no. To reach this answer, Euler developed a general approach to represent any layout of landmasses and bridges in terms of what he dubbed the Geometriam Situs (the “Geometry of Place”), which we now call Graph Theory. He represented each landmass as a “node” in the graph, and each bridge as an “edge,” like this:Image from WikipediaEuler noticed that if all the nodes in the graph have an even number of edges (such graphs are called “Eulerian” in his honor) then, and only then, a cycle can be found that visits every edge exactly once. Keep this in mind, as we’ll rely on this fact later in the post.Our team in Google Research has been fascinated by the “Geometry of Place” for some time, and we started investigating a question related to Euler’s: rather than visiting just the bridges, how can we visit as many interesting places as possible during a particular trip? We call this the “itineraries” problem. Euler didn’t study it, but it is a well known topic in Optimization, where it is often called the “Orienteering” problem.While Euler’s problem has an efficient and exact solution, the itineraries problem is not just hard to solve, it is hard to even approximately solve! The difficulty lies in the interplay between two conflicting goals: first, we should pick great places to visit, but second, we should pick them to allow a good itinerary: not too much travel time; don’t visit places when they’re closed; don’t visit too many museums, etc. Embedded in such problems is the challenge of finding efficient routes, often referred to as the Travelling Salesman Problem (TSP).Algorithms for Travel ItinerariesFortunately, the real world has a property called the “triangle inequality” that says adding an extra stop to a route never makes it shorter. When the underlying geometry satisfies the triangle inequality, the TSP can be approximately solved using another algorithm discovered by Christofides in 1976. This is an important part of our solution, and builds on Euler’s paper, so we’ll give a quick four-step rundown of how it works here:We start with all our destinations separate, and repeatedly connect together the closest two that aren’t yet connected. This doesn’t yet give us an itinerary, but it does connect all the destinations via a minimum spanning tree of the graph.We take all the destinations that have an odd number of connections in this tree (Euler proved there must be an even number of these), and carefully pair them up.Because all the destinatio[...]