
Google Research Blog

The latest news on Google Research.

Updated: 2016-10-25T15:06:17.125-07:00


Course Builder now supports scheduling, easier customization and more


Posted by Adam Feldman, Product Manager and Pavel Simakov, Technical Lead, Course Builder Team

Over the years, we've learned that there are as many ways to run an online course as there are instructors to run them. Today's release of Course Builder v1.11 has a focus on improved student access controls, easier visual customization and a new course explorer. Additionally, we've added better support for deploying from Windows!

Improved student access controls
A course's availability is often dynamic - sometimes you want to make a course available to everyone all at once, while other times may call for the course to be available to some students before others. Perhaps registration will be available for a while and then the course later becomes read-only. To support these use cases, we've added Student Groups and Calendar Triggers.

Student Groups allow you to define which students can see which parts of a course. Want your morning class to see unit 5 and your afternoon class to see unit 6 -- while letting random Internet visitors only see unit 1? Student groups have you covered.

Calendar Triggers can be used to update course or content availability automatically at a specific time. For instance, if your course goes live at midnight on Sunday night, you don't need to be at a computer to make it happen. Or, if you want to unlock a new unit every week, you can set up a trigger to automate the process. Read more about calendar triggers and availability.

You can even use these features together. Say you want to start a new group of students through the course every month, giving each access to one new unit per week. Using Student Groups and Calendar Triggers together, you can achieve this cohort-like functionality.

Easier visual customization
In the past, if you wanted to customize Course Builder's student experience beyond a certain point, you needed to be a Python developer. We heard from many web developers that they would like to be able to create their own student-facing pages, too. With this release, Course Builder includes a GraphQL server that allows you to create your own frontend experience, while still letting Course Builder take care of things like user sessions and statefulness.

New course explorer
Large Course Builder partners such as Google's Digital Workshop and NPTEL have many courses and students with diverse needs. To help them, we've completely revamped the Course Explorer page, giving it richer information and interactivity, so your students can find which of your courses they're looking for. You can provide categories and start/end dates, in addition to the course title, abstract and instructor information.

In v1.11, we've added several new highly requested features. Together, they help make Course Builder easier to use and customize, giving you the flexibility to schedule things in advance. We've come a long way since releasing our first experimental code over 4 years ago, turning Course Builder into a large open-source Google App Engine application with over 5 million student registrations across all Course Builder users. With these latest additions, we consider Course Builder feature complete and fully capable of delivering online learning at any scale. We will continue to provide support and bug fixes for those using the platform.

We hope you’ll enjoy these new features and share how you’re using them in the forum. Keep on learning! [...]

Equality of Opportunity in Machine Learning


Posted by Moritz Hardt, Research Scientist, Google Brain Team

As machine learning technology progresses rapidly, there is much interest in understanding its societal impact. A particularly successful branch of machine learning is supervised learning. With enough past data and computational resources, learning algorithms often produce surprisingly effective predictors of future events. To take one hypothetical example: an algorithm could be used to predict with high accuracy who will pay back their loan. Lenders might then use such a predictor as an aid in deciding who should receive a loan in the first place. Decisions based on machine learning can be both incredibly useful and have a profound impact on our lives.

Even the best predictors make mistakes. Although machine learning aims to minimize the chance of a mistake, how do we prevent certain groups from experiencing a disproportionate share of these mistakes? Consider the case of a group that we have relatively little data on and whose characteristics differ from those of the general population in ways that are relevant to the prediction task. As prediction accuracy is generally correlated with the amount of data available for training, it is likely that incorrect predictions will be more common in this group. A predictor might, for example, end up flagging too many individuals in this group as ‘high risk of default’ even though they pay back their loan. When group membership coincides with a sensitive attribute, such as race, gender, disability, or religion, this situation can lead to unjust or prejudicial outcomes.

Despite the need, a vetted methodology in machine learning for preventing this kind of discrimination based on sensitive attributes has been lacking. A naive approach might require a set of sensitive attributes to be removed from the data before doing anything else with it. This idea of “fairness through unawareness,” however, fails due to the existence of “redundant encodings.” Even if a particular attribute is not present in the data, combinations of other attributes can act as a proxy.

Another common approach, called demographic parity, asks that the prediction must be uncorrelated with the sensitive attribute. This might sound intuitively desirable, but the outcome itself is often correlated with the sensitive attribute. For example, the incidence of heart failure is substantially more common in men than in women. When predicting such a medical condition, it is therefore neither realistic nor desirable to prevent all correlation between the predicted outcome and group membership.

Equal Opportunity
Taking these conceptual difficulties into account, we’ve proposed a methodology for measuring and preventing discrimination based on a set of sensitive attributes. Our framework not only helps to scrutinize predictors to discover possible concerns; it also shows how to adjust a given predictor so as to strike a better tradeoff between classification accuracy and non-discrimination if need be.

At the heart of our approach is the idea that individuals who qualify for a desirable outcome should have an equal chance of being correctly classified for this outcome. In our fictional loan example, it means the rate of ‘low risk’ predictions among people who actually pay back their loan should not depend on a sensitive attribute like race or gender. We call this principle equality of opportunity in supervised learning.

When implemented, our framework also improves incentives by shifting the cost of poor predictions from the individual to the decision maker, who can respond by investing in improved prediction accuracy. Perfect predictors always satisfy our notion, showing that the central goal of building more accurate predictors is well aligned with the goal of avoiding discrimination.

Learn more
To explore the ideas in this blog post on your own, our Big Picture team created a beautiful interactive visualization of the different concepts and tradeoffs. So, head on over to their page to learn more. Once you’ve wal[...]
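
As an illustration of the criterion itself (not of the adjustment procedure from the paper), the sketch below checks equality of opportunity on synthetic data by comparing true positive rates -- the rate of 'low risk' predictions among people who actually repay -- across two groups. All names and the synthetic data are assumptions for the example.

```python
import numpy as np

def true_positive_rate(y_true, y_pred):
    """Fraction of truly positive examples (e.g. people who repay) that the
    classifier also labels positive ('low risk')."""
    positives = y_true == 1
    return (y_pred[positives] == 1).mean()

def equal_opportunity_gap(y_true, scores, groups, threshold=0.5):
    """Largest difference in TPR between groups under a shared threshold."""
    tprs = {}
    for g in np.unique(groups):
        mask = groups == g
        y_pred = (scores[mask] >= threshold).astype(int)
        tprs[g] = true_positive_rate(y_true[mask], y_pred)
    vals = list(tprs.values())
    return tprs, max(vals) - min(vals)

# Toy illustration with synthetic repayment labels, scores and group ids.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
groups = rng.integers(0, 2, size=1000)
scores = np.clip(0.6 * y_true + rng.normal(0.2, 0.25, size=1000), 0, 1)

per_group_tpr, gap = equal_opportunity_gap(y_true, scores, groups)
print(per_group_tpr, gap)
```

A small gap means qualified individuals in both groups are recognized at roughly the same rate; a large gap is the kind of disparity the equal-opportunity adjustment is meant to reduce.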

Graph-powered Machine Learning at Google


Posted by Sujith Ravi, Staff Research Scientist, Google Research

Recently, there have been significant advances in Machine Learning that enable computer systems to solve complex real-world problems. One of those advances is Google’s large scale, graph-based machine learning platform, built by the Expander team in Google Research. A technology that is behind many of the Google products and features you may use every day, graph-based machine learning is a powerful tool that can be used to power useful features such as reminders in Inbox and smart messaging in Allo, or used in conjunction with deep neural networks to power the latest image recognition system in Google Photos.

Learning with Minimal Supervision
Much of the recent success in deep learning, and machine learning in general, can be attributed to models that demonstrate high predictive capacity when trained on large amounts of labeled data -- often millions of training examples. This is commonly referred to as “supervised learning” since it requires supervision, in the form of labeled data, to train the machine learning systems. (Conversely, some machine learning methods operate directly on raw data without any supervision, a paradigm referred to as unsupervised learning.)

However, the more difficult the task, the harder it is to get sufficient high-quality labeled data. It is often prohibitively labor intensive and time-consuming to collect labeled data for every new problem. This motivated the Expander research team to build new technology for powering machine learning applications at scale and with minimal supervision. Expander’s technology draws inspiration from how humans learn to generalize and bridge the gap between what they already know (labeled information) and novel, unfamiliar observations (unlabeled information). Known as “semi-supervised” learning, this powerful technique enables us to build systems that can work in situations where training data may be sparse.

One of the key advantages of a graph-based semi-supervised machine learning approach is that (a) one models labeled and unlabeled data jointly during learning, leveraging the underlying structure in the data, and (b) one can easily combine multiple types of signals (for example, relational information from Knowledge Graph along with raw features) into a single graph representation and learn over them. This is in contrast to other machine learning approaches, such as neural network methods, in which it is typical to first train a system using labeled data with features and then apply the trained system to unlabeled data.

Graph Learning: How It Works
At its core, Expander’s platform combines semi-supervised machine learning with large-scale graph-based learning by building a multi-graph representation of the data with nodes corresponding to objects or concepts and edges connecting concepts that share similarities. The graph typically contains both labeled data (nodes associated with a known output category or label) and unlabeled data (nodes for which no labels were provided). Expander’s framework then performs semi-supervised learning to label all nodes jointly by propagating label information across the graph. However, this is easier said than done! We have to (1) learn efficiently at scale with minimal supervision (i.e., a tiny amount of labeled data), (2) operate over multi-modal data (i.e., heterogeneous representations and various sources of data), and (3) solve challenging prediction tasks (i.e., large, complex output spaces) involving high dimensional data that might be noisy.

One of the primary ingredients in the entire learning process is the graph and the choice of connections. Graphs come in all sizes and shapes, and can be combined from multiple sources. We have observed that it is often beneficial to learn over multi-graphs that combine information from multiple types of data representations (e.g., image pixels, object categories and chat response messages for PhotoReply in Allo). The Expander team’s graph learning pl[...]
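
The Expander platform itself is proprietary and far more sophisticated than this, but the core idea of propagating labels from seed nodes across graph edges can be sketched in a few lines. The following is a minimal, assumed-for-illustration label propagation loop in NumPy, not the Expander algorithm.

```python
import numpy as np

def propagate_labels(adj, labels, num_classes, iterations=20, alpha=0.8):
    """Basic label propagation: each node's label distribution is a blend of
    its own seed labels and the average distribution of its neighbors.

    adj:     (n, n) symmetric adjacency / edge-weight matrix
    labels:  length-n array with a class id for seed nodes and -1 otherwise
    """
    n = adj.shape[0]
    seed = np.zeros((n, num_classes))
    seed[labels >= 0, labels[labels >= 0]] = 1.0

    # Row-normalize so each node averages over its neighbors.
    row_sums = adj.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0
    transition = adj / row_sums

    dist = seed.copy()
    for _ in range(iterations):
        dist = alpha * transition.dot(dist) + (1 - alpha) * seed
    return dist.argmax(axis=1)

# Tiny example: a 4-node path graph with only the two endpoints labeled.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
labels = np.array([0, -1, -1, 1])
print(propagate_labels(adj, labels, num_classes=2))
```

The unlabeled middle nodes inherit the label of whichever seed they are closer to in the graph, which is the intuition behind "labeling all nodes jointly".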

How Robots Can Acquire New Skills from Their Shared Experience


Posted by Sergey Levine (Google Brain Team), Timothy Lillicrap (DeepMind), Mrinal Kalakrishnan (X)

The ability to learn from experience will likely be key to enabling robots to help with complex real-world tasks, from assisting the elderly with chores and daily activities, to helping us in offices and hospitals, to performing jobs that are too dangerous or unpleasant for people. However, if each robot must learn its full repertoire of skills for these tasks only from its own experience, it could take far too long to acquire a rich enough range of behaviors to be useful. Could we bridge this gap by making it possible for robots to collectively learn from each other’s experiences?

While machine learning algorithms have made great strides in natural language understanding and speech recognition, the kind of symbolic high-level reasoning that allows people to communicate complex concepts in words remains out of reach for machines. However, robots can instantaneously transmit their experience to other robots over the network - sometimes known as "cloud robotics" - and it is this ability that can let them learn from each other.

This is true even for seemingly simple low-level skills. Humans and animals excel at adaptive motor control that integrates their senses, reflexes, and muscles in a closely coordinated feedback loop. Robots still struggle with these basic skills in the real world, where the variability and complexity of the environment demands well-honed behaviors that are not easily fooled by distractors. If we enable robots to transmit their experiences to each other, could they learn to perform motion skills in close coordination with sensing in realistic environments?

We previously wrote about how multiple robots could pool their experiences to learn a grasping task. Here, we will discuss new experiments that we conducted to investigate three possible approaches for general-purpose skill learning across multiple robots: learning motion skills directly from experience, learning internal models of physics, and learning skills with human assistance. In all three cases, multiple robots shared their experiences to build a common model of the skill. The skills learned by the robots are still relatively simple -- pushing objects and opening doors -- but by learning such skills more quickly and efficiently through collective learning, robots might in the future acquire richer behavioral repertoires that could eventually make it possible for them to assist us in our daily lives.

Learning from raw experience with model-free reinforcement learning
Perhaps one of the simplest ways for robots to teach each other is to pool information about their successes and failures in the world. Humans and animals acquire many skills by direct trial-and-error learning. During this kind of ‘model-free’ learning -- so called because there is no explicit model of the environment formed -- they explore variations on their existing behavior and then reinforce and exploit the variations that give bigger rewards. In combination with deep neural networks, model-free algorithms have recently proved to be surprisingly effective and have been key to successes with Atari video games and with playing Go. Having multiple robots allows us to experiment with sharing experiences to speed up this kind of direct learning in the real world.

In these experiments we tasked robots with trying to move their arms to goal locations, or reaching to and opening a door. Each robot has a copy of a neural network that allows it to estimate the value of taking a given action in a given state. By querying this network, the robot can quickly decide what actions might be worth taking in the world. When a robot acts, we add noise to the actions it selects, so the resulting behavior is sometimes a bit better than previously observed, and sometimes a bit worse. This allows each robot to explore different ways of approaching a task. Records of the actions taken by the robots, their beh[...]
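
A minimal sketch of the exploration scheme described above: a stand-in value function scores candidate actions, the best one is chosen, and Gaussian noise is added so the behavior varies slightly from trial to trial. The network, states and actions here are all toy assumptions, not the robots' actual models.

```python
import numpy as np

rng = np.random.default_rng(42)

def action_value(state, action, weights):
    """Stand-in for the learned neural network: a scalar estimate of how good
    `action` is in `state` (here just a linear function of both)."""
    return float(weights @ np.concatenate([state, action]))

def select_noisy_action(state, weights, candidates, noise_std=0.05):
    """Pick the highest-value candidate action, then perturb it with Gaussian
    noise so the robot keeps exploring variations on its best-known behavior."""
    values = [action_value(state, a, weights) for a in candidates]
    best = candidates[int(np.argmax(values))]
    return best + rng.normal(0.0, noise_std, size=best.shape)

state = rng.normal(size=4)                             # e.g. joint angles
candidates = [rng.normal(size=2) for _ in range(16)]   # sampled arm commands
weights = rng.normal(size=6)                           # toy "network" weights
print(select_noisy_action(state, weights, candidates))
```

Pooling experience then amounts to all robots contributing their (state, noisy action, outcome) records to training updates for the same shared value network.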

Introducing the Open Images Dataset


Posted by Ivan Krasin and Tom Duerig, Software Engineers

In the last few years, advances in machine learning have enabled Computer Vision to progress rapidly, allowing for everything from systems that can automatically caption images to apps that can create natural language replies in response to shared photos. Much of this progress can be attributed to publicly available image datasets, such as ImageNet and COCO for supervised learning, and YFCC100M for unsupervised learning.

Today, we introduce Open Images, a dataset consisting of ~9 million URLs to images that have been annotated with labels spanning over 6000 categories. We tried to make the dataset as practical as possible: the labels cover more real-life entities than the 1000 ImageNet classes, there are enough images to train a deep neural network from scratch and the images are listed as having a Creative Commons Attribution license*. The image-level annotations have been populated automatically with a vision model similar to Google Cloud Vision API. For the validation set, we had human raters verify these automated labels to find and remove false positives. On average, each image has about 8 labels assigned. Here are some examples:

Annotated images from the Open Images dataset. Left: Ghost Arches by Kevin Krejci. Right: Some Silverware by J B. Both images used under CC BY 2.0 license.

We have trained an Inception v3 model based on Open Images annotations alone, and the model is good enough to be used for fine-tuning applications as well as for other things, like DeepDream or artistic style transfer, which require a well-developed hierarchy of filters. We hope to improve the quality of the annotations in Open Images in the coming months, and therefore the quality of models which can be trained on it.

The dataset is a product of a collaboration between Google, CMU and Cornell universities, and there are a number of research papers built on top of the Open Images dataset in the works. It is our hope that datasets like Open Images and the recently released YouTube-8M will be useful tools for the machine learning community.

* While we tried to identify images that are licensed under a Creative Commons Attribution license, we make no representations or warranties regarding the license status of each image and you should verify the license for each image yourself.
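
Since the dataset is distributed as CSV files of image URLs and machine-generated labels, a natural first step is simply tallying label frequencies. The snippet below is a hedged sketch of that step; the file and column names are assumptions, so check the actual Open Images release for its real schema before relying on them.

```python
import csv
from collections import Counter

# Hypothetical file and column names -- the real Open Images release documents
# its own CSV layout.
label_counts = Counter()
with open("open_images_image_labels.csv", newline="") as f:
    for row in csv.DictReader(f):
        label_counts[row["LabelName"]] += 1

print("total image-label annotations:", sum(label_counts.values()))
for label, count in label_counts.most_common(10):
    print(label, count)
```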

Image Compression with Neural Networks


Posted by Nick Johnston and David Minnen, Software Engineers

Data compression is used nearly everywhere on the internet - the videos you watch online, the images you share, the music you listen to, even the blog you're reading right now. Compression techniques make sharing the content you want quick and efficient. Without data compression, the time and bandwidth costs for getting the information you need, when you need it, would be exorbitant!

In "Full Resolution Image Compression with Recurrent Neural Networks", we expand on our previous research on data compression using neural networks, exploring whether machine learning can provide better results for image compression like it has for image recognition and text summarization. Furthermore, we are releasing our compression model via TensorFlow so you can experiment with compressing your own images with our network.

We introduce an architecture that uses a new variant of the Gated Recurrent Unit (a type of RNN that allows units to save activations and process sequences) called the Residual Gated Recurrent Unit (Residual GRU). Our Residual GRU combines existing GRUs with the residual connections introduced in "Deep Residual Learning for Image Recognition" to achieve significant image quality gains for a given compression rate. Instead of using a DCT to generate a new bit representation like many compression schemes in use today, we train two sets of neural networks - one to create the codes from the image (encoder) and another to create the image from the codes (decoder).

Our system works by iteratively refining a reconstruction of the original image, with both the encoder and decoder using Residual GRU layers so that additional information can pass from one iteration to the next. Each iteration adds more bits to the encoding, which allows for a higher quality reconstruction. Conceptually, the network operates as follows:

1. The initial residual, R[0], corresponds to the original image I: R[0] = I.
2. Set i=1 for the first iteration.
3. Iteration[i] takes R[i-1] as input and runs the encoder and binarizer to compress the image into B[i].
4. Iteration[i] runs the decoder on B[i] to generate a reconstructed image P[i].
5. The residual for Iteration[i] is calculated: R[i] = I - P[i].
6. Set i=i+1 and go to Step 3 (up to the desired number of iterations).

The residual image represents how different the current version of the compressed image is from the original. This image is then given as input to the network with the goal of removing the compression errors from the next version of the compressed image. The compressed image is now represented by the concatenation of B[1] through B[N]. For larger values of N, the decoder gets more information on how to reduce the errors and generate a higher quality reconstruction of the original image.

To understand how this works, consider the following example of the first two iterations of the image compression network, shown in the figures below. We start with an image of a lighthouse. On the first pass through the network, the original image is given as an input (R[0] = I). P[1] is the reconstructed image. The difference between the original image and the encoded image is the residual, R[1], which represents the error in the compression.

Left: Original image, I = R[0]. Center: Reconstructed image, P[1]. Right: the residual, R[1], which represents the error introduced by compression.

On the second pass through the network, R[1] is given as the network’s input (see figure below). A higher quality image P[2] is then created. So how does the system recreate such a good image (P[2], center panel below) from the residual R[1]? Because the model uses recurrent nodes with memory, the network saves information from each iteration that it can use in the next one. It learned something about the original image in Iteration[1] that is used along with R[1] to generate a better P[2] from B[2]. Lastly, a new residual, R[2] (right), is generated[...]
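
The numbered steps above translate almost directly into a loop. The sketch below is a toy, additive variant with stand-in encode/decode functions; the real system uses the recurrent Residual GRU networks (and their hidden state) rather than an explicit running sum, so treat this purely as a structural illustration.

```python
import numpy as np

def encode(residual):
    # Stand-in for the recurrent encoder + binarizer: one "bit" per pixel.
    return np.sign(residual)

def decode(code):
    # Stand-in for the recurrent decoder: a crude estimate of the residual.
    return 0.5 * code

def compress(image, num_iterations=4):
    """Iterative residual coding following steps 1-6 above (toy version)."""
    codes = []
    reconstruction = np.zeros_like(image)
    residual = image                                  # R[0] = I
    for _ in range(num_iterations):
        b = encode(residual)                          # B[i] from R[i-1]
        codes.append(b)
        reconstruction = reconstruction + decode(b)   # running estimate P[i]
        residual = image - reconstruction             # R[i] = I - P[i]
    return codes, reconstruction

image = np.random.rand(8, 8)
codes, recon = compress(image)
print("mean abs error:", np.abs(image - recon).mean())
```

Each extra iteration appends another code B[i] and shrinks the remaining residual, which mirrors the "more bits, higher quality" behavior described in the post.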

Announcing YouTube-8M: A Large and Diverse Labeled Video Dataset for Video Understanding Research


Posted by Sudheendra Vijayanarasimhan and Paul Natsev, Software Engineers

Many recent breakthroughs in machine learning and machine perception have come from the availability of large labeled datasets, such as ImageNet, which has millions of images labeled with thousands of classes. Their availability has significantly accelerated research in image understanding, for example on detecting and classifying objects in static images.

Video analysis provides even more information for detecting and recognizing objects, and understanding human actions and interactions with the world. Improving video understanding can lead to better video search and discovery, similarly to how image understanding helped re-imagine the photos experience. However, one of the key bottlenecks for further advancements in this area has been the lack of real-world video datasets with the same scale and diversity as image datasets.

Today, we are excited to announce the release of YouTube-8M, a dataset of 8 million YouTube video URLs (representing over 500,000 hours of video), along with video-level labels from a diverse set of 4800 Knowledge Graph entities. This represents a significant increase in scale and diversity compared to existing video datasets. For example, Sports-1M, the largest existing labeled video dataset we are aware of, has around 1 million YouTube videos and 500 sports-specific classes--YouTube-8M represents nearly an order of magnitude increase in both number of videos and classes.

In order to construct a labeled video dataset of this scale, we needed to address two key challenges: (1) video is much more time-consuming to annotate manually than images, and (2) video is very computationally expensive to process and store. To overcome (1), we turned to YouTube and its video annotation system, which identifies relevant Knowledge Graph topics for all public YouTube videos. While these annotations are machine-generated, they incorporate powerful user engagement signals from millions of users as well as video metadata and content analysis. As a result, the quality of these annotations is sufficiently high to be useful for video understanding research and benchmarking purposes.

To ensure the stability and quality of the labeled video dataset, we used only public videos with more than 1000 views, and we constructed a diverse vocabulary of entities, which are visually observable and sufficiently frequent. The vocabulary construction was a combination of frequency analysis, automated filtering, verification by human raters that the entities are visually observable, and grouping into 24 top-level verticals (more details in our technical report). The figures below depict the dataset browser and the distribution of videos along the top-level verticals, and illustrate the dataset’s scale and diversity.

A dataset explorer allows browsing and searching the full vocabulary of Knowledge Graph entities, grouped in 24 top-level verticals, along with corresponding videos. This screenshot depicts a subset of dataset videos annotated with the entity “Guitar”.

The distribution of videos in the top-level verticals illustrates the scope and diversity of the dataset and reflects the natural distribution of popular YouTube videos.

To address (2), we had to overcome the storage and computational resource bottlenecks that researchers face when working with videos. Pursuing video understanding at YouTube-8M’s scale would normally require a petabyte of video storage and dozens of CPU-years worth of processing. To make the dataset useful to researchers and students with limited computational resources, we pre-processed the videos and extracted frame-level features using a state-of-the-art deep learning model--the publicly available Inception-V3 image annotation model trained on ImageNet. These features are extracte[...]
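
YouTube-8M ships with these frame-level features already extracted, so you normally would not recompute them. Purely to illustrate the idea of using a pretrained ImageNet model as a frame feature extractor, here is a sketch that uses the Keras InceptionV3 port as a convenient stand-in for the Inception-V3 checkpoint mentioned above.

```python
import numpy as np
import tensorflow as tf

# Pretrained ImageNet model with the classification head removed; global
# average pooling yields a 2048-d feature vector per frame.
feature_extractor = tf.keras.applications.InceptionV3(
    include_top=False, weights="imagenet", pooling="avg")

def frame_features(frames):
    """frames: uint8 array of shape (num_frames, 299, 299, 3)."""
    x = tf.keras.applications.inception_v3.preprocess_input(
        frames.astype("float32"))
    return feature_extractor.predict(x, verbose=0)

# Stand-in for decoded video frames sampled at, say, one frame per second.
frames = np.random.randint(0, 256, size=(4, 299, 299, 3), dtype=np.uint8)
print(frame_features(frames).shape)   # (4, 2048)
```

Storing a compact feature vector per sampled frame, rather than the raw video, is what brings the dataset down from petabytes to something a single researcher can work with.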

A Neural Network for Machine Translation, at Production Scale


Posted by Quoc V. Le & Mike Schuster, Research Scientists, Google Brain Team

Ten years ago, we announced the launch of Google Translate, together with the use of Phrase-Based Machine Translation as the key algorithm behind this service. Since then, rapid advances in machine intelligence have improved our speech recognition and image recognition capabilities, but improving machine translation remains a challenging goal.

Today we announce the Google Neural Machine Translation system (GNMT), which utilizes state-of-the-art training techniques to achieve the largest improvements to date for machine translation quality. Our full research results are described in a new technical report we are releasing today: “Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation” [1].

A few years ago we started using Recurrent Neural Networks (RNNs) to directly learn the mapping from an input sequence (e.g. a sentence in one language) to an output sequence (that same sentence in another language) [2]. Whereas Phrase-Based Machine Translation (PBMT) breaks an input sentence into words and phrases to be translated largely independently, Neural Machine Translation (NMT) considers the entire input sentence as a unit for translation. The advantage of this approach is that it requires fewer engineering design choices than previous Phrase-Based translation systems. When it first came out, NMT showed accuracy equivalent to existing Phrase-Based translation systems on modest-sized public benchmark data sets.

Since then, researchers have proposed many techniques to improve NMT, including work on handling rare words by mimicking an external alignment model [3], using attention to align input words and output words [4] and breaking words into smaller units to cope with rare words [5,6]. Despite these improvements, NMT wasn't fast or accurate enough to be used in a production system, such as Google Translate. Our new paper [1] describes how we overcame the many challenges to make NMT work on very large data sets and built a system that is fast and accurate enough to provide better translations for Google’s users and services.

Data from side-by-side evaluations, where human raters compare the quality of translations for a given source sentence. Scores range from 0 to 6, with 0 meaning “completely nonsense translation”, and 6 meaning “perfect translation.”

The following visualization shows the progression of GNMT as it translates a Chinese sentence to English. First, the network encodes the Chinese words as a list of vectors, where each vector represents the meaning of all words read so far (“Encoder”). Once the entire sentence is read, the decoder begins, generating the English sentence one word at a time (“Decoder”). To generate the translated word at each step, the decoder pays attention to a weighted distribution over the encoded Chinese vectors most relevant to generate the English word (“Attention”; the blue link transparency represents how much the decoder pays attention to an encoded word).

Using human-rated side-by-side comparison as a metric, the GNMT system produces translations that are vastly improved compared to the previous phrase-based production system. GNMT reduces translation errors by more than 55%-85% on several major language pairs measured on sampled sentences from Wikipedia and news websites with the help of bilingual human raters.

An example of a translation produced by our system for an input sentence sampled from a news site. Go here for more examples of translations for input sentences sampled randomly from news sites and books.

In addition to releasing this research paper today, we are announcing the launch of GNMT in production on a notoriously difficult language pair: Chinese to English. The Google Translate mobile and web apps are now using GNMT for 100% [...]
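
The attention step described above can be sketched in a few lines: score each encoder vector against the current decoder state, turn the scores into a weighted distribution, and take the weighted sum. GNMT's actual attention uses a learned scoring network; the dot-product scoring below is a simplified stand-in with made-up dimensions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(decoder_state, encoder_states):
    """Weight each encoded source word by its relevance to the current decoder
    state and return the weighted sum (the 'context' vector).

    decoder_state:  shape (d,)
    encoder_states: shape (source_len, d)
    """
    scores = encoder_states @ decoder_state   # dot-product relevance scores
    weights = softmax(scores)                 # the blue links in the figure
    context = weights @ encoder_states
    return context, weights

rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(6, 8))   # 6 source words, 8-d vectors
decoder_state = rng.normal(size=8)
context, weights = attend(decoder_state, encoder_states)
print(weights.round(2), context.shape)
```

The decoder consumes the context vector at each step, so every output word can draw on whichever parts of the source sentence are most relevant to it.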

Show and Tell: image captioning open sourced in TensorFlow


Posted by Chris Shallue, Software Engineer, Google Brain Team

In 2014, research scientists on the Google Brain team trained a machine learning system to automatically produce captions that accurately describe images. Further development of that system led to its success in the Microsoft COCO 2015 image captioning challenge, a competition to compare the best algorithms for computing accurate image captions, where it tied for first place.

Today, we’re making the latest version of our image captioning system available as an open source model in TensorFlow. This release contains significant improvements to the computer vision component of the captioning system, is much faster to train, and produces more detailed and accurate descriptions compared to the original system. These improvements are outlined and analyzed in the paper Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge, published in IEEE Transactions on Pattern Analysis and Machine Intelligence.

Automatically captioned by our system.

So what’s new?
Our 2014 system used the Inception V1 image classification model to initialize the image encoder, which produces the encodings that are useful for recognizing different objects in the images. This was the best image model available at the time, achieving 89.6% top-5 accuracy on the benchmark ImageNet 2012 image classification task. We replaced this in 2015 with the newer Inception V2 image classification model, which achieves 91.8% accuracy on the same task. The improved vision component gave our captioning system an accuracy boost of 2 points in the BLEU-4 metric (which is commonly used in machine translation to evaluate the quality of generated sentences) and was an important factor in its success in the captioning challenge.

Today’s code release initializes the image encoder using the Inception V3 model, which achieves 93.9% accuracy on the ImageNet classification task. Initializing the image encoder with a better vision model gives the image captioning system a better ability to recognize different objects in the images, allowing it to generate more detailed and accurate descriptions. This gives an additional 2 points of improvement in the BLEU-4 metric over the system used in the captioning challenge.

Another key improvement to the vision component comes from fine-tuning the image model. This step addresses the problem that the image encoder is initialized by a model trained to classify objects in images, whereas the goal of the captioning system is to describe the objects in images using the encodings produced by the image model. For example, an image classification model will tell you that a dog, grass and a frisbee are in the image, but a natural description should also tell you the color of the grass and how the dog relates to the frisbee.

In the fine-tuning phase, the captioning system is improved by jointly training its vision and language components on human generated captions. This allows the captioning system to transfer information from the image that is specifically useful for generating descriptive captions, but which was not necessary for classifying objects. In particular, after fine-tuning it becomes better at correctly describing the colors of objects. Importantly, the fine-tuning phase must occur after the language component has already learned to generate captions - otherwise, the noisiness of the randomly initialized language component causes irreversible corruption to the vision component. For more details, read the full paper here.

Left: the better image model allows the captioning model to generate more detailed and accurate descriptions. Right: after fine-tuning the image model, the image captioning system is more likely to describe the colors of objects correctly.

Until recently our image captioning system was implemented in the DistBelief software fr[...]
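
The freeze-then-unfreeze schedule described above is easy to mimic in modern frameworks. The sketch below is not the Show and Tell architecture or its released code; it only illustrates a two-phase schedule (language component first, then joint fine-tuning) on an assumed toy model, with a hypothetical training dataset left as a comment.

```python
import tensorflow as tf

def build_toy_captioner():
    # Vision component initialized from an ImageNet classifier, as in the post.
    encoder = tf.keras.applications.InceptionV3(
        include_top=False, weights="imagenet", pooling="avg")
    # Toy "language component": max caption length 20, vocabulary of 10000.
    decoder = tf.keras.Sequential([
        tf.keras.layers.RepeatVector(20),
        tf.keras.layers.LSTM(512, return_sequences=True),
        tf.keras.layers.Dense(10000, activation="softmax"),
    ])
    images = tf.keras.Input(shape=(299, 299, 3))
    outputs = decoder(encoder(images))
    return tf.keras.Model(images, outputs), encoder

model, encoder = build_toy_captioner()

# Phase 1: freeze the vision model so its features are not corrupted by the
# initially random language component.
encoder.trainable = False
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(train_ds, epochs=...)   # train_ds is a hypothetical caption dataset

# Phase 2: unfreeze and fine-tune vision + language jointly, at a lower rate.
encoder.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="sparse_categorical_crossentropy")
# model.fit(train_ds, epochs=...)
```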

The 280-Year-Old Algorithm Inside Google Trips


Posted by Bogdan Arsintescu, Software Engineer & Sreenivas Gollapudi, Kostas Kollias, Tamas Sarlos and Andrew Tomkins, Research Scientists

Algorithms Engineering is a lot of fun because algorithms do not go out of fashion: one never knows when an oldie-but-goodie might come in handy. Case in point: Yesterday, Google announced Google Trips, a new app to assist you in your travels by helping you create your own “perfect day” in a city. Surprisingly, deep inside Google Trips, there is an algorithm that was invented 280 years ago.

In 1736, Leonhard Euler authored a brief but beautiful mathematical paper regarding the town of Königsberg and its 7 bridges, shown here:

Image from Wikipedia

In the paper, Euler studied the following question: is it possible to walk through the city crossing each bridge exactly once? As it turns out, for the city of Königsberg, the answer is no. To reach this answer, Euler developed a general approach to represent any layout of landmasses and bridges in terms of what he dubbed the Geometriam Situs (the “Geometry of Place”), which we now call Graph Theory. He represented each landmass as a “node” in the graph, and each bridge as an “edge,” like this:

Image from Wikipedia

Euler noticed that if all the nodes in the graph have an even number of edges (such graphs are called “Eulerian” in his honor) then, and only then, a cycle can be found that visits every edge exactly once. Keep this in mind, as we’ll rely on this fact later in the post.

Our team in Google Research has been fascinated by the “Geometry of Place” for some time, and we started investigating a question related to Euler’s: rather than visiting just the bridges, how can we visit as many interesting places as possible during a particular trip? We call this the “itineraries” problem. Euler didn’t study it, but it is a well known topic in Optimization, where it is often called the “Orienteering” problem.

While Euler’s problem has an efficient and exact solution, the itineraries problem is not just hard to solve, it is hard to even approximately solve! The difficulty lies in the interplay between two conflicting goals: first, we should pick great places to visit, but second, we should pick them to allow a good itinerary: not too much travel time; don’t visit places when they’re closed; don’t visit too many museums, etc. Embedded in such problems is the challenge of finding efficient routes, often referred to as the Travelling Salesman Problem (TSP).

Algorithms for Travel Itineraries
Fortunately, the real world has a property called the “triangle inequality” that says adding an extra stop to a route never makes it shorter. When the underlying geometry satisfies the triangle inequality, the TSP can be approximately solved using another algorithm discovered by Christofides in 1976. This is an important part of our solution, and builds on Euler’s paper, so we’ll give a quick four-step rundown of how it works here:

1. We start with all our destinations separate, and repeatedly connect together the closest two that aren’t yet connected. This doesn’t yet give us an itinerary, but it does connect all the destinations via a minimum spanning tree of the graph.
2. We take all the destinations that have an odd number of connections in this tree (Euler proved there must be an even number of these), and carefully pair them up.
3. Because all the destinations now have an even number of edges, we’ve created an Eulerian graph, so we create a route that crosses each edge exactly once.
4. We now have a great route, but it might visit some places more than once. No problem, we find any double visits and simply bypass them, going directly from the predecessor to the successor.

Christofides gave an elegant proof that the resulting route is always close to the shortest[...]
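
For readers who want to experiment, the four steps map onto standard graph primitives. The sketch below is an illustrative implementation using networkx, with the pairing step done via maximum-weight matching on inverted weights (a common trick for obtaining a minimum-weight pairing); it is not the code inside Google Trips.

```python
import networkx as nx

def christofides_route(points):
    """Approximate shortest round trip over `points` (name -> (x, y)),
    following the four steps above. Illustrative only."""
    def dist(a, b):
        (x1, y1), (x2, y2) = points[a], points[b]
        return ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5

    g = nx.Graph()
    for a in points:
        for b in points:
            if a < b:
                g.add_edge(a, b, weight=dist(a, b))

    # Step 1: connect everything with a minimum spanning tree.
    tree = nx.minimum_spanning_tree(g)

    # Step 2: pair up odd-degree destinations. Using weight = BIG - distance
    # makes max_weight_matching prefer a full, short pairing.
    odd = [n for n in tree if tree.degree(n) % 2 == 1]
    big = 1000.0  # assumed to exceed any pairwise distance in this example
    pairing = nx.Graph()
    for a in odd:
        for b in odd:
            if a < b:
                pairing.add_edge(a, b, weight=big - dist(a, b))
    matching = nx.max_weight_matching(pairing)

    # Step 3: tree + matching has all-even degrees, so an Euler circuit exists.
    eulerian = nx.MultiGraph(tree)
    eulerian.add_edges_from(matching)
    circuit = [u for u, _ in nx.eulerian_circuit(eulerian)]

    # Step 4: shortcut repeated visits (safe under the triangle inequality).
    route, seen = [], set()
    for stop in circuit:
        if stop not in seen:
            route.append(stop)
            seen.add(stop)
    return route + [route[0]]

points = {"A": (0, 0), "B": (0, 2), "C": (2, 2), "D": (2, 0), "E": (1, 3)}
print(christofides_route(points))
```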

The 2016 Google Earth Engine User Summit: Turning pixels into insights


Posted by Chris Herwig, Program Manager, Google Earth Engine

"We are trying new methods [of flood modeling] in Earth Engine based on machine learning techniques which we think are cheaper, more scalable, and could exponentially drive down the cost of flood mapping and make it accessible to everyone."
- Beth Tellman, Arizona State University and Cloud to Street

Recently, Google headquarters hosted the Google Earth Engine User Summit 2016, a three-day hands-on technical workshop for scientists and students interested in using Google Earth Engine for planetary-scale cloud-based geospatial analysis. Earth Engine combines a multi-petabyte catalog of satellite imagery and geospatial datasets with a simple, yet powerful API backed by Google's cloud, which scientists and researchers use to detect, measure, and predict changes to the Earth's surface.

Earth Engine founder Rebecca Moore kicking off the first day of the summit

Summit attendees could choose among twenty-five hands-on workshops over the course of the three-day summit, most created specifically for the summit, giving attendees an exclusive introduction to the latest features in our platform. The sessions covered a wide range of topics and Earth Engine experience levels, from image classifiers and classifications, time series analysis, and building custom web applications, all the way to arrays, matrices, and linear algebra in Earth Engine.

Terra Bella Product Manager Kristi Bohl taught a session on using SkySat imagery, like the image above over Sydney, Australia, for change detection. Workshop attendees also learned how to take advantage of the deep temporal stack the SkySat archive offers for change-over-time analyses.

Cross-correlation between Landsat 8 NDVI and the sum of CHIRPS precipitation. Red is high cross-correlation and blue is low. The gap in data is because CHIRPS is masked over water.

Nick Clinton, a developer advocate for Earth Engine, taught a time series session that covered statistical techniques as applied to satellite imagery data. Students learned how to make graphics like the above, which shows the cross-correlation between Landsat 8 NDVI and the sum of CHIRPS precipitation from the previous month over San Francisco, CA. The correlation should be high for relatively r-selected plants like grasses and weeds and relatively low for perennials, shrubs, or forest.

My workshop session covered how users can upload their own data into Earth Engine and the many different ways to take the results of their analyses with them, including rendering static map tiles hosted on Google Cloud Storage, exporting images, creating new assets, and even making movies, like this timelapse video of all the Sentinel 2A images captured over Sydney, Australia.

Along with the workshop sessions, we hosted five plenary speakers and 18 lightning talk presenters. These presenters shared how Earth Engine fits into their research, spanning drought monitoring, agriculture, conservation, flood risk mapping, and hydrological analysis.

Plenary Speakers
Agriculture in the Sentinel era: scaling up with Earth Engine, Guido Lemoine, European Commission's Joint Research Centre
Flood Vulnerability from the Cloud to the Street (and back!) powered by Google Earth Engine, Beth Tellman, Arizona State University and Cloud to Street
Accelerating Rangeland Conservation, Brady Allred, University of Montana
Monitoring Drought with Google Earth Engine: From Archives to Answers, Justin Huntington, Desert Research Institute / Western Regional Climate Center
Automated methods for surface water detection, Gennadii Donchytes, Deltares

Lightning Presentations
Mapping the Behavior of Rivers, Alex Bryk, University of California, Berkeley
Climate Data for Crisis and Health Applications, Pietro Ceccato, Columbia University
Appalachian Communities at Risk, [...]
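
As a small aside on the time-series technique from Nick Clinton's session: the statistic in the NDVI/CHIRPS figure described above is essentially a lagged correlation. The sketch below illustrates that idea on synthetic monthly series with plain NumPy; it does not use the Earth Engine API, and all values are made up.

```python
import numpy as np

def lagged_correlation(ndvi, precip, lag_months=1):
    """Pearson correlation between NDVI and precipitation from `lag_months`
    earlier -- the statistic behind the NDVI/CHIRPS figure described above."""
    if lag_months == 0:
        return np.corrcoef(precip, ndvi)[0, 1]
    return np.corrcoef(precip[:-lag_months], ndvi[lag_months:])[0, 1]

# Synthetic monthly series: greenness responds to last month's rain plus noise.
rng = np.random.default_rng(1)
precip = rng.gamma(2.0, 30.0, size=60)                              # mm/month
ndvi = 0.2 + 0.002 * np.roll(precip, 1) + rng.normal(0, 0.02, 60)   # unitless
print(lagged_correlation(ndvi, precip, lag_months=1))
```

Grasses and weeds that green up quickly after rain would show a high value of this statistic; slower-responding perennials would not, which is the pattern the figure highlights.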

Research from VLDB 2016: Improved Friend Suggestion using Ego-Net Analysis


Posted by Alessandro Epasto, Research Scientist, Google Research NY

On September 5 - 9, New Delhi, India hosted the 42nd International Conference on Very Large Data Bases (VLDB), a premier annual forum for academic and industry research on databases, data management, data mining and data analytics. Over the past several years, Google has actively participated in VLDB, both as official sponsor and with numerous contributions to the research and industrial tracks. In this post, we would like to share the research presented in one of the Google papers from VLDB 2016.

In Ego-net Community Mining Applied to Friend Suggestion, co-authored by Googlers Silvio Lattanzi, Vahab Mirrokni, Ismail Oner Sebe, Ahmed Taei, Sunita Verma and myself, we explore how social networks can provide better friend suggestions to users, a challenging practical problem faced by all social network platforms.

Friend suggestion – the task of suggesting to a user the contacts she might already know in the network but that she hasn’t added yet – is a major driver of user engagement and social connection in all online social networks. Designing a high quality system that can provide relevant and useful friend recommendations is very challenging, and requires state-of-the-art machine learning algorithms based on a multitude of parameters. An effective family of features for friend suggestion consists of graph features such as the number of common friends between two users. While widely used, the number of common friends has some major drawbacks, including the following, which is shown in Figure 1.

Figure 1: Ego-net of Sally.

In this figure we represent the social connections of Sally and her friends – the ego-net of Sally. An ego-net of a node (in this case, Sally) is defined as the graph that contains the node itself, all of the node’s neighbors and the connections among those nodes. Sally has 6 friends in her ego-net: Albert (her husband), Brian (her son), Charlotte (her mother) as well as Uma (her boss), Vincent and Wally (two of her team members). Notice how A, B and C are all connected with each other while they do not know U, V or W. On the other hand U, V and W have all added each other as their friend (except U and W, who are good friends but somehow forgot to add each other).

Notice how each of A, B, C has a common friend with each of U, V and W: Sally herself. A friend recommendation system based on common neighbors might suggest to Sally’s son (for instance) to add Sally’s boss as his friend! In reality the situation is even more complicated because users’ online and offline friends span several different social circles or communities (family, work, school, sports, etc).

In our paper we introduce a novel technique for friend suggestions based on independently analyzing the ego-net structure. The main contribution of the paper is to show that it is possible to provide friend suggestions efficiently by constructing all ego-nets of the nodes in the graph and then independently applying community detection algorithms on them in large-scale distributed systems. Specifically, the algorithm proceeds by constructing the ego-nets of all nodes and applying, independently on each of them, a community detection algorithm. More precisely, the algorithm operates on so-called “ego-net-minus-ego” graphs, defined as the graph that includes only the neighbors of a given node, as shown in the figure below.

Figure 2: Clustering of the ego-net of Sally.

Notice how in this example the ego-net-minus-ego of Sally has two very clear communities: her family (A, B, C) and her co-workers (U, V, W), which are easily separated. Intuitively, this is because one might expect that while nodes (e.g. Sally) participate in many communities, there is usually a single (or a li[...]
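
The "ego-net-minus-ego" construction is easy to reproduce on a toy version of Figure 1. The sketch below uses networkx and a trivial community split (connected components); the paper's actual algorithm runs proper community detection over all ego-nets in a large-scale distributed system, so this is only a conceptual illustration.

```python
import networkx as nx

def ego_net_minus_ego(graph, node):
    """Subgraph on the neighbors of `node`, with `node` itself removed."""
    ego = nx.ego_graph(graph, node)   # node + neighbors + their connections
    ego.remove_node(node)
    return ego

# Sally's network from Figure 1: family (A, B, C) and co-workers (U, V, W).
g = nx.Graph()
g.add_edges_from([("Sally", x) for x in "ABCUVW"])
g.add_edges_from([("A", "B"), ("A", "C"), ("B", "C"),
                  ("U", "V"), ("V", "W")])   # U and W forgot to add each other

minus_ego = ego_net_minus_ego(g, "Sally")
communities = list(nx.connected_components(minus_ego))
print(communities)   # e.g. [{'A', 'B', 'C'}, {'U', 'V', 'W'}]
```

Once Sally is removed, her family and her co-workers fall apart into separate clusters, which is exactly the structure a common-friends count cannot see.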

Computational Thinking from a Dispositions Perspective


Posted by Chris Stephenson, Head of Computer Science Education Programs at Google, and Joyce Malyn-Smith, Managing Project Director at Education Development Center (EDC)
(Cross-posted on the Google for Education Blog)

In K–12 computer science (CS) education, much of the discussion about what students need to learn and do has centered around computational thinking (CT). While much of the current work in CT education is focused on core concepts and their application, the one area of CT that has not been well explored is the relationship between CT as a problem solving model, and the dispositions or habits of mind that it can build in students of all ages.

Exploring the mindset that CT education can engender depends, in part, on the definition of CT itself. While there are a number of definitions of CT in circulation, Valerie Barr and I defined it in the following way:

CT is an approach to solving problems in a way that can be implemented with a computer. Students become not merely tool users but tool builders. They use a set of concepts, such as abstraction, recursion, and iteration, to process and analyze data, and to create real and virtual artifacts. CT is a problem solving methodology that can be automated and transferred and applied across subjects.

Like many others, our view of CT also included the core CT concepts: abstraction, algorithms and procedures, automation, data collection and analysis, data representation, modeling and simulation, parallelization and problem decomposition.

The idea of dispositions, however, comes from the field of vocational education and research on career development, which focuses on the personal qualities or soft skills needed for employment (see full report from Economist Intelligence Unit here). These skills traditionally include being responsible, adaptable, flexible, self-directed, and self-motivated; being able to solve simple and complex problems; and having integrity, self-confidence, and self-control. They can also include the ability to work with people of different ages and cultures, collaboration, complex communication and expert thinking.

Cuoco, Goldenberg, and Mark’s research also provided examples of what students should learn to develop the habits of mind used by scientists across numerous disciplines. These are: recognizing patterns, experimenting, describing, tinkering, inventing, visualizing, and conjecturing. Potter and Vickers also found that in the burgeoning field of cyber security “there is significant overlap between the roles for many soft skills, including analysis, consulting and process skills, leadership, and relationship management. Both communication and presentation skills were valued.”

CT, because of its emphasis on problem solving, provides a natural environment for embedding the idea of dispositions into K-12. According to the International Society for Technology in Education and the Computer Science Teachers Association, the set of dispositions that students practice and internalize while learning about CT can include:

- confidence in dealing with complexity,
- persistence in working with difficult problems,
- the ability to handle ambiguity,
- the ability to deal with open-ended problems,
- setting aside differences to work with others to achieve a common goal or solution, and
- knowing one's strengths and weaknesses when working with others.

Any teacher in any discipline is likely to tell you that persistence, problem solving, collaboration and awareness of one’s strengths and limitations are critical to successful learning for all students. So how do we make these dispositions a more explicit part of the CT curriculum? One of the ways to do so is to call them out directly to students and explain why they are important in all areas of th[...]

Announcing the First Annual Global PhD Fellowship Summit and the 2016 Google PhD Fellows


Posted by Michael Rennaker, Program Manager, University Relations

In 2009, Google created the PhD Fellowship Program to recognize and support outstanding graduate students doing exceptional research in Computer Science and related disciplines. Now in its eighth year, our Fellowships have helped support over 250 graduate students in Australia, China and East Asia, India, North America, Europe and the Middle East who seek to shape and influence the future of technology.

Recently, Google PhD Fellows from around the globe converged on our Mountain View campus for the first annual Global PhD Fellowship Summit. The students heard talks from researchers like Jeff Dean, Françoise Beaufays, Peter Norvig, Maya Gupta and Amin Vahdat, and got a glimpse into some of the state-of-the-art research pursued across Google.

Senior Google Fellow Jeff Dean shares how TensorFlow is used at Google

Fellows also had the chance to connect one-on-one with Googlers to discuss their research, as well as receive feedback from leaders in their fields. The event wrapped up with a panel discussion with Dan Russell, Kristen LeFevre, Douglas Eck and Françoise Beaufays about their unique career paths. Maggie Johnson concluded the Summit by discussing the different types of research environments across academia and industry.

(Left) PhD Fellows share their work with Google researchers during the poster session
(Right) Research panelists share their journeys through academia and industry

Our PhD Fellows represent some of the best and brightest young researchers around the globe in Computer Science and it is our ongoing goal to support them as they make their mark on the world.

We’d also like to welcome the newest class of Google PhD Fellows recently awarded in China and East Asia, India, and Australia. We look forward to seeing each of them at next year’s summit!

2016 Global PhD Fellows

Computational Neuroscience
Cameron (Po-Hsuan) Chen, Princeton University
Grace Lindsay, Columbia University
Martino Sorbaro Sindaci, The University of Edinburgh

Human-Computer Interaction
Dana McKay, University of Melbourne
Koki Nagano, University of Southern California
Arvind Satyanarayan, Stanford University
Amy Xian Zhang, Massachusetts Institute of Technology

Machine Learning
Olivier Bachem, Swiss Federal Institute of Technology Zurich
Tianqi Chen, University of Washington
Emily Denton, New York University
Kwan Hui Lim, University of Melbourne
Yves-Laurent Kom Samo, University of Oxford
Woosang Lim, Korea Advanced Institute of Science and Technology
Anirban Santara, Indian Institute of Technology Kharagpur
Daniel Jaymin Mankowitz, Technion - Israel Institute of Technology
Lucas Maystre, École Polytechnique Fédérale de Lausanne
Arvind Neelakantan, University of Massachusetts, Amherst
Ludwig Schmidt, Massachusetts Institute of Technology
Quanming Yao, The Hong Kong University of Science and Technology
Shandian Zhe, Purdue University, West Lafayette

Machine Perception, Speech Technology and Computer Vision
Eugen Beck, RWTH Aachen University
Yu-Wei Chao, University of Michigan, Ann Arbor
Wei Liu, University of North Carolina at Chapel Hill
Aron Monszpart, University College London
Thomas Schoeps, Swiss Federal Institute of Technology Zurich
Tian Tan, Shanghai Jiao Tong University
Chia-Yin Tsai, Carnegie Mellon University
Weitao Xu, University of Queensland

Market Algorithms
Hossein Esfandiari, University of Maryland, College Park
Sandy Heydrich, Saarland University - Saarbrucken GSCS
Rad Niazadeh, Cornell University
Sadra Yazdanbod, Georgia Institute of Technology

Mobile Computing
Lei Kang, University of Wisconsin
Tauhidur Rahman, Cornell University
Chungkuk Yoo, Korea Advanced Institute of Science and Technology
Yuhao Zhu, University of Texas, Austin

Natural Language[...]

Reproducible Science: Cancer Researchers Embrace Containers in the Cloud


Posted by Dr. Kyle Ellrott, Oregon Health and Sciences University, Dr. Josh Stuart, University of California Santa Cruz, and Dr. Paul Boutros, Ontario Institute for Cancer ResearchToday we hear from the principal investigators of the ICGC-TCGA DREAM Somatic Mutation Calling Challenges about how they are encouraging cancer researchers to make use of Docker and Google Cloud Platform to gain a deeper understanding of the complex genetic mutations that occur in cancer, while doing so in a reproducible way.– Nicole Deflaux and Jonathan Bingham, Google GenomicsToday’s genomic analysis software tools often give different answers when run in different computing environments - that’s like getting a different diagnosis from your doctor depending on which examination room you’re sitting in. Reproducible science matters, especially in cancer research where so many lives are at stake. The Cancer Moonshot has called for the research world to 'Break down silos and bring all the cancer fighters together'. Portable software “containers” and cloud computing hold the potential to help achieve these goals by making scientific data analysis more reproducible, reusable and scalable. Our team of researchers from the Ontario Institute for Cancer Research, University of California Santa Cruz, Sage Bionetworks and Oregon Health and Sciences University is pushing the frontiers by encouraging scientists to package up their software in reusable Docker containers and make use of cloud-resident data from the Cancer Cloud Pilots funded by the National Cancer Institute.In 2014 we initiated the ICGC-TCGA DREAM Somatic Mutation Calling (SMC) Challenges where Google provided credits on Google Cloud Platform. The first result of this collaboration was the DREAM-SMC DNA challenge, a public challenge that engaged cancer researchers from around the world to find the best methods for discovering DNA somatic mutations. By the end of the challenge, over 400 registered participants competed by submitting 3,500 open-source entries for 14 test genomes, providing key insights on the strengths and limitations of the current mutation detection methods.The SMC-DNA challenge enabled comparison of results, but it did little to facilitate the exchange of cross-platform software tools. Accessing extremely large genome sequence input files and shepherding complex software pipelines created a “double whammy” to discourage data sharing and software reuse.How can we overcome these barriers?Exciting developments have taken place in the past couple of years that may annihilate these last barriers. The availability of cloud technologies and containerization can serve as the vanguards of reproducibility and interoperability.Thus, a new way of creating open DREAM challenges has emerged: rather than encouraging the status quo where participants run their own methods themselves on their own systems, and the results cannot be verified, the new challenge design requires participants to submit open-source code packaged in Docker containers so that anyone can run their methods and verify the results. Real-time leaderboards show which entries are winning and top performers have a chance to claim a prize. Working with Google Genomics and Google Cloud Platform, the DREAM-SMC organizers are now using cloud and containerization technologies to enable portability and reproducibility as a core part of the DREAM challenges. 
The latest SMC installments, the SMC-Het Challenge and the SMC-RNA Challenge, have implemented this new plan:

SMC-Het Challenge: Tumour biopsies are composed of many different cell types in addition to tumour cells, including normal tissue and infiltrating immune cells. Furthermore, th[...]

Improving Inception and Image Classification in TensorFlow


Posted by Alex Alemi, Software Engineer

Earlier this week, we announced the latest release of the TF-Slim library for TensorFlow, a lightweight package for defining, training and evaluating models, as well as checkpoints and model definitions for several competitive networks in the field of image classification. In order to spur even further progress in the field, today we are happy to announce the release of Inception-ResNet-v2, a convolutional neural network (CNN) that achieves a new state of the art in terms of accuracy on the ILSVRC image classification benchmark. Inception-ResNet-v2 is a variation of our earlier Inception V3 model which borrows some ideas from Microsoft's ResNet papers [1][2]. The full details of the model are in our arXiv preprint Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning.

Residual connections allow shortcuts in the model and have allowed researchers to successfully train even deeper neural networks, which have led to even better performance. This has also enabled significant simplification of the Inception blocks. Just compare the model architectures in the figures below:

Schematic diagram of Inception V3
Schematic diagram of Inception-ResNet-v2

At the top of the second Inception-ResNet-v2 figure, you'll see the full network expanded. Notice that this network is considerably deeper than the previous Inception V3. Below in the main figure is an easier-to-read version of the same network where the repeated residual blocks have been compressed. Here, notice that the inception blocks have been simplified, containing fewer parallel towers than the previous Inception V3.

The Inception-ResNet-v2 architecture is more accurate than previous state-of-the-art models, as shown in the table below, which reports the Top-1 and Top-5 validation accuracies on the ILSVRC 2012 image classification benchmark based on a single crop of the image. Furthermore, this new model only requires roughly twice the memory and computation compared to Inception V3.

Model Architecture     Checkpoint                               Top-1 Accuracy   Top-5 Accuracy
Inception-ResNet-v2    inception_resnet_v2_2016_08_30.tar.gz    80.4             95.3
Inception V3           inception_v3_2016_08_28.tar.gz           78.0             93.9
ResNet 152             resnet_v1_152_2016_08_28.tar.gz          76.8             93.2
ResNet V2 200          TBA                                      79.9*            95.2*
(*) Results quoted in the ResNet paper.

As an example, while both Inception V3 and Inception-ResNet-v2 models excel at identifying individual dog breeds, the new model does noticeably better. For instance, whereas the old model mistakenly reported Alaskan Malamute for the picture on the right, the new Inception-ResNet-v2 model correctly identifies the dog breeds in both images.

An Alaskan Malamute (left) and a Siberian Husky (right). Images from Wikipedia

In order to allow people to immediately begin experimenting, we are also releasing a pre-trained instance of the new Inception-ResNet-v2, as part of the TF-Slim Image Model Library. We are excited to see what the community does with this improved model, following along as people adapt it and compare its performance on various tasks. Want to get started? See the accompanying instructions on how to train, evaluate or fine-tune a network.

As always, releasing the code was a team effort.
Specific thanks are due to:
Model Architecture - Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, Alex Alemi
Systems Infrastructure - Jon Shlens, Benoit Steiner, Mark Sandler, and David Andersen
TensorFlow-Slim - Sergio Guadarrama and Nathan Silberman
Model Visualization - Fernanda Viégas and James Wexler [...]
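For readers who want to try the released checkpoint right away, here is a minimal sketch of building the network with TF-Slim and restoring the pre-trained weights. It assumes the nets/ package from the TF-Slim image models repository is on the Python path, that the checkpoint tarball listed in the table above has been extracted to /tmp, and that the 'Predictions' end point name matches the released model definition; the accompanying instructions are the authoritative reference.

import tensorflow as tf
from nets import inception_resnet_v2  # from the TF-Slim image models repository

slim = tf.contrib.slim

# Placeholder for a batch of 299x299 RGB images, the input size used by Inception-ResNet-v2.
images = tf.placeholder(tf.float32, [None, 299, 299, 3])

# Build the network in inference mode and restore the released checkpoint.
with slim.arg_scope(inception_resnet_v2.inception_resnet_v2_arg_scope()):
    logits, end_points = inception_resnet_v2.inception_resnet_v2(
        images, num_classes=1001, is_training=False)

saver = tf.train.Saver()
with tf.Session() as sess:
    # Checkpoint path is an assumption based on the tarball name above.
    saver.restore(sess, '/tmp/inception_resnet_v2_2016_08_30.ckpt')
    # end_points['Predictions'] holds the softmax over the 1001 ImageNet classes.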

TF-Slim: A high level library to define complex models in TensorFlow


Posted by Nathan Silberman and Sergio Guadarrama, Google Research

Earlier this year, we released a TensorFlow implementation of a state-of-the-art image classification model known as Inception-V3. This code allowed users to train the model on the ImageNet classification dataset via synchronized gradient descent, using either a single local machine or a cluster of machines. The Inception-V3 model was built on an experimental TensorFlow library called TF-Slim, a lightweight package for defining, training and evaluating models in TensorFlow. The TF-Slim library provides common abstractions which enable users to define models quickly and concisely, while keeping the model architecture transparent and its hyperparameters explicit.

Since that release, TF-Slim has grown substantially, with many types of layers, loss functions, and evaluation metrics added, along with handy routines for training and evaluating models. These routines take care of all the details you need to worry about when working at scale, such as reading data in parallel, deploying models on multiple machines, and more. Additionally, we have created the TF-Slim Image Models library, which provides definitions and training scripts for many widely used image classification models, using standard datasets. TF-Slim and its components are already widely used within Google, and many of these improvements have already been integrated into tf.contrib.slim.

Today, we are proud to share the latest release of TF-Slim with the TF community. Some highlights of this release include:
Many new kinds of layers (such as Atrous Convolution and Deconvolution) enabling a much richer family of neural network architectures.
Support for more loss functions and evaluation metrics (e.g., mAP, IoU).
A deployment library to make it easier to perform synchronous or asynchronous training using multiple GPUs/CPUs, on the same machine or on multiple machines.
Code to define and train many widely used image classification models (e.g., Inception [1][2][3], VGG [4], AlexNet [5], ResNet [6]).
Pre-trained model weights for the above image classification models. These models have been trained on the ImageNet classification dataset, but can be used for many other computer vision tasks. As a simple example, we provide code to fine-tune these classifiers to a new set of output labels.
Tools to easily process standard image datasets, such as ImageNet, CIFAR10 and MNIST.

Want to get started using TF-Slim? See the README for details. Interested in working with image classification models? See these instructions or this Jupyter notebook.

The release of the TF-Slim library and the pre-trained model zoo has been the result of widespread collaboration within Google Research.
In particular we want to highlight the vital contributions of the following researchers:
TF-Slim: Sergio Guadarrama, Nathan Silberman
Model Definitions and Checkpoints: Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, Jon Shlens, Zbigniew Wojna, Vivek Rathod, George Papandreou, Alex Alemi
Systems Infrastructure: Jon Shlens, Matthieu Devin, Martin Wicke
Jupyter notebook: Nathan Silberman, Kevin Murphy

References:
[1] Going deeper with convolutions, Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich, CVPR 2015
[2] Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, Sergey Ioffe, Christian Szegedy, ICML 2015
[3] Rethinking the Inception Architecture for Computer Vision, Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, Zbigniew Wojna, arXiv technical report 2015
[4] Very Deep Co[...]
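To give a flavor of the abstractions TF-Slim provides, here is a minimal sketch of a small convolutional classifier defined with slim layers and an arg_scope. It is an illustrative toy model, not one of the released architectures, and it assumes tf.contrib.slim from a TensorFlow 1.x installation.

import tensorflow as tf

slim = tf.contrib.slim

def toy_cnn(images, num_classes=10, is_training=True):
    """A toy classifier showing slim layers; not one of the released models."""
    # arg_scope sets shared defaults (activation, regularizer) for the layers below.
    with slim.arg_scope([slim.conv2d, slim.fully_connected],
                        activation_fn=tf.nn.relu,
                        weights_regularizer=slim.l2_regularizer(1e-4)):
        net = slim.conv2d(images, 32, [3, 3], scope='conv1')
        net = slim.max_pool2d(net, [2, 2], scope='pool1')
        net = slim.conv2d(net, 64, [3, 3], scope='conv2')
        net = slim.max_pool2d(net, [2, 2], scope='pool2')
        net = slim.flatten(net)
        net = slim.fully_connected(net, 128, scope='fc1')
        net = slim.dropout(net, 0.5, is_training=is_training, scope='dropout1')
        logits = slim.fully_connected(net, num_classes, activation_fn=None, scope='logits')
    return logits

# Example: build the graph for a batch of 32x32 RGB images.
images = tf.placeholder(tf.float32, [None, 32, 32, 3])
logits = toy_cnn(images)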

Text summarization with TensorFlow


Posted by Peter Liu and Xin Pan, Software Engineers, Google Brain Team

Every day, people rely on a wide variety of sources to stay informed -- from news stories to social media posts to search results. Being able to develop Machine Learning models that can automatically deliver accurate summaries of longer text can be useful for digesting such large amounts of information in a compressed form, and is a long-term goal of the Google Brain team. Summarization can also serve as an interesting reading comprehension test for machines. To summarize well, machine learning models need to be able to comprehend documents and distill the important information, tasks which are highly challenging for computers, especially as the length of a document increases.

In an effort to push this research forward, we’re open-sourcing TensorFlow model code for the task of generating news headlines on Annotated English Gigaword, a dataset often used in summarization research. We also specify in the documentation the hyper-parameters that achieve better than the published state of the art on the most commonly used metric as of the time of writing. Below we also provide samples generated by the model.

Extractive and Abstractive summarization

One approach to summarization is to extract parts of the document that are deemed interesting by some metric (for example, inverse-document frequency) and join them to form a summary. Algorithms of this flavor are called extractive summarization.

Original Text: Alice and Bob took the train to visit the zoo. They saw a baby giraffe, a lion, and a flock of colorful tropical birds.
Extractive Summary: Alice and Bob visit the zoo. saw a flock of birds.

Above we extract the words bolded in the original text and concatenate them to form a summary. As we can see, sometimes the extractive constraint can make the summary awkward or grammatically strange. Another approach is to simply summarize as humans do, which is to not impose the extractive constraint and allow for rephrasings. This is called abstractive summarization.

Abstractive summary: Alice and Bob visited the zoo and saw animals and birds.

In this example, we used words not in the original text, maintaining more of the information in a similar number of words. It’s clear we would prefer good abstractive summarizations, but how could an algorithm begin to do this?

About the TensorFlow model

It turns out that for shorter texts, summarization can be learned end-to-end with a deep learning technique called sequence-to-sequence learning, similar to what makes Smart Reply for Inbox possible. In particular, we’re able to train such models to produce very good headlines for news articles. In this case, the model reads the article text and writes a suitable headline.

To get an idea of what the model produces, you can take a look at some examples below.
The first column shows the first sentence of a news article which is the model input, and the second column shows what headline the model has written.

Input: Article 1st sentence | Model-written headline
metro-goldwyn-mayer reported a third-quarter net loss of dlrs 16 million due mainly to the effect of accounting rules adopted this year | mgm reports 16 million net loss on higher revenue
starting from july 1, the island province of hainan in southern china will implement strict market access control on all incoming livestock and animal products to prevent the possible spread of epidemic diseases | hainan to curb spread of diseases
australian wine exports hit a record 52.1 million liters worth 260 million dollars (143 million us) in september, the government statistics office reporte[...]
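To make the extractive flavor described above concrete, here is a toy sketch (not the TensorFlow model released with this post) that scores each sentence by the rarity of its words across a small corpus and keeps the top-scoring one as the "summary". Everything in it, including the scoring heuristic and the example corpus, is illustrative only.

import math
import re
from collections import namedtuple

def extractive_summary(documents, target_doc, num_sentences=1):
    """Pick the sentences of target_doc whose words are rarest across `documents`.

    A crude stand-in for extractive summarization: score each sentence by the
    sum of inverse document frequencies of its words, then keep the top ones.
    """
    def words(text):
        return re.findall(r"[a-z']+", text.lower())

    # Inverse document frequency computed over the toy corpus.
    doc_sets = [set(words(d)) for d in documents]
    def idf(word):
        containing = sum(1 for s in doc_sets if word in s)
        return math.log(len(documents) / (1 + containing))

    sentences = re.split(r"(?<=[.!?])\s+", target_doc.strip())
    scored = [(sum(idf(w) for w in set(words(s))), s) for s in sentences]
    scored.sort(reverse=True)
    return " ".join(s for _, s in scored[:num_sentences])

corpus = [
    "Alice and Bob took the train to visit the zoo. They saw a baby giraffe, "
    "a lion, and a flock of colorful tropical birds.",
    "Alice took the train to work. Bob stayed home.",
    "The train to the city was late again.",
]
print(extractive_summary(corpus, corpus[0]))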

Meet Parsey’s Cousins: Syntax for 40 languages, plus new SyntaxNet capabilities


Posted by Chris Alberti, Dave Orr & Slav Petrov, Google Natural Language Understanding Team

Just in time for ACL 2016, we are pleased to announce that Parsey McParseface, released in May as part of SyntaxNet and the basis for the Cloud Natural Language API, now has 40 cousins! Parsey’s Cousins is a collection of pretrained syntactic models for 40 languages, capable of analyzing the native language of more than half of the world’s population at often unprecedented accuracy. To better address the linguistic phenomena occurring in these languages we have endowed SyntaxNet with new abilities for Text Segmentation and Morphological Analysis.

When we released Parsey, we were already planning to expand to more languages, and it soon became clear that this was both urgent and important, because researchers were having trouble creating top notch SyntaxNet models for other languages.

The reason for that is a little bit subtle. SyntaxNet, like other TensorFlow models, has a lot of knobs to turn, which affect accuracy and speed. These knobs are called hyperparameters, and control things like the learning rate and its decay, momentum, and random initialization. Because neural networks are more sensitive to the choice of these hyperparameters than many other machine learning algorithms, picking the right hyperparameter setting is very important. Unfortunately there is no tested and proven way of doing this and picking good hyperparameters is mostly an empirical science -- we try a bunch of settings and see what works best.

An additional challenge is that training these models can take a long time, several days on very fast hardware. Our solution is to train many models in parallel via MapReduce, and when one looks promising, train a bunch more models with similar settings to fine-tune the results. This can really add up -- on average, we train more than 70 models per language. The plot below shows how the accuracy varies depending on the hyperparameters as training progresses. The best models are up to 4% absolute more accurate than ones trained without hyperparameter tuning.

Held-out set accuracy for various English parsing models with different hyperparameters (each line corresponds to one training run with specific hyperparameters). In some cases training is a lot slower and in many cases a suboptimal choice of hyperparameters leads to significantly lower accuracy. We are releasing the best model that we were able to train for each language.

In order to do a good job at analyzing the grammar of other languages, it was not sufficient to just fine-tune our English setup. We also had to expand the capabilities of SyntaxNet. The first extension is a model for text segmentation, which is the task of identifying word boundaries. In languages like English, this isn’t very hard -- you can mostly look for spaces and punctuation. In Chinese, however, this can be very challenging, because words are not separated by spaces. To correctly analyze dependencies between Chinese words, SyntaxNet needs to understand text segmentation -- and now it does.

Analysis of a Chinese string into a parse tree showing dependency labels, word tokens, and parts of speech (read top to bottom for each word token).

The second extension is a model for morphological analysis. Morphology is a language feature that is poorly represented in English. It describes inflection: i.e., how the grammatical function and meaning of the word changes as its spelling changes. In English, we add an -s to a word to indicate plurality.
In Russian, a heavily inflected language, morphology can indi[...]
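Returning to the hyperparameter sweep described earlier in this post: at its core it is a large-scale random search over training settings. The sketch below is purely illustrative of that idea; it is not the MapReduce setup the team used, and the parameter names, ranges and the stand-in evaluation function are all invented.

import random

# Hypothetical search space; the actual SyntaxNet ranges are not specified here.
SEARCH_SPACE = {
    "learning_rate": lambda: 10 ** random.uniform(-4, -1),
    "decay_steps": lambda: random.choice([2500, 5000, 10000]),
    "momentum": lambda: random.uniform(0.7, 0.95),
    "hidden_units": lambda: random.choice([[64, 64], [256, 256], [512, 512]]),
}

def sample_hyperparameters():
    return {name: draw() for name, draw in SEARCH_SPACE.items()}

def random_search(train_and_evaluate, num_trials=70):
    """Train many models with random settings and keep the most accurate one."""
    best_accuracy, best_params = -1.0, None
    for _ in range(num_trials):
        params = sample_hyperparameters()
        accuracy = train_and_evaluate(params)  # user-supplied training function
        if accuracy > best_accuracy:
            best_accuracy, best_params = accuracy, params
    return best_params, best_accuracy

# Example with a stand-in evaluation function that just returns a random score.
fake_eval = lambda params: random.random()
print(random_search(fake_eval, num_trials=10))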

ACL 2016 & Research at Google


Posted by Slav Petrov, Research Scientist

This week, Berlin hosts the 2016 Annual Meeting of the Association for Computational Linguistics (ACL 2016), the premier conference of the field of computational linguistics, covering a broad spectrum of diverse research areas that are concerned with computational approaches to natural language. As a leader in Natural Language Processing (NLP) and a Platinum Sponsor of the conference, Google will be on hand to showcase research interests that include syntax, semantics, discourse, conversation, multilingual modeling, sentiment analysis, question answering, summarization, and generally building better learners using labeled and unlabeled data, state-of-the-art modeling, and learning from indirect supervision. Our systems are used in numerous ways across Google, impacting user experience in search, mobile, apps, ads, translate and more. Our work spans the range of traditional NLP tasks, with general-purpose syntax and semantic algorithms underpinning more specialized systems.

Our researchers are experts in natural language processing and machine learning, and combine methodological research with applied science, and our engineers are equally involved in long-term research efforts and driving immediate applications of our technology. If you’re attending ACL 2016, we hope that you’ll stop by the booth to check out some demos, meet our researchers and discuss projects and opportunities at Google that go into solving interesting problems for billions of people.

Learn more about Google research being presented at ACL 2016 below (Googlers highlighted in blue), and visit the Natural Language Understanding Team page.

Transition-based Dependency Parsing via Control Parameters
Bernd Bohnet, Ryan McDonald, Emily Pitler, Ji Ma

Learning the Curriculum with Bayesian Optimization for Task-Specific Word Representation Learning
Yulia Tsvetkov, Manaal Faruqui, Wang Ling (Google DeepMind), Chris Dyer (Google DeepMind)

Morpho-syntactic Lexicon Generation Using Graph-based Semi-supervised Learning (TACL)
Manaal Faruqui, Ryan McDonald, Radu Soricut

Many Languages, One Parser (TACL)
Waleed Ammar, George Mulcaire, Miguel Ballesteros, Chris Dyer (Google DeepMind)*, Noah A. Smith

Latent Predictor Networks for Code Generation
Wang Ling (Google DeepMind), Phil Blunsom (Google DeepMind), Edward Grefenstette (Google DeepMind), Karl Moritz Hermann (Google DeepMind), Tomáš Kočiský (Google DeepMind), Fumin Wang (Google DeepMind), Andrew Senior (Google DeepMind)

Collective Entity Resolution with Multi-Focal Attention
Amir Globerson, Nevena Lazic, Soumen Chakrabarti, Amarnag Subramanya, Michael Ringgaard, Fernando Pereira

Plato: A Selective Context Model for Entity Resolution (TACL)
Nevena Lazic, Amarnag Subramanya, Michael Ringgaard, Fernando Pereira

WikiReading: A Novel Large-scale Language Understanding Task over Wikipedia
Daniel Hewlett, Alexandre Lacoste, Llion Jones, Illia Polosukhin, Andrew Fandrianto, Jay Han, Matthew Kelcey, David Berthelot

Stack-propagation: Improved Representation Learning for Syntax
Yuan Zhang, David Weiss

Cross-lingual Models of Word Embeddings: An Empirical Comparison
Shyam Upadhyay, Manaal Faruqui, Chris Dyer (Google DeepMind), Dan Roth

Globally Normalized Transition-Based Neural Networks (Outstanding Papers Session)
Daniel Andor, Chris Alberti, David Weiss, Aliaksei Severyn, Alessandro Presta, Kuzman[...]

Computational Thinking for All Students


Posted by Maggie Johnson, Director of Education and University Relations, Google
(Crossposted on the Google for Education Blog and the Huffington Post)

Last year, I wrote about the importance of teaching computational thinking to all K-12 students. Given the growing use of computing, algorithms and data in all fields from the humanities to medicine to business, it’s becoming increasingly important for students to understand the basics of computer science (CS). One lesson we have learned through Google’s CS education outreach efforts is that these skills can be accessible to all students, if we introduce them early in K-5. These are truly 21st century skills which can, over time, produce a workforce ready for a technology-enabled and driven economy.

How can teachers start introducing computational thinking in early school curriculum? It is already present in many topic areas - algorithms for solving math problems, for example. However, what is often missing in current examples of computational thinking is the explicit connection between what students are learning and its application in computing. For example, once a student has mastered adding multi-digit numbers, the following algorithm could be presented:
Add together the digits in the ones place. If the result is < 10, it becomes the ones digit of the answer. If it's >= 10, the ones digit of the result becomes the ones digit of the answer, and you add 1 to the next column.
Add together the digits in the tens place, plus the 1 carried over from the ones place, if necessary. If the result is < 10, it becomes the tens digit of the answer; if it's >= 10, the ones digit becomes the tens digit of the answer and 1 is added to the next column.
Repeat this process for any additional columns until they are all added.

This allows a teacher to present the concept of an algorithm and its use in computing, as well as the most important elements of any computer program: conditional branching (“if the result is less than 10…”) and iteration (“repeat this process…”). Going a step further, a teacher translating the algorithm into a running program can have a compelling effect. When something that students have used to solve an instance of a problem can automatically solve all instances of that problem, it’s quite a powerful moment for them even if they don’t do the coding themselves.

Google has created an online course for K-12 teachers to learn about computational thinking and how to make these explicit connections for their students. We also have a large repository of lessons, explorations and programs to support teachers and students. Our videos illustrate real-world examples of the application of computational thinking in Google’s products and services, and we have compiled a set of great resources showing how to integrate computational thinking into existing curriculum. We also recently announced Project Bloks to engage younger children in computational thinking. Finally, Code.org, for whom Google is a primary sponsor, has curriculum and materials for K-5 teachers and students.

We feel that computational thinking is a core skill for all students. If we can make these explicit connections for students, they will see how the devices and apps that they use every day are powered by algorithms and programs. They will learn the importance of data in making decisions. They will learn skills that will prepare them for a workforce that will be doing vastly different tasks than the workforce of today. We o[...]
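To show how the addition algorithm above maps directly onto the conditional branching and iteration it illustrates, here is one possible translation into code. It is a hypothetical classroom example, not taken from Google's course materials.

def add_multi_digit(a_digits, b_digits):
    """Add two numbers given as lists of digits, least significant digit first.

    Mirrors the classroom algorithm: add column by column, and if a column's
    result is >= 10, keep the ones digit and carry 1 into the next column.
    """
    result = []
    carry = 0
    # Iteration: repeat the process for every column.
    for column in range(max(len(a_digits), len(b_digits))):
        a = a_digits[column] if column < len(a_digits) else 0
        b = b_digits[column] if column < len(b_digits) else 0
        total = a + b + carry
        # Conditional branching: does this column produce a carry?
        if total < 10:
            result.append(total)
            carry = 0
        else:
            result.append(total - 10)
            carry = 1
    if carry:
        result.append(carry)
    return result

# 47 + 85 = 132, with digits stored ones-place first.
print(add_multi_digit([7, 4], [5, 8]))  # [2, 3, 1]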

Announcing an Open Source ADC board for BeagleBone


Posted by Jason Holt, Software Engineer
(Cross-posted on the Google Open Source Blog)

Working with electronics, we often find ourselves soldering up a half-baked electronic circuit to detect some sort of signal. For example, last year we wanted to measure the strength of a carrier. We started with traditional analog circuits -- amplifier, filter, envelope detector, threshold. You can see some of our prototypes in the image below; they get pretty messy.

While there's a certain satisfaction in taming a signal using the physical properties of capacitors, coils of wire and transistors, it's usually easier to digitize the signal with an Analog to Digital Converter (ADC) and manage it with Digital Signal Processing (DSP) instead of electronic parts. Tweaking software doesn't require a soldering iron, and lets us modify signals in ways that would require impossible analog circuits.

There are several standard solutions for digitizing a signal: connect a laptop to an oscilloscope or Data Acquisition System (DAQ) via USB or Ethernet, or use the onboard ADCs of a maker board like an Arduino. The former are sensitive and accurate, but also big and power hungry. The latter are cheap and tiny, but slower and have enough RAM for only milliseconds worth of high speed sample data.

That led us to investigate single board computers like the BeagleBone and Raspberry Pi, which are small and cheap like an Arduino, but have specs like a smartphone. And crucially, the BeagleBone's system-on-a-chip (SoC) combines a beefy ARMv7 CPU with two smaller Programmable Realtime Units (PRUs) that have access to all 512MB of system RAM. This lets us dedicate the PRUs to the time-sensitive and repetitive task of reading each sample out of an external ADC, while the main CPU lets us use the data with the GNU/Linux tools we're used to.

The result is an open source BeagleBone cape we've named PRUDAQ. It's built around the Analog Devices AD9201 ADC, which samples two inputs simultaneously at up to 20 megasamples per second, per channel. Simultaneous sampling and high sample rates make it useful for software-defined radio (SDR) and scientific applications where a built-in ADC isn't quite up to the task.

Our open source electrical design and sample code are available on GitHub, and GroupGets has boards ready to ship for $79. We were also fortunate to have help from Google intern Kumar Abhishek. He added support for PRUDAQ to his Google Summer of Code project BeagleLogic, which performs much better than our sample code.

We started PRUDAQ for our own needs, but quickly realized that others might also find it useful. We're excited to get your feedback through the email list. Tell us what can be done with inexpensive fast ADCs paired with inexpensive fast CPUs! [...]
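As a small illustration of the DSP-in-software workflow the post describes, here is a sketch that deinterleaves a two-channel sample buffer and estimates each channel's signal level. The interleaving layout, 10-bit sample range and synthetic data are assumptions made so the example runs stand-alone; the PRUDAQ documentation is the reference for the board's actual data format.

import numpy as np

# In practice the interleaved samples would come from a PRUDAQ capture (for
# example via BeagleLogic); here we synthesize a buffer so the sketch runs on
# its own. Layout (ch0, ch1, ch0, ch1, ...) and 10-bit range are assumptions.
num_pairs = 20000
t = np.arange(num_pairs)
ch0_true = (512 + 400 * np.sin(2 * np.pi * t / 100)).astype(np.uint16)
ch1_true = (512 + 100 * np.sin(2 * np.pi * t / 250)).astype(np.uint16)
raw = np.empty(2 * num_pairs, dtype=np.uint16)
raw[0::2], raw[1::2] = ch0_true, ch1_true

# Deinterleave the stream back into the two simultaneously sampled channels.
ch0 = raw[0::2].astype(np.float64)
ch1 = raw[1::2].astype(np.float64)

def rms(samples):
    """Root-mean-square level after removing the DC offset."""
    centered = samples - samples.mean()
    return np.sqrt(np.mean(centered ** 2))

print("channel 0 RMS:", rms(ch0))  # roughly 400 / sqrt(2) counts
print("channel 1 RMS:", rms(ch1))  # roughly 100 / sqrt(2) counts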

Towards an exact (quantum) description of chemistry


Posted by Ryan Babbush, Quantum Software Engineer

“...nature isn't classical, dammit, and if you want to make a simulation of nature, you'd better make it quantum mechanical...” - Richard Feynman, Simulating Physics with Computers

One of the most promising applications of quantum computing is the ability to efficiently model quantum systems in nature that are considered intractable for classical computers. Now, in collaboration with the Aspuru-Guzik group at Harvard and researchers from Lawrence Berkeley National Labs, UC Santa Barbara, Tufts University and University College London, we have performed the first completely scalable quantum simulation of a molecule. Our experimental results are detailed in the paper Scalable Quantum Simulation of Molecular Energies, which recently appeared in Physical Review X.

The goal of our experiment was to use quantum hardware to efficiently solve the molecular electronic structure problem, which seeks the solution for the lowest energy configuration of electrons in the presence of a given nuclear configuration. In order to predict chemical reaction rates (which govern the mechanism of chemical reactions), one must make these calculations to extremely high precision. The ability to predict such rates could revolutionize the design of solar cells, industrial catalysts, batteries, flexible electronics, medicines, materials and more. The primary difficulty is that molecular systems form highly entangled quantum superposition states which require exponentially many classical computing resources in order to represent to sufficiently high precision. For example, exactly computing the energies of methane (CH4) takes about one second, but the same calculation takes about ten minutes for ethane (C2H6) and about ten days for propane (C3H8).

In our experiment, we focus on an approach known as the variational quantum eigensolver (VQE), which can be understood as a quantum analog of a neural network. Whereas a classical neural network is a parameterized mapping that one trains in order to model classical data, VQE is a parameterized mapping (e.g. a quantum circuit) that one trains in order to model quantum data (e.g. a molecular wavefunction). The training objective for VQE is the molecular energy function, which is always minimized by the true ground state. The quantum advantage of VQE is that quantum bits can efficiently represent the molecular wavefunction whereas exponentially many classical bits would be required.

Using VQE, we quantum computed the energy landscape of molecular hydrogen, H2. We compared the performance of VQE to another quantum algorithm for chemistry, the phase estimation algorithm (PEA). Experimentally computed energies, as a function of the H - H bond length, are shown below alongside the exact curve.

We were able to obtain such high performance with VQE because the neural-network-like training loop helped to establish experimentally optimal circuit parameters for representing the wavefunction in the presence of systematic control errors. One can understand this by considering a hardware implementation of a neural network with a faulty weight, e.g. the weight is only represented half as strong as it should be. Because the weights of the neural network are established via a closed-loop training procedure which can compensate for such systematic errors, the hardware neural network is robust against such imperfections. Likewise, despite systematic errors in our implementation of the VQE [...]
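The variational training loop described above can be mimicked classically on a tiny example. The sketch below minimizes the energy of a made-up two-level Hamiltonian over a one-parameter trial state; it is only an analogy for the VQE outer loop, not the quantum-hardware experiment in the paper, and the Hamiltonian and ansatz are invented for illustration.

import numpy as np
from scipy.optimize import minimize_scalar

# A made-up 2x2 Hermitian "Hamiltonian" standing in for a molecular energy operator.
H = np.array([[1.0, 0.5],
              [0.5, -1.0]])

def trial_state(theta):
    """One-parameter ansatz, analogous to a parameterized quantum circuit."""
    return np.array([np.cos(theta), np.sin(theta)])

def energy(theta):
    """Expectation value <psi(theta)| H |psi(theta)> -- the VQE training objective."""
    psi = trial_state(theta)
    return psi @ H @ psi

# "Train" the parameter by minimizing the energy, as the VQE outer loop does.
result = minimize_scalar(energy, bounds=(0.0, np.pi), method="bounded")
exact_ground_energy = np.linalg.eigvalsh(H).min()
print("variational energy:", result.fun)
print("exact ground state energy:", exact_ground_energy)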

Wide & Deep Learning: Better Together with TensorFlow


Posted by Heng-Tze Cheng, Senior Software Engineer, Google Research

The human brain is a sophisticated learning machine, forming rules by memorizing everyday events (“sparrows can fly” and “pigeons can fly”) and generalizing those learnings to apply to things we haven't seen before (“animals with wings can fly”). Perhaps more powerfully, memorization also allows us to further refine our generalized rules with exceptions (“penguins can't fly”). As we were exploring how to advance machine intelligence, we asked ourselves the question -- can we teach computers to learn like humans do, by combining the power of memorization and generalization?

It's not an easy question to answer, but by jointly training a wide linear model (for memorization) alongside a deep neural network (for generalization), one can combine the strengths of both to bring us one step closer. At Google, we call it Wide & Deep Learning. It's useful for generic large-scale regression and classification problems with sparse inputs (categorical features with a large number of possible feature values), such as recommender systems, search, and ranking problems.

Today we’re open-sourcing our implementation of Wide & Deep Learning as part of the TF.Learn API so that you can easily train a model yourself. Please check out the TensorFlow tutorials on Linear Models and Wide & Deep Learning, as well as our research paper to learn more.

How Wide & Deep Learning works.

Let's say one day you wake up with an idea for a new app called FoodIO*. A user of the app just needs to say out loud what kind of food he/she is craving (the query). The app magically predicts the dish that the user will like best, and the dish gets delivered to the user's front door (the item). Your key metric is consumption rate -- if a dish was eaten by the user, the score is 1; otherwise it's 0 (the label).

You come up with some simple rules to start, like returning the items that match the most characters in the query, and you release the first version of FoodIO. Unfortunately, you find that the consumption rate is pretty low because the matches are too crude to be really useful (people shouting “fried chicken” end up getting “chicken fried rice”), so you decide to add machine learning to learn from the data.

The Wide model.
In the 2nd version, you want to memorize what items work the best for each query. So, you train a linear model in TensorFlow with a wide set of cross-product feature transformations to capture how the co-occurrence of a query-item feature pair correlates with the target label (whether or not an item is consumed). The model predicts the probability of consumption P(consumption | query, item) for each item, and FoodIO delivers the top item with the highest predicted consumption rate. For example, the model learns that feature AND(query="fried chicken", item="chicken and waffles") is a huge win, while AND(query="fried chicken", item="chicken fried rice") doesn't get as much love even though the character match is higher. In other words, FoodIO 2.0 does a pretty good job memorizing what users like, and it starts to get more traction.

The Deep model.
Later on you discover that many users are saying that t[...]
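For a feel of how the wide and deep parts fit together in code, here is a minimal sketch in the spirit of the TF.Learn tutorials linked above. The feature names ("query", "item"), bucket sizes, hidden-unit sizes and model directory are invented for the FoodIO story, and the contrib API shown is the TensorFlow 1.x version; the official tutorials are the authoritative reference.

import tensorflow as tf

layers = tf.contrib.layers

# Sparse base columns for the hypothetical FoodIO features.
query = layers.sparse_column_with_hash_bucket("query", hash_bucket_size=10000)
item = layers.sparse_column_with_hash_bucket("item", hash_bucket_size=10000)

# Wide part: the raw sparse columns plus their cross-product, for memorization.
wide_columns = [query, item,
                layers.crossed_column([query, item], hash_bucket_size=100000)]

# Deep part: dense embeddings of the same features, for generalization.
deep_columns = [layers.embedding_column(query, dimension=8),
                layers.embedding_column(item, dimension=8)]

# Jointly trained wide linear model + deep neural network.
model = tf.contrib.learn.DNNLinearCombinedClassifier(
    model_dir="/tmp/foodio_model",
    linear_feature_columns=wide_columns,
    dnn_feature_columns=deep_columns,
    dnn_hidden_units=[100, 50])

# Training would look like:
#   model.fit(input_fn=train_input_fn, steps=1000)
# where train_input_fn yields sparse "query"/"item" tensors and 0/1 consumption labels.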

CVPR 2016 & Research at Google


Posted by Rahul Sukthankar, Research Scientist

This week, Las Vegas hosts the 2016 Conference on Computer Vision and Pattern Recognition (CVPR 2016), the premier annual computer vision event comprising the main conference and several co-located workshops and short courses. As a leader in computer vision research, Google has a strong presence at CVPR 2016, with many Googlers presenting papers and invited talks at the conference, tutorials and workshops.

We congratulate Google Research Scientist Ce Liu and Google Faculty Advisor Abhinav Gupta, who were selected as this year’s recipients of the PAMI Young Researcher Award for outstanding research contributions within computer vision. We also congratulate Googler Henrik Stewenius for receiving the Longuet-Higgins Prize, a retrospective award that recognizes up to two CVPR papers from ten years ago that have made a significant impact on computer vision research, for his 2006 CVPR paper “Scalable Recognition with a Vocabulary Tree”, co-authored with David Nister during their time at the University of Kentucky.

If you are attending CVPR this year, please stop by our booth and chat with our researchers about the projects and opportunities at Google that go into solving interesting problems for hundreds of millions of people. The Google booth will also showcase several recent efforts, including the technology behind Motion Stills, a live demo of neural network-based image compression and TensorFlow-Slim, the lightweight library for defining, training and evaluating models in TensorFlow.

Learn more about our research being presented at CVPR 2016 in the list below (Googlers highlighted in blue).

Oral Presentations
Generation and Comprehension of Unambiguous Object Descriptions
Junhua Mao, Jonathan Huang, Alexander Toshev, Oana Camburu, Alan L. Yuille, Kevin Murphy
Detecting Events and Key Actors in Multi-Person Videos
Vignesh Ramanathan, Jonathan Huang, Sami Abu-El-Haija, Alexander Gorban, Kevin Murphy, Li Fei-Fei

Spotlight Session: 3D Reconstruction
DeepStereo: Learning to Predict New Views From the World’s Imagery
John Flynn, Ivan Neulander, James Philbin, Noah Snavely

Posters
Discovering the Physical Parts of an Articulated Object Class From Multiple Videos
Luca Del Pero, Susanna Ricco, Rahul Sukthankar, Vittorio Ferrari
Blockout: Dynamic Model Selection for Hierarchical Deep Networks
Calvin Murdock, Zhen Li, Howard Zhou, Tom Duerig
Rethinking the Inception Architecture for Computer Vision
Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, Zbigniew Wojna
Improving the Robustness of Deep Neural Networks via Stability Training
Stephan Zheng, Yang Song, Thomas Leung, Ian Goodfellow
Semantic Image Segmentation With Task-Specific Edge Detection Using CNNs and a Discriminatively Trained Domain Transform
Liang-Chieh Chen, Jonathan T. Barron, George Papandreou, Kevin Murphy, Alan L. Yuille

Tutorial
Optimization Algorithms for Subset Selection and Summarization in Large Data Sets
Ehsan Elhamifar, Jeff Bilmes, Alex Kulesza, Michael Gygli

Workshops
Perceptual Organization in Computer Vision: The Role of Feedback in Recognition and Reorganization
Organizers: Katerina Fragkiadaki, Phillip Isola, Joao Carreira
Invited talks: Viren Jain, Jitendra Malik
VQA Challenge Workshop
Invited talks: Jitendra Malik, Kevin Murphy
Women in Computer Vision
Invited talk: Caroline Pantofaru
Computational Models for Learning System[...]