
Google Research Blog

The latest news on Google Research.

Updated: 2016-09-29T22:38:00.890-07:00


Image Compression with Neural Networks


Posted by Nick Johnston and David Minnen, Software Engineers

Data compression is used nearly everywhere on the internet - the videos you watch online, the images you share, the music you listen to, even the blog you're reading right now. Compression techniques make sharing the content you want quick and efficient. Without data compression, the time and bandwidth costs for getting the information you need, when you need it, would be exorbitant!

In "Full Resolution Image Compression with Recurrent Neural Networks", we expand on our previous research on data compression using neural networks, exploring whether machine learning can provide better results for image compression like it has for image recognition and text summarization. Furthermore, we are releasing our compression model via TensorFlow so you can experiment with compressing your own images with our network.

We introduce an architecture that uses a new variant of the Gated Recurrent Unit (a type of RNN that allows units to save activations and process sequences) called the Residual Gated Recurrent Unit (Residual GRU). Our Residual GRU combines existing GRUs with the residual connections introduced in "Deep Residual Learning for Image Recognition" to achieve significant image quality gains for a given compression rate. Instead of using a DCT to generate a new bit representation, as many compression schemes in use today do, we train two sets of neural networks - one to create the codes from the image (encoder) and another to create the image from the codes (decoder).

Our system works by iteratively refining a reconstruction of the original image, with both the encoder and decoder using Residual GRU layers so that additional information can pass from one iteration to the next. Each iteration adds more bits to the encoding, which allows for a higher quality reconstruction. Conceptually, the network operates as follows:

1. The initial residual, R[0], corresponds to the original image I: R[0] = I.
2. Set i=1 for the first iteration.
3. Iteration[i] takes R[i-1] as input and runs the encoder and binarizer to compress the image into B[i].
4. Iteration[i] runs the decoder on B[i] to generate a reconstructed image P[i].
5. The residual for Iteration[i] is calculated: R[i] = I - P[i].
6. Set i=i+1 and go to Step 3 (up to the desired number of iterations).

The residual image represents how different the current version of the compressed image is from the original. This image is then given as input to the network with the goal of removing the compression errors from the next version of the compressed image. The compressed image is now represented by the concatenation of B[1] through B[N]. For larger values of N, the decoder gets more information on how to reduce the errors and generate a higher quality reconstruction of the original image.

To understand how this works, consider the following example of the first two iterations of the image compression network, shown in the figures below. We start with an image of a lighthouse. On the first pass through the network, the original image is given as an input (R[0] = I). P[1] is the reconstructed image. The difference between the original image and encoded image is the residual, R[1], which represents the error in the compression.

Left: Original image, I = R[0]. Center: Reconstructed image, P[1]. Right: the residual, R[1], which represents the error introduced by compression.

On the second pass through the network, R[1] is given as the network's input (see figure below). A higher quality image P[2] is then created.
So how does the system recreate such a good image (P[2], center panel below) from the residual R[1]? Because the model uses recurrent nodes with memory, the network saves information from each iteration that it can use in the next one. It learned something about the original image in Iteration[1] that is used along with R[1] to generate a better P[2] from B[2]. Lastly, a new residual, R[2] (right), is generated by subtracting P[2] from the original image. This time the residual is smaller since there are fewer differences between the reconstruct[...]
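To make the iterative scheme concrete, here is a minimal Python sketch of the loop described above. The `encode` and `decode` callables stand in for the Residual GRU encoder/binarizer and decoder networks (in the real model, recurrent state also carries information between iterations); the toy sign-based codec at the bottom is purely hypothetical so the example runs end to end.

```python
import numpy as np

def iterative_compress(image, encode, decode, num_iterations):
    """Sketch of the iterative refinement loop: each step encodes the current
    residual into bits B[i], decodes a reconstruction P[i] from the bits so far,
    and updates the residual R[i] = I - P[i]."""
    residual = image.astype(np.float32)            # R[0] = I
    bits = []
    reconstruction = np.zeros_like(residual)
    for _ in range(num_iterations):
        bits.append(encode(residual))              # B[i] computed from R[i-1]
        reconstruction = decode(bits)              # P[i] computed from B[1..i]
        residual = image - reconstruction          # R[i] = I - P[i]
    return bits, reconstruction

# Toy stand-ins: "encode" a residual as its sign pattern, "decode" by summing
# scaled sign patterns. The real codecs are learned Residual GRU networks.
encode = lambda r: np.sign(r).astype(np.int8)
decode = lambda bs: sum(0.1 * b.astype(np.float32) for b in bs)

image = np.random.rand(8, 8)
_, recon = iterative_compress(image, encode, decode, num_iterations=16)
print(np.abs(image - recon).mean())   # reconstruction error shrinks as more bits are spent
```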

Announcing YouTube-8M: A Large and Diverse Labeled Video Dataset for Video Understanding Research


Posted by Sudheendra Vijayanarasimhan and Paul Natsev, Software Engineers

Many recent breakthroughs in machine learning and machine perception have come from the availability of large labeled datasets, such as ImageNet, which has millions of images labeled with thousands of classes. Their availability has significantly accelerated research in image understanding, for example on detecting and classifying objects in static images.

Video analysis provides even more information for detecting and recognizing objects, and understanding human actions and interactions with the world. Improving video understanding can lead to better video search and discovery, similarly to how image understanding helped re-imagine the photos experience. However, one of the key bottlenecks for further advancements in this area has been the lack of real-world video datasets with the same scale and diversity as image datasets.

Today, we are excited to announce the release of YouTube-8M, a dataset of 8 million YouTube video URLs (representing over 500,000 hours of video), along with video-level labels from a diverse set of 4800 Knowledge Graph entities. This represents a significant increase in scale and diversity compared to existing video datasets. For example, Sports-1M, the largest existing labeled video dataset we are aware of, has around 1 million YouTube videos and 500 sports-specific classes -- YouTube-8M represents nearly an order of magnitude increase in both number of videos and classes.

In order to construct a labeled video dataset of this scale, we needed to address two key challenges: (1) video is much more time-consuming to annotate manually than images, and (2) video is very computationally expensive to process and store.

To overcome (1), we turned to YouTube and its video annotation system, which identifies relevant Knowledge Graph topics for all public YouTube videos. While these annotations are machine-generated, they incorporate powerful user engagement signals from millions of users as well as video metadata and content analysis. As a result, the quality of these annotations is sufficiently high to be useful for video understanding research and benchmarking purposes.

To ensure the stability and quality of the labeled video dataset, we used only public videos with more than 1000 views, and we constructed a diverse vocabulary of entities which are visually observable and sufficiently frequent. The vocabulary construction was a combination of frequency analysis, automated filtering, verification by human raters that the entities are visually observable, and grouping into 24 top-level verticals (more details in our technical report). The figures below depict the dataset browser and the distribution of videos along the top-level verticals, and illustrate the dataset's scale and diversity.

A dataset explorer allows browsing and searching the full vocabulary of Knowledge Graph entities, grouped in 24 top-level verticals, along with corresponding videos. This screenshot depicts a subset of dataset videos annotated with the entity "Guitar".

The distribution of videos in the top-level verticals illustrates the scope and diversity of the dataset and reflects the natural distribution of popular YouTube videos.

To address (2), we had to overcome the storage and computational resource bottlenecks that researchers face when working with videos.
Pursuing video understanding at YouTube-8M’s scale would normally require a petabyte of video storage and dozens of CPU-years worth of processing. To make the dataset useful to researchers and students with limited computational resources, we pre-processed the videos and extracted frame-level features using a state-of-the-art deep learning model--the publicly available Inception-V3 image annotation model trained on ImageNet. These features are extracted at 1 frame-per-second temporal resolution, from 1.9 billion video frames, and are further compressed to fit on a single commodity hard[...]
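The released dataset ships with these features already extracted, so you would not normally run this step yourself. Still, the following hedged Python sketch illustrates the general recipe of sampling a video at roughly one frame per second and pushing each frame through a pre-trained Inception-style network to get a fixed-length feature vector. It uses the Keras InceptionV3 and OpenCV as convenient stand-ins for the internal pipeline, and `video.mp4` is a hypothetical local file.

```python
import cv2                      # OpenCV, for decoding video frames
import numpy as np
import tensorflow as tf

# Pre-trained Inception-V3 used as a fixed feature extractor (2048-d average pool).
extractor = tf.keras.applications.InceptionV3(
    include_top=False, weights="imagenet", pooling="avg")
preprocess = tf.keras.applications.inception_v3.preprocess_input

def frame_features(video_path):
    """Yield roughly one 2048-d feature vector per second of video."""
    capture = cv2.VideoCapture(video_path)
    fps = capture.get(cv2.CAP_PROP_FPS) or 30.0
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % int(round(fps)) == 0:                  # sample at ~1 frame per second
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            resized = cv2.resize(rgb, (299, 299)).astype(np.float32)
            batch = preprocess(resized[np.newaxis])
            yield extractor.predict(batch, verbose=0)[0]
        index += 1
    capture.release()

features = list(frame_features("video.mp4"))   # hypothetical local video file
print(len(features), features[0].shape if features else None)
```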

A Neural Network for Machine Translation, at Production Scale


Posted by Quoc V. Le & Mike Schuster, Research Scientists, Google Brain Team

Ten years ago, we announced the launch of Google Translate, together with the use of Phrase-Based Machine Translation as the key algorithm behind this service. Since then, rapid advances in machine intelligence have improved our speech recognition and image recognition capabilities, but improving machine translation remains a challenging goal.

Today we announce the Google Neural Machine Translation system (GNMT), which utilizes state-of-the-art training techniques to achieve the largest improvements to date for machine translation quality. Our full research results are described in a new technical report we are releasing today: "Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation" [1].

A few years ago we started using Recurrent Neural Networks (RNNs) to directly learn the mapping between an input sequence (e.g. a sentence in one language) and an output sequence (that same sentence in another language) [2]. Whereas Phrase-Based Machine Translation (PBMT) breaks an input sentence into words and phrases to be translated largely independently, Neural Machine Translation (NMT) considers the entire input sentence as a unit for translation. The advantage of this approach is that it requires fewer engineering design choices than previous Phrase-Based translation systems. When it first came out, NMT showed accuracy equivalent to existing Phrase-Based translation systems on modest-sized public benchmark data sets.

Since then, researchers have proposed many techniques to improve NMT, including work on handling rare words by mimicking an external alignment model [3], using attention to align input words and output words [4], and breaking words into smaller units to cope with rare words [5,6]. Despite these improvements, NMT wasn't fast or accurate enough to be used in a production system, such as Google Translate. Our new paper [1] describes how we overcame the many challenges to make NMT work on very large data sets and built a system that is sufficiently fast and accurate to provide better translations for Google's users and services.

Data from side-by-side evaluations, where human raters compare the quality of translations for a given source sentence. Scores range from 0 to 6, with 0 meaning "completely nonsense translation" and 6 meaning "perfect translation."

The following visualization shows the progression of GNMT as it translates a Chinese sentence to English. First, the network encodes the Chinese words as a list of vectors, where each vector represents the meaning of all words read so far ("Encoder"). Once the entire sentence is read, the decoder begins, generating the English sentence one word at a time ("Decoder"). To generate the translated word at each step, the decoder pays attention to a weighted distribution over the encoded Chinese vectors most relevant to generating the English word ("Attention"; the blue link transparency represents how much the decoder pays attention to an encoded word).

Using human-rated side-by-side comparison as a metric, the GNMT system produces translations that are vastly improved compared to the previous phrase-based production system. GNMT reduces translation errors by more than 55%-85% on several major language pairs, measured on sampled sentences from Wikipedia and news websites with the help of bilingual human raters.

An example of a translation produced by our system for an input sentence sampled from a news site.
Go here for more examples of translations for input sentences sampled randomly from news sites and books.In addition to releasing this research paper today, we are announcing the launch of GNMT in production on a notoriously difficult language pair: Chinese to English. The Google Translate mobile and web apps are now using GNMT for 100% of machine translations from Chinese to English—about 18 million translations per day. The production deployment of GNMT was made possi[...]
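The attention step in the visualization can be sketched with a few lines of NumPy. This is not GNMT itself (which uses deep stacked LSTMs and a learned attention network), just a toy dot-product version showing how the decoder blends the encoded source vectors into a single context vector, with the weights playing the role of the blue links in the figure. All shapes and values below are made up.

```python
import numpy as np

def attention_context(decoder_state, encoder_states):
    """Toy dot-product attention: score each encoded source position against the
    current decoder state, softmax the scores, and return the weighted blend."""
    scores = encoder_states @ decoder_state          # one score per source word
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                         # softmax over source positions
    context = weights @ encoder_states               # weighted sum of source vectors
    return context, weights

rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(4, 8))   # hypothetical 4-word source sentence, 8-d vectors
decoder_state = rng.normal(size=8)         # hypothetical current decoder state
context, weights = attention_context(decoder_state, encoder_states)
print(np.round(weights, 3))                # how much attention each source word receives
```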

Show and Tell: image captioning open sourced in TensorFlow


Posted by Chris Shallue, Software Engineer, Google Brain Team

In 2014, research scientists on the Google Brain team trained a machine learning system to automatically produce captions that accurately describe images. Further development of that system led to its success in the Microsoft COCO 2015 image captioning challenge, a competition to compare the best algorithms for computing accurate image captions, where it tied for first place.

Today, we're making the latest version of our image captioning system available as an open source model in TensorFlow. This release contains significant improvements to the computer vision component of the captioning system, is much faster to train, and produces more detailed and accurate descriptions compared to the original system. These improvements are outlined and analyzed in the paper Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge, published in IEEE Transactions on Pattern Analysis and Machine Intelligence.

Automatically captioned by our system.

So what's new?

Our 2014 system used the Inception V1 image classification model to initialize the image encoder, which produces the encodings that are useful for recognizing different objects in the images. This was the best image model available at the time, achieving 89.6% top-5 accuracy on the benchmark ImageNet 2012 image classification task. We replaced this in 2015 with the newer Inception V2 image classification model, which achieves 91.8% accuracy on the same task. The improved vision component gave our captioning system an accuracy boost of 2 points in the BLEU-4 metric (which is commonly used in machine translation to evaluate the quality of generated sentences) and was an important factor in its success in the captioning challenge.

Today's code release initializes the image encoder using the Inception V3 model, which achieves 93.9% accuracy on the ImageNet classification task. Initializing the image encoder with a better vision model gives the image captioning system a better ability to recognize different objects in the images, allowing it to generate more detailed and accurate descriptions. This gives an additional 2 points of improvement in the BLEU-4 metric over the system used in the captioning challenge.

Another key improvement to the vision component comes from fine-tuning the image model. This step addresses the problem that the image encoder is initialized by a model trained to classify objects in images, whereas the goal of the captioning system is to describe the objects in images using the encodings produced by the image model. For example, an image classification model will tell you that a dog, grass and a frisbee are in the image, but a natural description should also tell you the color of the grass and how the dog relates to the frisbee.

In the fine-tuning phase, the captioning system is improved by jointly training its vision and language components on human generated captions. This allows the captioning system to transfer information from the image that is specifically useful for generating descriptive captions, but which was not necessary for classifying objects. In particular, after fine-tuning it becomes better at correctly describing the colors of objects. Importantly, the fine-tuning phase must occur after the language component has already learned to generate captions - otherwise, the noisiness of the randomly initialized language component causes irreversible corruption to the vision component.
For more details, read the full paper here.

Left: the better image model allows the captioning model to generate more detailed and accurate descriptions. Right: after fine-tuning the image model, the image captioning system is more likely to describe the colors of objects correctly.

Until recently our image captioning system was implemented in the DistBelief software framework. The TensorFlow implementation released today achieves the same level of accuracy with significantly faster performance: time per[...]
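For readers who want a feel for the overall shape of such a system, here is a minimal, hedged Keras sketch of an encoder-decoder captioner in the same spirit: an Inception-style CNN encodes the image into a single vector, which is fed to an LSTM as the first step of the caption sequence. It is not the released Show and Tell code (see the open-source TensorFlow model for that), and all sizes are made-up placeholders.

```python
import tensorflow as tf

VOCAB_SIZE, EMBED_DIM, LSTM_UNITS, MAX_LEN = 10000, 512, 512, 20   # hypothetical sizes

# Image encoder: Inception V3 pooled features projected into the embedding space.
# Pass weights="imagenet" to initialize from a pre-trained checkpoint, as the post describes.
cnn = tf.keras.applications.InceptionV3(include_top=False, weights=None, pooling="avg")
image_in = tf.keras.Input(shape=(299, 299, 3))
image_embedding = tf.keras.layers.Dense(EMBED_DIM)(cnn(image_in))

# Language decoder: the image embedding is fed as the first "word" of the sequence,
# then the LSTM predicts each next caption word from the words seen so far.
caption_in = tf.keras.Input(shape=(MAX_LEN,), dtype="int32")
word_vectors = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM)(caption_in)
image_step = tf.keras.layers.Reshape((1, EMBED_DIM))(image_embedding)
sequence = tf.keras.layers.Concatenate(axis=1)([image_step, word_vectors])
lstm_out = tf.keras.layers.LSTM(LSTM_UNITS, return_sequences=True)(sequence)
next_word_logits = tf.keras.layers.Dense(VOCAB_SIZE)(lstm_out)

model = tf.keras.Model([image_in, caption_in], next_word_logits)
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
model.summary()
```

In this sketch, the fine-tuning phase described above would correspond to first training with the CNN frozen (`cnn.trainable = False`) until the language side is reasonable, then unfreezing it so the vision and language components train jointly.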

The 280-Year-Old Algorithm Inside Google Trips


Posted by Bogdan Arsintescu, Software Engineer & Sreenivas Gollapudi, Kostas Kollias, Tamas Sarlos and Andrew Tomkins, Research Scientists

Algorithms Engineering is a lot of fun because algorithms do not go out of fashion: one never knows when an oldie-but-goodie might come in handy. Case in point: Yesterday, Google announced Google Trips, a new app to assist you in your travels by helping you create your own "perfect day" in a city. Surprisingly, deep inside Google Trips, there is an algorithm that was invented 280 years ago.

In 1736, Leonhard Euler authored a brief but beautiful mathematical paper regarding the town of Königsberg and its 7 bridges, shown here:

Image from Wikipedia

In the paper, Euler studied the following question: is it possible to walk through the city crossing each bridge exactly once? As it turns out, for the city of Königsberg, the answer is no. To reach this answer, Euler developed a general approach to represent any layout of landmasses and bridges in terms of what he dubbed the Geometriam Situs (the "Geometry of Place"), which we now call Graph Theory. He represented each landmass as a "node" in the graph, and each bridge as an "edge," like this:

Image from Wikipedia

Euler noticed that if all the nodes in the graph have an even number of edges (such graphs are called "Eulerian" in his honor) then, and only then, a cycle can be found that visits every edge exactly once. Keep this in mind, as we'll rely on this fact later in the post.

Our team in Google Research has been fascinated by the "Geometry of Place" for some time, and we started investigating a question related to Euler's: rather than visiting just the bridges, how can we visit as many interesting places as possible during a particular trip? We call this the "itineraries" problem. Euler didn't study it, but it is a well known topic in Optimization, where it is often called the "Orienteering" problem.

While Euler's problem has an efficient and exact solution, the itineraries problem is not just hard to solve, it is hard to even approximately solve! The difficulty lies in the interplay between two conflicting goals: first, we should pick great places to visit, but second, we should pick them to allow a good itinerary: not too much travel time; don't visit places when they're closed; don't visit too many museums, etc. Embedded in such problems is the challenge of finding efficient routes, often referred to as the Travelling Salesman Problem (TSP).

Algorithms for Travel Itineraries

Fortunately, the real world has a property called the "triangle inequality" that says adding an extra stop to a route never makes it shorter. When the underlying geometry satisfies the triangle inequality, the TSP can be approximately solved using another algorithm discovered by Christofides in 1976. This is an important part of our solution, and builds on Euler's paper, so we'll give a quick four-step rundown of how it works here (a sketch of these steps as a short program follows below):

1. We start with all our destinations separate, and repeatedly connect together the closest two that aren't yet connected. This doesn't yet give us an itinerary, but it does connect all the destinations via a minimum spanning tree of the graph.
2. We take all the destinations that have an odd number of connections in this tree (Euler proved there must be an even number of these), and carefully pair them up.
3. Because all the destinations now have an even number of edges, we've created an Eulerian graph, so we create a route that crosses each edge exactly once.
4. We now have a great route, but it might visit some places more than once. No problem, we find any double visits and simply bypass them, going directly from the predecessor to the successor.

Christofides gave an elegant proof that the resulting route is always close to the shortest possible. Here's an example of the Christofides algorithm in action on a location graph with the nodes representing places and th[...]
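Here is a small, self-contained Python sketch of those four steps on a toy complete graph. It is illustrative only: the pairing in step 2 is brute force, the node names and coordinates are hypothetical, and everything is far simpler than the production solver behind Google Trips.

```python
import itertools
import math
from collections import defaultdict

def christofides(points, dist):
    """Toy version of the four steps above for a small complete graph.
    `points` is a list of node names, `dist` a symmetric distance function
    assumed to satisfy the triangle inequality."""
    # Step 1: connect everything with a minimum spanning tree (Prim's algorithm).
    in_tree = {points[0]}
    tree = defaultdict(list)
    while len(in_tree) < len(points):
        u, v = min(((a, b) for a in in_tree for b in points if b not in in_tree),
                   key=lambda edge: dist(*edge))
        tree[u].append(v)
        tree[v].append(u)
        in_tree.add(v)

    # Step 2: pair up the odd-degree nodes with a cheapest pairing
    # (brute force here; real systems use a proper matching algorithm).
    odd = [p for p in points if len(tree[p]) % 2 == 1]
    best_cost, best_pairs = float("inf"), []
    for perm in itertools.permutations(odd):
        pairs = list(zip(perm[::2], perm[1::2]))
        cost = sum(dist(a, b) for a, b in pairs)
        if cost < best_cost:
            best_cost, best_pairs = cost, pairs
    for a, b in best_pairs:
        tree[a].append(b)
        tree[b].append(a)

    # Step 3: every node now has even degree, so an Euler tour exists
    # (Hierholzer's algorithm on the resulting multigraph).
    multigraph = {p: list(tree[p]) for p in points}
    stack, tour = [points[0]], []
    while stack:
        v = stack[-1]
        if multigraph[v]:
            w = multigraph[v].pop()
            multigraph[w].remove(v)
            stack.append(w)
        else:
            tour.append(stack.pop())

    # Step 4: shortcut repeated visits, keeping only the first visit to each place.
    seen, route = set(), []
    for p in tour:
        if p not in seen:
            seen.add(p)
            route.append(p)
    return route + [route[0]]

coords = {"Museum": (0, 0), "Park": (1, 0), "Cafe": (1, 1), "Gallery": (0, 1), "Tower": (2, 0.5)}
distance = lambda a, b: math.dist(coords[a], coords[b])
print(christofides(list(coords), distance))
```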

The 2016 Google Earth Engine User Summit: Turning pixels into insights


Posted by Chris Herwig, Program Manager, Google Earth Engine

"We are trying new methods [of flood modeling] in Earth Engine based on machine learning techniques which we think are cheaper, more scalable, and could exponentially drive down the cost of flood mapping and make it accessible to everyone."
- Beth Tellman, Arizona State University and Cloud to Street

Recently, Google headquarters hosted the Google Earth Engine User Summit 2016, a three-day hands-on technical workshop for scientists and students interested in using Google Earth Engine for planetary-scale cloud-based geospatial analysis. Earth Engine combines a multi-petabyte catalog of satellite imagery and geospatial datasets with a simple, yet powerful API backed by Google's cloud, which scientists and researchers use to detect, measure, and predict changes to the Earth's surface.

Earth Engine founder Rebecca Moore kicking off the first day of the summit

Summit attendees could choose among twenty-five hands-on workshops over the course of the three-day summit, most generated for the summit specifically, giving attendees an exclusive introduction to the latest features in our platform. The sessions covered a wide range of topics and Earth Engine experience levels, from image classifiers and classifications, time series analysis, and building custom web applications, all the way to arrays, matrices, and linear algebra in Earth Engine.

Terra Bella Product Manager Kristi Bohl taught a session on using SkySat imagery, like the image above over Sydney, Australia, for change detection. Workshop attendees also learned how to take advantage of the deep temporal stack the SkySat archive offers for change-over-time analyses.

Cross-correlation between Landsat 8 NDVI and the sum of CHIRPS precipitation. Red is high cross-correlation and blue is low. The gap in data is because CHIRPS is masked over water.

Nick Clinton, a developer advocate for Earth Engine, taught a time series session that covered statistical techniques as applied to satellite imagery data. Students learned how to make graphics like the above, which shows the cross-correlation between Landsat 8 NDVI and the sum of CHIRPS precipitation from the previous month over San Francisco, CA. The correlation should be high for relatively r-selected plants like grasses and weeds and relatively low for perennials, shrubs, or forest.

My workshop session covered how users can upload their own data into Earth Engine and the many different ways to take the results of their analyses with them, including rendering static map tiles hosted on Google Cloud Storage, exporting images, creating new assets, and even making movies, like this timelapse video of all the Sentinel 2A images captured over Sydney, Australia.

Along with the workshop sessions, we hosted five plenary speakers and 18 lightning talk presenters. These presenters shared how Earth Engine fits into their research, spanning drought monitoring, agriculture, conservation, flood risk mapping, and hydrological analysis.

Plenary Speakers
- Agriculture in the Sentinel era: scaling up with Earth Engine, Guido Lemoine, European Commission's Joint Research Centre
- Flood Vulnerability from the Cloud to the Street (and back!) powered by Google Earth Engine, Beth Tellman, Arizona State University and Cloud to Street
- Accelerating Rangeland Conservation, Brady Allred, University of Montana
- Monitoring Drought with Google Earth Engine: From Archives to Answers, Justin Huntington, Desert Research Institute / Western Regional Climate Center
- Automated methods for surface water detection, Gennadii Donchytes, Deltares

Lightning Presentations
- Mapping the Behavior of Rivers, Alex Bryk, University of California, Berkeley
- Climate Data for Crisis and Health Applications, Pietro Ceccato, Columbia University
- Appalachian Communities at Risk, Matt Wasson, Jeff Deal, Appalachian Voices
- Water, Wildlife and Working Lands, Patrick Donnelly, U.S. Fish and Wildlife Service
- Stream-side [...]

Research from VLDB 2016: Improved Friend Suggestion using Ego-Net Analysis


Posted by Alessandro Epasto, Research Scientist, Google Research NY

On September 5 - 9, New Delhi, India hosted the 42nd International Conference on Very Large Data Bases (VLDB), a premier annual forum for academic and industry research on databases, data management, data mining and data analytics. Over the past several years, Google has actively participated in VLDB, both as an official sponsor and with numerous contributions to the research and industrial tracks. In this post, we would like to share the research presented in one of the Google papers from VLDB 2016.

In Ego-net Community Mining Applied to Friend Suggestion, co-authored by Googlers Silvio Lattanzi, Vahab Mirrokni, Ismail Oner Sebe, Ahmed Taei, Sunita Verma and myself, we explore how social networks can provide better friend suggestions to users, a challenging practical problem faced by all social network platforms.

Friend suggestion - the task of suggesting to a user the contacts she might already know in the network but that she hasn't added yet - is a major driver of user engagement and social connection in all online social networks. Designing a high quality system that can provide relevant and useful friend recommendations is very challenging, and requires state-of-the-art machine learning algorithms based on a multitude of parameters. An effective family of features for friend suggestion consists of graph features such as the number of common friends between two users. While widely used, the number of common friends has some major drawbacks, including the following, which is shown in Figure 1.

Figure 1: Ego-net of Sally.

In this figure we represent the social connections of Sally and her friends - the ego-net of Sally. An ego-net of a node (in this case, Sally) is defined as the graph that contains the node itself, all of the node's neighbors and the connections among those nodes. Sally has 6 friends in her ego-net: Albert (her husband), Brian (her son), Charlotte (her mother) as well as Uma (her boss), Vincent and Wally (two of her team members). Notice how A, B and C are all connected with each other while they do not know U, V or W. On the other hand U, V and W have all added each other as friends (except U and W, who are good friends but somehow forgot to add each other).

Notice how each of A, B and C has a common friend with each of U, V and W: Sally herself. A friend recommendation system based on common neighbors might suggest to Sally's son (for instance) to add Sally's boss as his friend! In reality the situation is even more complicated because users' online and offline friends span several different social circles or communities (family, work, school, sports, etc).

In our paper we introduce a novel technique for friend suggestions based on independently analyzing the ego-net structure. The main contribution of the paper is to show that it is possible to provide friend suggestions efficiently by constructing all ego-nets of the nodes in the graph and then independently applying community detection algorithms on them in large-scale distributed systems. Specifically, the algorithm proceeds by constructing the ego-nets of all nodes and applying, independently on each of them, a community detection algorithm.

More precisely, the algorithm operates on so-called "ego-net-minus-ego" graphs, defined as the graph including only the neighbors of a given node, as shown in the figure below.

Figure 2: Clustering of the ego-net of Sally.

Notice how in this example the ego-net-minus-ego of Sally has two very clear communities: her family (A, B, C) and her co-workers (U, V, W), which are easily separated. Intuitively, this is because one might expect that while nodes (e.g. Sally) participate in many communities, there is usually a single (or a limited number of) contexts in which two specific neighbors interact. While Sally is both part of her family and work community, Sally and [...]
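As a toy illustration of the idea (not the paper's algorithm, which applies proper large-scale community detection rather than the connected-components stand-in used here), the following Python sketch builds Sally's ego-net-minus-ego from a hypothetical adjacency list and splits it into communities:

```python
def ego_net_minus_ego(graph, ego):
    """Sub-graph induced on ego's neighbors, with the ego itself removed."""
    neighbors = set(graph[ego])
    return {u: [v for v in graph[u] if v in neighbors] for u in neighbors}

def connected_components(graph):
    """Stand-in for a real community-detection step: connected components."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            u = stack.pop()
            if u in component:
                continue
            component.add(u)
            stack.extend(graph[u])
        seen |= component
        components.append(component)
    return components

# Sally's ego-net from Figure 1, written as a hypothetical adjacency list.
g = {
    "Sally": ["Albert", "Brian", "Charlotte", "Uma", "Vincent", "Wally"],
    "Albert": ["Sally", "Brian", "Charlotte"],
    "Brian": ["Sally", "Albert", "Charlotte"],
    "Charlotte": ["Sally", "Albert", "Brian"],
    "Uma": ["Sally", "Vincent"],            # Uma and Wally forgot to add each other
    "Vincent": ["Sally", "Uma", "Wally"],
    "Wally": ["Sally", "Vincent"],
}
print(connected_components(ego_net_minus_ego(g, "Sally")))
# -> two communities: {Albert, Brian, Charlotte} and {Uma, Vincent, Wally}
```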

Computational Thinking from a Dispositions Perspective


Posted by Chris Stephenson, Head of Computer Science Education Programs at Google, and Joyce Malyn-Smith, Managing Project Director at Education Development Center (EDC)

(Cross-posted on the Google for Education Blog)

In K–12 computer science (CS) education, much of the discussion about what students need to learn and do has centered around computational thinking (CT). While much of the current work in CT education is focused on core concepts and their application, the one area of CT that has not been well explored is the relationship between CT as a problem solving model and the dispositions or habits of mind that it can build in students of all ages.

Exploring the mindset that CT education can engender depends, in part, on the definition of CT itself. While there are a number of definitions of CT in circulation, Valerie Barr and I defined it in the following way:

CT is an approach to solving problems in a way that can be implemented with a computer. Students become not merely tool users but tool builders. They use a set of concepts, such as abstraction, recursion, and iteration, to process and analyze data, and to create real and virtual artifacts. CT is a problem solving methodology that can be automated and transferred and applied across subjects.

Like many others, our view of CT also included the core CT concepts: abstraction, algorithms and procedures, automation, data collection and analysis, data representation, modeling and simulation, parallelization and problem decomposition.

The idea of dispositions, however, comes from the field of vocational education and research on career development, which focuses on the personal qualities or soft skills needed for employment (see the full report from the Economist Intelligence Unit here). These skills traditionally include being responsible, adaptable, flexible, self-directed, and self-motivated; being able to solve simple and complex problems; and having integrity, self-confidence, and self-control. They can also include the ability to work with people of different ages and cultures, collaboration, complex communication and expert thinking. Cuoco, Goldenberg, and Mark's research also provided examples of what students should learn to develop the habits of mind used by scientists across numerous disciplines. These are: recognizing patterns, experimenting, describing, tinkering, inventing, visualizing, and conjecturing. Potter and Vickers also found that in the burgeoning field of cyber security "there is significant overlap between the roles for many soft skills, including analysis, consulting and process skills, leadership, and relationship management. Both communication and presentation skills were valued."

CT, because of its emphasis on problem solving, provides a natural environment for embedding the idea of dispositions into K-12. According to the International Society for Technology in Education and the Computer Science Teachers Association, the set of dispositions that students practice and internalize while learning about CT can include:

- confidence in dealing with complexity,
- persistence in working with difficult problems,
- the ability to handle ambiguity,
- the ability to deal with open-ended problems,
- setting aside differences to work with others to achieve a common goal or solution, and
- knowing one's strengths and weaknesses when working with others.

Any teacher in any discipline is likely to tell you that persistence, problem solving, collaboration and awareness of one's strengths and limitations are critical to successful learning for all students.
So how do we make these dispositions a more explicit part of the CT curriculum? One way to do so is to call them out directly to students and explain why they are important in all areas of their study, career, and lives. In addition, educators can:

- Post in the classroom a list of the Dispositions Leading to Success,
- Help fa[...]

Announcing the First Annual Global PhD Fellowship Summit and the 2016 Google PhD Fellows


Posted by Michael Rennaker, Program Manager, University Relations

In 2009, Google created the PhD Fellowship Program to recognize and support outstanding graduate students doing exceptional research in Computer Science and related disciplines. Now in its eighth year, our Fellowships have helped support over 250 graduate students in Australia, China and East Asia, India, North America, Europe and the Middle East who seek to shape and influence the future of technology.

Recently, Google PhD Fellows from around the globe converged on our Mountain View campus for the first annual Global PhD Fellowship Summit. The students heard talks from researchers like Jeff Dean, Françoise Beaufays, Peter Norvig, Maya Gupta and Amin Vahdat, and got a glimpse into some of the state-of-the-art research pursued across Google.

Senior Google Fellow Jeff Dean shares how TensorFlow is used at Google

Fellows also had the chance to connect one-on-one with Googlers to discuss their research, as well as receive feedback from leaders in their fields. The event wrapped up with a panel discussion with Dan Russell, Kristen LeFevre, Douglas Eck and Françoise Beaufays about their unique career paths. Maggie Johnson concluded the Summit by sharing about the different types of research environments across academia and industry.

(Left) PhD Fellows share their work with Google researchers during the poster session. (Right) Research panelists share their journeys through academia and industry.

Our PhD Fellows represent some of the best and brightest young researchers around the globe in Computer Science and it is our ongoing goal to support them as they make their mark on the world.

We'd also like to welcome the newest class of Google PhD Fellows recently awarded in China and East Asia, India, and Australia.
We look forward to seeing each of them at next year's summit!

2016 Global PhD Fellows

Computational Neuroscience
- Cameron (Po-Hsuan) Chen, Princeton University
- Grace Lindsay, Columbia University
- Martino Sorbaro Sindaci, The University of Edinburgh

Human-Computer Interaction
- Dana McKay, University of Melbourne
- Koki Nagano, University of Southern California
- Arvind Satyanarayan, Stanford University
- Amy Xian Zhang, Massachusetts Institute of Technology

Machine Learning
- Olivier Bachem, Swiss Federal Institute of Technology Zurich
- Tianqi Chen, University of Washington
- Emily Denton, New York University
- Kwan Hui Lim, University of Melbourne
- Yves-Laurent Kom Samo, University of Oxford
- Woosang Lim, Korea Advanced Institute of Science and Technology
- Anirban Santara, Indian Institute of Technology Kharagpur
- Daniel Jaymin Mankowitz, Technion - Israel Institute of Technology
- Lucas Maystre, École Polytechnique Fédérale de Lausanne
- Arvind Neelakantan, University of Massachusetts, Amherst
- Ludwig Schmidt, Massachusetts Institute of Technology
- Quanming Yao, The Hong Kong University of Science and Technology
- Shandian Zhe, Purdue University, West Lafayette

Machine Perception, Speech Technology and Computer Vision
- Eugen Beck, RWTH Aachen University
- Yu-Wei Chao, University of Michigan, Ann Arbor
- Wei Liu, University of North Carolina at Chapel Hill
- Aron Monszpart, University College London
- Thomas Schoeps, Swiss Federal Institute of Technology Zurich
- Tian Tan, Shanghai Jiao Tong University
- Chia-Yin Tsai, Carnegie Mellon University
- Weitao Xu, University of Queensland

Market Algorithms
- Hossein Esfandiari, University of Maryland, College Park
- Sandy Heydrich, Saarland University - Saarbrucken GSCS
- Rad Niazadeh, Cornell University
- Sadra Yazdanbod, Georgia Institute of Technology

Mobile Computing
- Lei Kang, University of Wisconsin
- Tauhidur Rahman, Cornell University
- Chungkuk Yoo, Korea Advanced Institute of Science and Technology
- Yuhao Zhu, University of Texas, Austin

Natural Language Processing
- Tamer Alkhouli, RWTH Aachen University
- Jose Camacho Collados, Sapienza - Università di Roma

Privacy and Security
- Chitra Javali, [...]

Reproducible Science: Cancer Researchers Embrace Containers in the Cloud


Posted by Dr. Kyle Ellrott, Oregon Health and Sciences University, Dr. Josh Stuart, University of California Santa Cruz, and Dr. Paul Boutros, Ontario Institute for Cancer Research

Today we hear from the principal investigators of the ICGC-TCGA DREAM Somatic Mutation Calling Challenges about how they are encouraging cancer researchers to make use of Docker and Google Cloud Platform to gain a deeper understanding of the complex genetic mutations that occur in cancer, while doing so in a reproducible way.
– Nicole Deflaux and Jonathan Bingham, Google Genomics

Today's genomic analysis software tools often give different answers when run in different computing environments - that's like getting a different diagnosis from your doctor depending on which examination room you're sitting in. Reproducible science matters, especially in cancer research where so many lives are at stake. The Cancer Moonshot has called for the research world to 'Break down silos and bring all the cancer fighters together'. Portable software "containers" and cloud computing hold the potential to help achieve these goals by making scientific data analysis more reproducible, reusable and scalable.

Our team of researchers from the Ontario Institute for Cancer Research, University of California Santa Cruz, Sage Bionetworks and Oregon Health and Sciences University is pushing the frontiers by encouraging scientists to package up their software in reusable Docker containers and make use of cloud-resident data from the Cancer Cloud Pilots funded by the National Cancer Institute.

In 2014 we initiated the ICGC-TCGA DREAM Somatic Mutation Calling (SMC) Challenges, for which Google provided credits on Google Cloud Platform. The first result of this collaboration was the DREAM-SMC DNA challenge, a public challenge that engaged cancer researchers from around the world to find the best methods for discovering DNA somatic mutations. By the end of the challenge, over 400 registered participants competed by submitting 3,500 open-source entries for 14 test genomes, providing key insights on the strengths and limitations of current mutation detection methods.

The SMC-DNA challenge enabled comparison of results, but it did little to facilitate the exchange of cross-platform software tools. Accessing extremely large genome sequence input files and shepherding complex software pipelines created a "double whammy" that discouraged data sharing and software reuse.

How can we overcome these barriers? Exciting developments have taken place in the past couple of years that may annihilate these last barriers: the availability of cloud technologies and containerization can serve as the vanguards of reproducibility and interoperability.

Thus, a new way of creating open DREAM challenges has emerged: rather than encouraging the status quo, where participants run their own methods themselves on their own systems and the results cannot be verified, the new challenge design requires participants to submit open-source code packaged in Docker containers so that anyone can run their methods and verify the results. Real-time leaderboards show which entries are winning and top performers have a chance to claim a prize. Working with Google Genomics and Google Cloud Platform, the DREAM-SMC organizers are now using cloud and containerization technologies to enable portability and reproducibility as a core part of the DREAM challenges.
The latest SMC installments, the SMC-Het Challenge and the SMC-RNA Challenge, have implemented this new plan:

SMC-Het Challenge: Tumour biopsies are composed of many different cell types in addition to tumour cells, including normal tissue and infiltrating immune cells. Furthermore, the tumours themselves are made of a mixture of different subpopulations, all related to one another through cell division and mutation. Cr[...]

Improving Inception and Image Classification in TensorFlow


Posted by Alex Alemi, Software Engineer

Earlier this week, we announced the latest release of the TF-Slim library for TensorFlow, a lightweight package for defining, training and evaluating models, as well as checkpoints and model definitions for several competitive networks in the field of image classification. In order to spur even further progress in the field, today we are happy to announce the release of Inception-ResNet-v2, a convolutional neural network (CNN) that achieves a new state of the art in terms of accuracy on the ILSVRC image classification benchmark. Inception-ResNet-v2 is a variation of our earlier Inception V3 model which borrows some ideas from Microsoft's ResNet papers [1][2]. The full details of the model are in our arXiv preprint Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning.

Residual connections allow shortcuts in the model and have allowed researchers to successfully train even deeper neural networks, which has led to even better performance. This has also enabled significant simplification of the Inception blocks. Just compare the model architectures in the figures below:

Schematic diagram of Inception V3

Schematic diagram of Inception-ResNet-v2

At the top of the second Inception-ResNet-v2 figure, you'll see the full network expanded. Notice that this network is considerably deeper than the previous Inception V3. Below in the main figure is an easier to read version of the same network where the repeated residual blocks have been compressed. Here, notice that the inception blocks have been simplified, containing fewer parallel towers than the previous Inception V3.

The Inception-ResNet-v2 architecture is more accurate than previous state of the art models, as shown in the table below, which reports the Top-1 and Top-5 validation accuracies on the ILSVRC 2012 image classification benchmark based on a single crop of the image. Furthermore, this new model only requires roughly twice the memory and computation compared to Inception V3.

Model                 Architecture   Checkpoint                              Top-1 Accuracy   Top-5 Accuracy
Inception-ResNet-v2   Code           inception_resnet_v2_2016_08_30.tar.gz   80.4             95.3
Inception V3          Code           inception_v3_2016_08_28.tar.gz          78.0             93.9
ResNet 152            Code           resnet_v1_152_2016_08_28.tar.gz         76.8             93.2
ResNet V2 200         Code           TBA                                     79.9*            95.2*
(*) Results quoted in the ResNet paper.

As an example, while both Inception V3 and Inception-ResNet-v2 models excel at identifying individual dog breeds, the new model does noticeably better. For instance, whereas the old model mistakenly reported Alaskan Malamute for the picture on the right, the new Inception-ResNet-v2 model correctly identifies the dog breeds in both images.

An Alaskan Malamute (left) and a Siberian Husky (right). Images from Wikipedia

In order to allow people to immediately begin experimenting, we are also releasing a pre-trained instance of the new Inception-ResNet-v2, as part of the TF-Slim Image Model Library. We are excited to see what the community does with this improved model, following along as people adapt it and compare its performance on various tasks. Want to get started? See the accompanying instructions on how to train, evaluate or fine-tune a network.

As always, releasing the code was a team effort.
Specific thanks are due to:

- Model Architecture - Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, Alex Alemi
- Systems Infrastructure - Jon Shlens, Benoit Steiner, Mark Sandler, and David Andersen
- TensorFlow-Slim - Sergio Guadarrama and Nathan Silberman
- Model Visualization - Fernanda Viégas and James Wexler [...]
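To try the architecture today, one convenient route is the Keras port of Inception-ResNet-v2 rather than the TF-Slim checkpoint described above. The sketch below assumes a recent TensorFlow installation and a local image file (`husky.jpg` is hypothetical).

```python
import numpy as np
import tensorflow as tf

# Keras port of the Inception-ResNet-v2 architecture with ImageNet weights
# (an alternative way to experiment, separate from the TF-Slim checkpoint release).
model = tf.keras.applications.InceptionResNetV2(weights="imagenet")
preprocess = tf.keras.applications.inception_resnet_v2.preprocess_input
decode = tf.keras.applications.inception_resnet_v2.decode_predictions

image = tf.keras.utils.load_img("husky.jpg", target_size=(299, 299))   # hypothetical file
batch = preprocess(np.expand_dims(tf.keras.utils.img_to_array(image), axis=0))
for _, label, score in decode(model.predict(batch), top=3)[0]:
    print(f"{label}: {score:.3f}")   # e.g. distinguishing Siberian husky from Alaskan malamute
```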

TF-Slim: A high level library to define complex models in TensorFlow


Posted by Nathan Silberman and Sergio Guadarrama, Google Research

Earlier this year, we released a TensorFlow implementation of a state-of-the-art image classification model known as Inception-V3. This code allowed users to train the model on the ImageNet classification dataset via synchronized gradient descent, using either a single local machine or a cluster of machines. The Inception-V3 model was built on an experimental TensorFlow library called TF-Slim, a lightweight package for defining, training and evaluating models in TensorFlow. The TF-Slim library provides common abstractions which enable users to define models quickly and concisely, while keeping the model architecture transparent and its hyperparameters explicit.

Since that release, TF-Slim has grown substantially, with many types of layers, loss functions, and evaluation metrics added, along with handy routines for training and evaluating models. These routines take care of all the details you need to worry about when working at scale, such as reading data in parallel, deploying models on multiple machines, and more. Additionally, we have created the TF-Slim Image Models library, which provides definitions and training scripts for many widely used image classification models, using standard datasets. TF-Slim and its components are already widely used within Google, and many of these improvements have already been integrated into tf.contrib.slim.

Today, we are proud to share the latest release of TF-Slim with the TF community. Some highlights of this release include:

- Many new kinds of layers (such as Atrous Convolution and Deconvolution) enabling a much richer family of neural network architectures.
- Support for more loss functions and evaluation metrics (e.g., mAP, IoU).
- A deployment library to make it easier to perform synchronous or asynchronous training using multiple GPUs/CPUs, on the same machine or on multiple machines.
- Code to define and train many widely used image classification models (e.g., Inception [1][2][3], VGG [4], AlexNet [5], ResNet [6]).
- Pre-trained model weights for the above image classification models. These models have been trained on the ImageNet classification dataset, but can be used for many other computer vision tasks. As a simple example, we provide code to fine-tune these classifiers to a new set of output labels.
- Tools to easily process standard image datasets, such as ImageNet, CIFAR10 and MNIST.

Want to get started using TF-Slim? See the README for details. Interested in working with image classification models? See these instructions or this Jupyter notebook.

The release of the TF-Slim library and the pre-trained model zoo has been the result of widespread collaboration within Google Research.
In particular we want to highlight the vital contributions of the following researchers:

- TF-Slim: Sergio Guadarrama, Nathan Silberman
- Model Definitions and Checkpoints: Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, Jon Shlens, Zbigniew Wojna, Vivek Rathod, George Papandreou, Alex Alemi
- Systems Infrastructure: Jon Shlens, Matthieu Devin, Martin Wicke
- Jupyter notebook: Nathan Silberman, Kevin Murphy

References:
[1] Going Deeper with Convolutions, Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich, CVPR 2015
[2] Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, Sergey Ioffe, Christian Szegedy, ICML 2015
[3] Rethinking the Inception Architecture for Computer Vision, Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, Zbigniew Wojna, arXiv technical report 2015
[4] Very Deep Convolutional Networks for Large-Scale Image Recognition, Karen Simonyan, Andrew Zisserman, ICLR 2015
[5] ImageNet Classification with Deep [...]
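As a flavor of what defining a model with TF-Slim looks like, here is a small hedged sketch, assuming a TensorFlow 1.x environment where TF-Slim is available as `tf.contrib.slim`. The layer sizes are arbitrary, and this is not one of the released model definitions.

```python
import tensorflow as tf   # TensorFlow 1.x assumed
slim = tf.contrib.slim

def simple_convnet(images, num_classes=10):
    """Tiny classifier written with TF-Slim layers, arg_scope and repeat."""
    with slim.arg_scope([slim.conv2d, slim.fully_connected],
                        activation_fn=tf.nn.relu,
                        weights_regularizer=slim.l2_regularizer(0.0005)):
        net = slim.repeat(images, 2, slim.conv2d, 64, [3, 3], scope='conv1')
        net = slim.max_pool2d(net, [2, 2], scope='pool1')
        net = slim.flatten(net)
        net = slim.fully_connected(net, 256, scope='fc1')
        net = slim.fully_connected(net, num_classes, activation_fn=None, scope='logits')
    return net

images = tf.placeholder(tf.float32, [None, 32, 32, 3])   # hypothetical input batch
logits = simple_convnet(images)
```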

Text summarization with TensorFlow


Posted by Peter Liu and Xin Pan, Software Engineers, Google Brain Team

Every day, people rely on a wide variety of sources to stay informed -- from news stories to social media posts to search results. Being able to develop machine learning models that can automatically deliver accurate summaries of longer text can be useful for digesting such large amounts of information in a compressed form, and is a long-term goal of the Google Brain team. Summarization can also serve as an interesting reading comprehension test for machines. To summarize well, machine learning models need to be able to comprehend documents and distill the important information, tasks which are highly challenging for computers, especially as the length of a document increases.

In an effort to push this research forward, we're open-sourcing TensorFlow model code for the task of generating news headlines on Annotated English Gigaword, a dataset often used in summarization research. We also specify in the documentation the hyper-parameters that achieve better than the published state-of-the-art on the most commonly used metric as of the time of writing. Below we also provide samples generated by the model.

Extractive and Abstractive summarization

One approach to summarization is to extract parts of the document that are deemed interesting by some metric (for example, inverse-document frequency) and join them to form a summary. Algorithms of this flavor are called extractive summarization.

Original Text: Alice and Bob took the train to visit the zoo. They saw a baby giraffe, a lion, and a flock of colorful tropical birds.

Extractive Summary: Alice and Bob visit the zoo. saw a flock of birds.

Above we extract the words bolded in the original text and concatenate them to form a summary. As we can see, sometimes the extractive constraint can make the summary awkward or grammatically strange. Another approach is to simply summarize as humans do, which is to not impose the extractive constraint and allow for rephrasings. This is called abstractive summarization.

Abstractive summary: Alice and Bob visited the zoo and saw animals and birds.

In this example, we used words not in the original text, maintaining more of the information in a similar number of words. It's clear we would prefer good abstractive summarizations, but how could an algorithm begin to do this?

About the TensorFlow model

It turns out that for shorter texts, summarization can be learned end-to-end with a deep learning technique called sequence-to-sequence learning, similar to what makes Smart Reply for Inbox possible. In particular, we're able to train such models to produce very good headlines for news articles. In this case, the model reads the article text and writes a suitable headline.

To get an idea of what the model produces, you can take a look at some examples below.
The first column shows the first sentence of a news article, which is the model input, and the second column shows what headline the model has written.

Input (article 1st sentence): metro-goldwyn-mayer reported a third-quarter net loss of dlrs 16 million due mainly to the effect of accounting rules adopted this year
Model-written headline: mgm reports 16 million net loss on higher revenue

Input (article 1st sentence): starting from july 1, the island province of hainan in southern china will implement strict market access control on all incoming livestock and animal products to prevent the possible spread of epidemic diseases
Model-written headline: hainan to curb spread of diseases

Input (article 1st sentence): australian wine exports hit a record 52.1 million liters worth 260 million dollars (143 million us) in september, the government statistics office reported on monday
Model-written headline: australian wine exports hit record high in september

Future Research

We've observed that due to the nature of news headlines, [...]
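As a point of contrast with the learned abstractive model, here is a tiny, hedged Python sketch of the extractive family of approaches mentioned above: score each sentence by simple word frequency and keep the top-scoring ones in their original order. It is not the released TensorFlow model, just an illustration of the extractive baseline.

```python
import re
from collections import Counter

def extractive_summary(text, num_sentences=1):
    """Naive frequency-based extractive summarizer: score each sentence by the
    document frequency of its words and keep the top-scoring sentences in order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    word_freqs = Counter(re.findall(r"[a-z']+", text.lower()))
    scored = sorted(range(len(sentences)),
                    key=lambda i: -sum(word_freqs[w]
                                       for w in re.findall(r"[a-z']+", sentences[i].lower())))
    keep = sorted(scored[:num_sentences])          # preserve original sentence order
    return " ".join(sentences[i] for i in keep)

doc = ("Alice and Bob took the train to visit the zoo. "
       "They saw a baby giraffe, a lion, and a flock of colorful tropical birds. "
       "Afterwards they had lunch near the station.")
print(extractive_summary(doc, num_sentences=1))
```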

Meet Parsey’s Cousins: Syntax for 40 languages, plus new SyntaxNet capabilities


Posted by Chris Alberti, Dave Orr & Slav Petrov, Google Natural Language Understanding Team

Just in time for ACL 2016, we are pleased to announce that Parsey McParseface, released in May as part of SyntaxNet and the basis for the Cloud Natural Language API, now has 40 cousins! Parsey's Cousins is a collection of pretrained syntactic models for 40 languages, capable of analyzing the native language of more than half of the world's population at often unprecedented accuracy. To better address the linguistic phenomena occurring in these languages we have endowed SyntaxNet with new abilities for Text Segmentation and Morphological Analysis.

When we released Parsey, we were already planning to expand to more languages, and it soon became clear that this was both urgent and important, because researchers were having trouble creating top notch SyntaxNet models for other languages.

The reason for that is a little bit subtle. SyntaxNet, like other TensorFlow models, has a lot of knobs to turn, which affect accuracy and speed. These knobs are called hyperparameters, and control things like the learning rate and its decay, momentum, and random initialization. Because neural networks are more sensitive to the choice of these hyperparameters than many other machine learning algorithms, picking the right hyperparameter setting is very important. Unfortunately there is no tested and proven way of doing this, and picking good hyperparameters is mostly an empirical science -- we try a bunch of settings and see what works best.

An additional challenge is that training these models can take a long time, several days on very fast hardware. Our solution is to train many models in parallel via MapReduce, and when one looks promising, train a bunch more models with similar settings to fine-tune the results. This can really add up -- on average, we train more than 70 models per language. The plot below shows how the accuracy varies depending on the hyperparameters as training progresses. The best models are up to 4% absolute more accurate than ones trained without hyperparameter tuning.

Held-out set accuracy for various English parsing models with different hyperparameters (each line corresponds to one training run with specific hyperparameters). In some cases training is a lot slower and in many cases a suboptimal choice of hyperparameters leads to significantly lower accuracy. We are releasing the best model that we were able to train for each language.

In order to do a good job of analyzing the grammar of other languages, it was not sufficient to just fine-tune our English setup. We also had to expand the capabilities of SyntaxNet. The first extension is a model for text segmentation, which is the task of identifying word boundaries. In languages like English, this isn't very hard -- you can mostly look for spaces and punctuation. In Chinese, however, this can be very challenging, because words are not separated by spaces. To correctly analyze dependencies between Chinese words, SyntaxNet needs to understand text segmentation -- and now it does.

Analysis of a Chinese string into a parse tree showing dependency labels, word tokens, and parts of speech (read top to bottom for each word token).

The second extension is a model for morphological analysis. Morphology is a language feature that is poorly represented in English. It describes inflection: i.e., how the grammatical function and meaning of the word changes as its spelling changes. In English, we add an -s to a word to indicate plurality.
In Russian, a heavily inflected language, morphology can indicate number, gender, whether the word is the subject or object of a sentence, possessives, prepositional phrases, and more. To understand[...]
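The hyperparameter sweep described above is, at its core, a random search over a handful of training knobs followed by keeping the model with the best held-out accuracy. Here is a minimal, hedged Python sketch of that idea; the parameter names and ranges are hypothetical, and `train_and_eval` stands in for an actual SyntaxNet training run (which in practice happens in parallel, not in a serial loop).

```python
import random

# Hypothetical search space; the real knobs and ranges live in the released training scripts.
SEARCH_SPACE = {
    "learning_rate": lambda: 10 ** random.uniform(-4, -1),
    "decay_steps": lambda: random.choice([2500, 4400, 7600]),
    "momentum": lambda: random.uniform(0.7, 0.95),
}

def sample_params():
    """Draw one random hyperparameter setting from the search space."""
    return {name: draw() for name, draw in SEARCH_SPACE.items()}

def random_search(train_and_eval, num_trials=70):
    """Train many models with independently sampled hyperparameters and keep the
    setting whose held-out accuracy is highest."""
    best_score, best_params = float("-inf"), None
    for _ in range(num_trials):
        params = sample_params()
        score = train_and_eval(**params)   # user-supplied training-and-evaluation routine
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score

# Toy stand-in for a real training run, so the sketch is self-contained.
fake_train = lambda learning_rate, decay_steps, momentum: momentum - abs(learning_rate - 0.05)
print(random_search(fake_train, num_trials=20))
```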

ACL 2016 & Research at Google


Posted by Slav Petrov, Research Scientist

This week, Berlin hosts the 2016 Annual Meeting of the Association for Computational Linguistics (ACL 2016), the premier conference of the field of computational linguistics, covering a broad spectrum of diverse research areas that are concerned with computational approaches to natural language. As a leader in Natural Language Processing (NLP) and a Platinum Sponsor of the conference, Google will be on hand to showcase research interests that include syntax, semantics, discourse, conversation, multilingual modeling, sentiment analysis, question answering, summarization, and generally building better learners using labeled and unlabeled data, state-of-the-art modeling, and learning from indirect supervision. Our systems are used in numerous ways across Google, impacting user experience in search, mobile, apps, ads, translate and more.

Our work spans the range of traditional NLP tasks, with general-purpose syntax and semantic algorithms underpinning more specialized systems. Our researchers are experts in natural language processing and machine learning, and combine methodological research with applied science, and our engineers are equally involved in long-term research efforts and driving immediate applications of our technology.

If you're attending ACL 2016, we hope that you'll stop by the booth to check out some demos, meet our researchers and discuss projects and opportunities at Google that go into solving interesting problems for billions of people. Learn more about Google research being presented at ACL 2016 below (Googlers highlighted in blue), and visit the Natural Language Understanding Team page.

- Transition-based Dependency Parsing via Control Parameters - Bernd Bohnet, Ryan McDonald, Emily Pitler, Ji Ma
- Learning the Curriculum with Bayesian Optimization for Task-Specific Word Representation Learning - Yulia Tsvetkov, Manaal Faruqui, Wang Ling (Google DeepMind), Chris Dyer (Google DeepMind)
- Morpho-syntactic Lexicon Generation Using Graph-based Semi-supervised Learning (TACL) - Manaal Faruqui, Ryan McDonald, Radu Soricut
- Many Languages, One Parser (TACL) - Waleed Ammar, George Mulcaire, Miguel Ballesteros, Chris Dyer (Google DeepMind)*, Noah A. Smith
- Latent Predictor Networks for Code Generation - Wang Ling (Google DeepMind), Phil Blunsom (Google DeepMind), Edward Grefenstette (Google DeepMind), Karl Moritz Hermann (Google DeepMind), Tomáš Kočiský (Google DeepMind), Fumin Wang (Google DeepMind), Andrew Senior (Google DeepMind)
- Collective Entity Resolution with Multi-Focal Attention - Amir Globerson, Nevena Lazic, Soumen Chakrabarti, Amarnag Subramanya, Michael Ringgaard, Fernando Pereira
- Plato: A Selective Context Model for Entity Resolution (TACL) - Nevena Lazic, Amarnag Subramanya, Michael Ringgaard, Fernando Pereira
- WikiReading: A Novel Large-scale Language Understanding Task over Wikipedia - Daniel Hewlett, Alexandre Lacoste, Llion Jones, Illia Polosukhin, Andrew Fandrianto, Jay Han, Matthew Kelcey, David Berthelot
- Stack-propagation: Improved Representation Learning for Syntax - Yuan Zhang, David Weiss
- Cross-lingual Models of Word Embeddings: An Empirical Comparison - Shyam Upadhyay, Manaal Faruqui, Chris Dyer (Google DeepMind), Dan Roth
- Globally Normalized Transition-Based Neural Networks (Outstanding Papers Session) - Daniel Andor, Chris Alberti, David Weiss, Aliaksei Severyn, Alessandro Presta, Kuzman Ganchev, Slav Petrov, Michael Collins

Posters
- Cross-lingual projection for class-based language models - Beat Gfeller, Vlad S[...]

Computational Thinking for All Students


Posted by Maggie Johnson, Director of Education and University Relations, Google

(Crossposted on the Google for Education Blog and the Huffington Post)

Last year, I wrote about the importance of teaching computational thinking to all K-12 students. Given the growing use of computing, algorithms and data in all fields from the humanities to medicine to business, it’s becoming increasingly important for students to understand the basics of computer science (CS). One lesson we have learned through Google’s CS education outreach efforts is that these skills can be accessible to all students, if we introduce them early in K-5. These are truly 21st century skills which can, over time, produce a workforce ready for a technology-enabled and driven economy.

How can teachers start introducing computational thinking in early school curriculum? It is already present in many topic areas - algorithms for solving math problems, for example. However, what is often missing in current examples of computational thinking is the explicit connection between what students are learning and its application in computing. For example, once a student has mastered adding multi-digit numbers, the following algorithm could be presented:

  • Add together the digits in the ones place. If the result is < 10, it becomes the ones digit of the answer. If it's >= 10, the ones digit of the result becomes the ones digit of the answer, and you add 1 to the next column.
  • Add together the digits in the tens place, plus the 1 carried over from the ones place, if necessary. If the result is < 10, it becomes the tens digit of the answer; if it's >= 10, the ones digit becomes the tens digit of the answer and 1 is added to the next column.
  • Repeat this process for any additional columns until they are all added.

This allows a teacher to present the concept of an algorithm and its use in computing, as well as the most important elements of any computer program: conditional branching (“if the result is less than 10…”) and iteration (“repeat this process…”). Going a step further, a teacher translating the algorithm into a running program can have a compelling effect. When something that students have used to solve an instance of a problem can automatically solve all instances of that problem, it’s quite a powerful moment for them even if they don’t do the coding themselves.

Google has created an online course for K-12 teachers to learn about computational thinking and how to make these explicit connections for their students. We also have a large repository of lessons, explorations and programs to support teachers and students. Our videos illustrate real-world examples of the application of computational thinking in Google’s products and services, and we have compiled a set of great resources showing how to integrate computational thinking into existing curriculum. We also recently announced Project Bloks to engage younger children in computational thinking. Finally, Code.org, for which Google is a primary sponsor, has curriculum and materials for K-5 teachers and students.

We feel that computational thinking is a core skill for all students. If we can make these explicit connections for students, they will see how the devices and apps that they use every day are powered by algorithms and programs. They will learn the importance of data in making decisions. They will learn skills that will prepare them for a workforce that will be doing vastly different tasks than the workforce of today.
We owe it to all students to give them every possible opportunity to be productive and successful members of society. [...]
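
To make the connection to computing concrete, here is a minimal sketch (our own hypothetical example, not taken from the course materials mentioned above) of how the column-by-column addition algorithm described earlier might be turned into a short program. The if/else captures the conditional branching and the loop over columns captures the iteration:

    def add_by_columns(a, b):
        """Add two non-negative integers column by column, the way students do on paper."""
        digits_a = [int(d) for d in str(a)][::-1]   # ones digit first
        digits_b = [int(d) for d in str(b)][::-1]
        result_digits = []
        carry = 0
        for col in range(max(len(digits_a), len(digits_b))):   # iteration over columns
            da = digits_a[col] if col < len(digits_a) else 0
            db = digits_b[col] if col < len(digits_b) else 0
            total = da + db + carry
            if total < 10:                                      # conditional branching
                result_digits.append(total)
                carry = 0
            else:
                result_digits.append(total - 10)
                carry = 1
        if carry:
            result_digits.append(carry)
        return int("".join(str(d) for d in reversed(result_digits)))

    print(add_by_columns(478, 964))   # prints 1442; the same program solves every instance
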

Announcing an Open Source ADC board for BeagleBone


Posted by Jason Holt, Software Engineer

(Cross-posted on the Google Open Source Blog)

Working with electronics, we often find ourselves soldering up a half-baked electronic circuit to detect some sort of signal. For example, last year we wanted to measure the strength of a carrier. We started with traditional analog circuits - amplifier, filter, envelope detector, threshold. You can see some of our prototypes in the image below; they get pretty messy.

While there's a certain satisfaction in taming a signal using the physical properties of capacitors, coils of wire and transistors, it's usually easier to digitize the signal with an Analog to Digital Converter (ADC) and manage it with Digital Signal Processing (DSP) instead of electronic parts. Tweaking software doesn't require a soldering iron, and lets us modify signals in ways that would be impractical or impossible with analog circuits.

There are several standard solutions for digitizing a signal: connect a laptop to an oscilloscope or Data Acquisition System (DAQ) via USB or Ethernet, or use the onboard ADCs of a maker board like an Arduino. The former are sensitive and accurate, but also big and power hungry. The latter are cheap and tiny, but slower and have enough RAM for only milliseconds worth of high speed sample data.

That led us to investigate single board computers like the BeagleBone and Raspberry Pi, which are small and cheap like an Arduino, but have specs like a smartphone. And crucially, the BeagleBone's system-on-a-chip (SoC) combines a beefy ARMv7 CPU with two smaller Programmable Realtime Units (PRUs) that have access to all 512MB of system RAM. This lets us dedicate the PRUs to the time-sensitive and repetitive task of reading each sample out of an external ADC, while the main CPU lets us use the data with the GNU/Linux tools we're used to.

The result is an open source BeagleBone cape we've named PRUDAQ. It's built around the Analog Devices AD9201 ADC, which samples two inputs simultaneously at up to 20 megasamples per second, per channel. Simultaneous sampling and high sample rates make it useful for software-defined radio (SDR) and scientific applications where a built-in ADC isn't quite up to the task.

Our open source electrical design and sample code are available on GitHub, and GroupGets has boards ready to ship for $79. We were also fortunate to have help from Google intern Kumar Abhishek. He added support for PRUDAQ to his Google Summer of Code project BeagleLogic, which performs much better than our sample code.

We started PRUDAQ for our own needs, but quickly realized that others might also find it useful. We're excited to get your feedback through the email list. Tell us what can be done with inexpensive fast ADCs paired with inexpensive fast CPUs! [...]
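
As a rough illustration of the "digitize it and do the rest in software" point above, here is a small sketch on synthetic data (our own example, not PRUDAQ's actual sample code, which lives on GitHub; it assumes NumPy and a recent SciPy). The analog chain of filter, envelope detector and threshold collapses into a few lines of DSP once the samples are in memory:

    import numpy as np
    from scipy.signal import butter, filtfilt

    fs = 1_000_000                      # assumed sample rate in Hz
    t = np.arange(0, 0.01, 1 / fs)

    # Synthetic stand-in for digitized ADC samples: a 100 kHz carrier that switches on halfway.
    carrier = np.sin(2 * np.pi * 100_000 * t) * (t > 0.005)
    samples = carrier + 0.1 * np.random.randn(t.size)

    # Band-pass around the carrier, rectify, then low-pass to recover the envelope.
    b, a = butter(4, [80_000, 120_000], btype="bandpass", fs=fs)
    filtered = filtfilt(b, a, samples)
    b_lp, a_lp = butter(4, 5_000, btype="low", fs=fs)
    envelope = filtfilt(b_lp, a_lp, np.abs(filtered))

    carrier_present = envelope > 0.2    # the "threshold" stage is now just a comparison
    print(f"carrier detected in {carrier_present.mean():.0%} of samples")

Changing the filter bandwidth or the threshold here is an edit and a rerun, rather than a trip back to the soldering iron.
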

Towards an exact (quantum) description of chemistry


Posted by Ryan Babbush, Quantum Software Engineer

“...nature isn't classical, dammit, and if you want to make a simulation of nature, you'd better make it quantum mechanical...” - Richard Feynman, Simulating Physics with Computers

One of the most promising applications of quantum computing is the ability to efficiently model quantum systems in nature that are considered intractable for classical computers. Now, in collaboration with the Aspuru-Guzik group at Harvard and researchers from Lawrence Berkeley National Labs, UC Santa Barbara, Tufts University and University College London, we have performed the first completely scalable quantum simulation of a molecule. Our experimental results are detailed in the paper Scalable Quantum Simulation of Molecular Energies, which recently appeared in Physical Review X.

The goal of our experiment was to use quantum hardware to efficiently solve the molecular electronic structure problem, which seeks the solution for the lowest energy configuration of electrons in the presence of a given nuclear configuration. In order to predict chemical reaction rates (which govern the mechanism of chemical reactions), one must make these calculations to extremely high precision. The ability to predict such rates could revolutionize the design of solar cells, industrial catalysts, batteries, flexible electronics, medicines, materials and more. The primary difficulty is that molecular systems form highly entangled quantum superposition states which require exponentially many classical computing resources in order to represent to sufficiently high precision. For example, exactly computing the energies of methane (CH4) takes about one second, but the same calculation takes about ten minutes for ethane (C2H6) and about ten days for propane (C3H8).

In our experiment, we focus on an approach known as the variational quantum eigensolver (VQE), which can be understood as a quantum analog of a neural network. Whereas a classical neural network is a parameterized mapping that one trains in order to model classical data, VQE is a parameterized mapping (e.g. a quantum circuit) that one trains in order to model quantum data (e.g. a molecular wavefunction). The training objective for VQE is the molecular energy function, which is always minimized by the true ground state. The quantum advantage of VQE is that quantum bits can efficiently represent the molecular wavefunction whereas exponentially many classical bits would be required.

Using VQE, we quantum computed the energy landscape of molecular hydrogen, H2. We compared the performance of VQE to another quantum algorithm for chemistry, the phase estimation algorithm (PEA). Experimentally computed energies, as a function of the H - H bond length, are shown below alongside the exact curve.

We were able to obtain such high performance with VQE because the neural-network-like training loop helped to establish experimentally optimal circuit parameters for representing the wavefunction in the presence of systematic control errors. One can understand this by considering a hardware implementation of a neural network with a faulty weight, e.g. the weight is only represented half as strong as it should be. Because the weights of the neural network are established via a closed-loop training procedure which can compensate for such systematic errors, the hardware neural network is robust against such imperfections.
Likewise, despite systematic errors in our implementation of the VQE circuit, we are still able to learn an accurate model for the wavefunction. This robustness inspires hope that VQE may be able to solve [...]
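
For readers who want a feel for the variational loop described above, here is a deliberately tiny, purely classical sketch (our own illustration with a made-up 2x2 Hamiltonian, not the hardware experiment): a one-parameter trial state is tuned until its energy expectation value stops decreasing, which is the role the circuit parameters play in VQE.

    import numpy as np
    from scipy.optimize import minimize_scalar

    # A hypothetical 2x2 Hermitian matrix standing in for a molecular Hamiltonian.
    H = np.array([[ 0.5, -0.3],
                  [-0.3, -0.8]])

    def ansatz(theta):
        """Single-parameter trial state |psi(theta)> = cos(theta)|0> + sin(theta)|1>."""
        return np.array([np.cos(theta), np.sin(theta)])

    def energy(theta):
        """Objective: the energy expectation value <psi(theta)|H|psi(theta)>."""
        psi = ansatz(theta)
        return psi @ H @ psi

    result = minimize_scalar(energy, bounds=(0, np.pi), method="bounded")
    exact_ground = np.linalg.eigvalsh(H)[0]
    print(f"variational energy: {result.fun:.4f}, exact ground energy: {exact_ground:.4f}")

Because the objective is always minimized by the true ground state, the optimizer can only ever approach the exact energy from above, even if the "hardware" evaluating the energy has systematic imperfections.
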

Wide & Deep Learning: Better Together with TensorFlow


Posted by Heng-Tze Cheng, Senior Software Engineer, Google Research

The human brain is a sophisticated learning machine, forming rules by memorizing everyday events (“sparrows can fly” and “pigeons can fly”) and generalizing those learnings to apply to things we haven't seen before (“animals with wings can fly”). Perhaps more powerfully, memorization also allows us to further refine our generalized rules with exceptions (“penguins can't fly”). As we were exploring how to advance machine intelligence, we asked ourselves the question: can we teach computers to learn like humans do, by combining the power of memorization and generalization?

It's not an easy question to answer, but by jointly training a wide linear model (for memorization) alongside a deep neural network (for generalization), one can combine the strengths of both to bring us one step closer. At Google, we call it Wide & Deep Learning. It's useful for generic large-scale regression and classification problems with sparse inputs (categorical features with a large number of possible feature values), such as recommender systems, search, and ranking problems.

Today we’re open-sourcing our implementation of Wide & Deep Learning as part of the TF.Learn API so that you can easily train a model yourself. Please check out the TensorFlow tutorials on Linear Models and Wide & Deep Learning, as well as our research paper to learn more.

How Wide & Deep Learning works.

Let's say one day you wake up with an idea for a new app called FoodIO*. A user of the app just needs to say out loud what kind of food he/she is craving (the query). The app magically predicts the dish that the user will like best, and the dish gets delivered to the user's front door (the item). Your key metric is consumption rate: if a dish was eaten by the user, the score is 1; otherwise it's 0 (the label).

You come up with some simple rules to start, like returning the items that match the most characters in the query, and you release the first version of FoodIO. Unfortunately, you find that the consumption rate is pretty low because the matches are too crude to be really useful (people shouting “fried chicken” end up getting “chicken fried rice”), so you decide to add machine learning to learn from the data.

The Wide model.

In the 2nd version, you want to memorize what items work the best for each query. So, you train a linear model in TensorFlow with a wide set of cross-product feature transformations to capture how the co-occurrence of a query-item feature pair correlates with the target label (whether or not an item is consumed). The model predicts the probability of consumption P(consumption | query, item) for each item, and FoodIO delivers the top item with the highest predicted consumption rate. For example, the model learns that the feature AND(query="fried chicken", item="chicken and waffles") is a huge win, while AND(query="fried chicken", item="chicken fried rice") doesn't get as much love even though the character match is higher. In other words, FoodIO 2.0 does a pretty good job memorizing what users like, and it starts to get more traction.

The Deep model.

Later on you discover that many users are saying that they're tired of the recommendations. They're eager to discover similar but different cuisines with a “surprise me” state of mind. So [...]
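
As a rough sketch of how a FoodIO-style model might be wired up with the TF.Learn API of that era (the feature names and sizes are illustrative, and the tf.contrib module paths shown here have since moved in later TensorFlow versions; see the official tutorials for the canonical code):

    import tensorflow as tf

    # Sparse (categorical) base columns for the query and the item.
    query = tf.contrib.layers.sparse_column_with_hash_bucket("query", hash_bucket_size=10000)
    item = tf.contrib.layers.sparse_column_with_hash_bucket("item", hash_bucket_size=10000)

    # Wide part: a cross-product feature so the linear model can memorize
    # specific query-item co-occurrences such as AND(query, item).
    query_x_item = tf.contrib.layers.crossed_column([query, item], hash_bucket_size=int(1e6))

    # Deep part: dense embeddings so the neural network can generalize
    # to query-item pairs it has never seen together.
    deep_columns = [
        tf.contrib.layers.embedding_column(query, dimension=32),
        tf.contrib.layers.embedding_column(item, dimension=32),
    ]

    # Jointly trained wide linear model + deep neural network.
    estimator = tf.contrib.learn.DNNLinearCombinedClassifier(
        linear_feature_columns=[query_x_item],
        dnn_feature_columns=deep_columns,
        dnn_hidden_units=[100, 50])

    # estimator.fit(input_fn=train_input_fn, steps=10000)   # train_input_fn: your labeled data
    # estimator.evaluate(input_fn=eval_input_fn)
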

CVPR 2016 & Research at Google


Posted by Rahul Sukthankar, Research Scientist

This week, Las Vegas hosts the 2016 Conference on Computer Vision and Pattern Recognition (CVPR 2016), the premier annual computer vision event comprising the main conference and several co-located workshops and short courses. As a leader in computer vision research, Google has a strong presence at CVPR 2016, with many Googlers presenting papers and invited talks at the conference, tutorials and workshops.

We congratulate Google Research Scientist Ce Liu and Google Faculty Advisor Abhinav Gupta, who were selected as this year’s recipients of the PAMI Young Researcher Award for outstanding research contributions within computer vision. We also congratulate Googler Henrik Stewenius for receiving the Longuet-Higgins Prize, a retrospective award that recognizes up to two CVPR papers from ten years ago that have made a significant impact on computer vision research, for his 2006 CVPR paper “Scalable Recognition with a Vocabulary Tree”, co-authored with David Nister, during their time at University of Kentucky.

If you are attending CVPR this year, please stop by our booth and chat with our researchers about the projects and opportunities at Google that go into solving interesting problems for hundreds of millions of people. The Google booth will also showcase several recent efforts, including the technology behind Motion Stills, a live demo of neural network-based image compression and TensorFlow-Slim, the lightweight library for defining, training and evaluating models in TensorFlow. Learn more about our research being presented at CVPR 2016 in the list below (Googlers highlighted in blue).

Oral Presentations
  • Generation and Comprehension of Unambiguous Object Descriptions
    Junhua Mao, Jonathan Huang, Alexander Toshev, Oana Camburu, Alan L. Yuille, Kevin Murphy
  • Detecting Events and Key Actors in Multi-Person Videos
    Vignesh Ramanathan, Jonathan Huang, Sami Abu-El-Haija, Alexander Gorban, Kevin Murphy, Li Fei-Fei

Spotlight Session: 3D Reconstruction
  • DeepStereo: Learning to Predict New Views From the World’s Imagery
    John Flynn, Ivan Neulander, James Philbin, Noah Snavely

Posters
  • Discovering the Physical Parts of an Articulated Object Class From Multiple Videos
    Luca Del Pero, Susanna Ricco, Rahul Sukthankar, Vittorio Ferrari
  • Blockout: Dynamic Model Selection for Hierarchical Deep Networks
    Calvin Murdock, Zhen Li, Howard Zhou, Tom Duerig
  • Rethinking the Inception Architecture for Computer Vision
    Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, Zbigniew Wojna
  • Improving the Robustness of Deep Neural Networks via Stability Training
    Stephan Zheng, Yang Song, Thomas Leung, Ian Goodfellow
  • Semantic Image Segmentation With Task-Specific Edge Detection Using CNNs and a Discriminatively Trained Domain Transform
    Liang-Chieh Chen, Jonathan T. Barron, George Papandreou, Kevin Murphy, Alan L. Yuille

Tutorial
  • Optimization Algorithms for Subset Selection and Summarization in Large Data Sets
    Ehsan Elhamifar, Jeff Bilmes, Alex Kulesza, Michael Gygli

Workshops
  • Perceptual Organization in Computer Vision: The Role of Feedback in Recognition and Reorganization
    Organizers: Katerina Fragkiadaki, Phillip Isola, Joao Carreira. Invited talks: Viren Jain, Jitendra Malik
  • VQA Challenge Workshop
    Invited talks: Jitendra Malik, Kevin Murphy
  • Women in Computer Vision
    Invited talk: Caroline Pantofaru
  • Computational Models for Learning Systems and Educational Assessment
    Invited talk: Jonathan Huang
  • Large-Scale Scene Understanding (LSUN) Challenge
    Invited talk: Jitendra Malik
  • Large[...]

Project Bloks: Making code physical for kids


Posted by Steve Vranakis and Jayme Goldstein, Executive Creative Director and Project Lead, Google Creative Lab

At Google, we’re passionate about empowering children to create and explore with technology. We believe that when children learn to code, they’re not just learning how to program a computer - they’re learning a new language for creative expression and are developing computational thinking: a skillset for solving problems of all kinds. In fact, it’s a skillset whose importance is being recognised around the world - from President Obama’s CS4All program to the inclusion of Computer Science in the UK National Curriculum. We’ve long supported and advocated the furthering of CS education through programs and platforms such as Blockly, Scratch Blocks, CS First and Made w/ Code.

Today, we’re happy to announce Project Bloks, a research collaboration between Google, Paulo Blikstein (Stanford University) and IDEO with the goal of creating an open hardware platform that researchers, developers and designers can use to build physical coding experiences. As a first step, we’ve created a system for tangible programming and built a working prototype with it. We’re sharing our progress before conducting more research over the summer to inform what comes next.

Physical coding

Kids are inherently playful and social. They naturally play and learn by using their hands, building stuff and doing things together. Making code physical - known as tangible programming - offers a unique way to combine the way children innately play and learn with computational thinking.

Project Bloks is preceded and shaped by a long history of educational theory and research in the area of hands-on learning: from Friedrich Froebel, Maria Montessori and Jean Piaget’s pioneering work in the area of learning by experience, exploration and manipulation, to the research started in the 1970s by Seymour Papert and Radia Perlman with LOGO and TORTIS. This exploration has continued to grow and includes a wide range of research and platforms.

However, designing kits for tangible programming is challenging - requiring the resources and time to develop both the software and the hardware. Our goal is to remove those barriers. By creating an open platform, Project Bloks will allow designers, developers and researchers to focus on innovating, experimenting and creating new ways to help kids develop computational thinking. Our vision is that, one day, the Project Bloks platform becomes for tangible programming what Blockly is for on-screen programming.

The Project Bloks system

We’ve designed a system that developers can customise, reconfigure and rearrange to create all kinds of different tangible programming experiences.

A birdseye view of the customisable and reconfigurable Project Bloks system

The Project Bloks system is made up of three core components: the “Brain Board”, “Base Boards” and “Pucks”. When connected together they create a set of instructions which can be sent to connected devices, things like toys or tablets, over wifi or Bluetooth.

The three core components of the Project Bloks system

Pucks: abundant, inexpensive, customisable physical instructions

Pucks are what make the Project Bloks system so versatile. They help bring the infinite flexibility of software p[...]

Bringing Precision to the AI Safety Discussion


We believe that AI technologies are likely to be overwhelmingly useful and beneficial for humanity. But part of being a responsible steward of any new technology is thinking through potential challenges and how best to address any associated risks. So today we’re publishing a technical paper, Concrete Problems in AI Safety, a collaboration among scientists at Google, OpenAI, Stanford and Berkeley.

While possible AI safety risks have received a lot of public attention, most previous discussion has been very hypothetical and speculative. We believe it’s essential to ground concerns in real machine learning research, and to start developing practical approaches for engineering AI systems that operate safely and reliably.

We’ve outlined five problems we think will be very important as we apply AI in more general circumstances. These are all forward-thinking, long-term research questions -- minor issues today, but important to address for future systems:

  • Avoiding Negative Side Effects: How can we ensure that an AI system will not disturb its environment in negative ways while pursuing its goals, e.g. a cleaning robot knocking over a vase because it can clean faster by doing so? (A toy sketch of this problem follows the list.)
  • Avoiding Reward Hacking: How can we avoid gaming of the reward function? For example, we don’t want this cleaning robot simply covering over messes with materials it can’t see through.
  • Scalable Oversight: How can we efficiently ensure that a given AI system respects aspects of the objective that are too expensive to be frequently evaluated during training? For example, if an AI system gets human feedback as it performs a task, it needs to use that feedback efficiently because asking too often would be annoying.
  • Safe Exploration: How do we ensure that an AI system doesn’t make exploratory moves with very negative repercussions? For example, maybe a cleaning robot should experiment with mopping strategies, but clearly it shouldn’t try putting a wet mop in an electrical outlet.
  • Robustness to Distributional Shift: How do we ensure that an AI system recognizes, and behaves robustly, when it’s in an environment very different from its training environment? For example, heuristics learned for a factory floor may not be safe enough for an office.
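
As a purely illustrative toy example (our own sketch, not taken from the paper), the side-effects problem can be seen in miniature by comparing a naive objective with one that also penalizes irreversible changes to the environment:

    # Two candidate plans for a hypothetical cleaning robot, with hand-picked toy numbers.
    plans = {
        "go_around_vase": {"mess_cleaned": 1.0, "time_taken": 10.0, "vase_broken": 0},
        "knock_over_vase": {"mess_cleaned": 1.0, "time_taken": 6.0, "vase_broken": 1},
    }

    def naive_reward(p):
        # Only rewards cleaning quickly; says nothing about the rest of the environment.
        return p["mess_cleaned"] - 0.05 * p["time_taken"]

    def impact_penalized_reward(p, penalty=1.0):
        # Same objective plus a penalty for irreversible changes to the environment.
        return naive_reward(p) - penalty * p["vase_broken"]

    print(max(plans, key=lambda k: naive_reward(plans[k])))             # knock_over_vase
    print(max(plans, key=lambda k: impact_penalized_reward(plans[k])))  # go_around_vase

The hard research questions, of course, are how to specify or learn such penalties in general without enumerating every vase in advance, which is the kind of problem the paper examines.
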

We go into more technical detail in the paper. The machine learning research community has already thought quite a bit about most of these problems and many related issues, but we think there’s a lot more work to be done.

We believe in rigorous, open, cross-institution work on how to build machine learning systems that work as intended. We’re eager to continue our collaborations with other research groups to make positive progress on AI.

ICML 2016 & Research at Google


Posted by Afshin Rostamizadeh, Research Scientist

This week, New York hosts the 2016 International Conference on Machine Learning (ICML 2016), a premier annual Machine Learning event supported by the International Machine Learning Society (IMLS). Machine Learning is a key focus area at Google, with highly active research groups exploring virtually all aspects of the field, including deep learning and more classical algorithms. We work on an extremely wide variety of machine learning problems that arise from a broad range of applications at Google. One particularly important setting is that of large-scale learning, where we utilize scalable tools and architectures to build machine learning systems that work with large volumes of data that often preclude the use of standard single-machine training algorithms. In doing so, we are able to solve deep scientific problems and engineering challenges, exploring theory as well as application, in areas of language, speech, translation, music, visual processing and more.

As Gold Sponsor, Google has a strong presence at ICML 2016 with many Googlers publishing their research and hosting workshops. If you’re attending, we hope you’ll visit the Google booth and talk with our researchers to learn more about the exciting work, creativity and fun that goes into solving interesting ML problems that impact millions of people. You can also learn more about our research being presented at ICML 2016 in the list below (Googlers highlighted in blue).

ICML 2016 Organizing Committee
Area Chairs include: Corinna Cortes, John Blitzer, Maya Gupta, Moritz Hardt, Samy Bengio

IMLS
Board Members include: Corinna Cortes

Accepted Papers
  • ADIOS: Architectures Deep In Output Space
    Moustapha Cisse, Maruan Al-Shedivat, Samy Bengio
  • Associative Long Short-Term Memory
    Ivo Danihelka (Google DeepMind), Greg Wayne (Google DeepMind), Benigno Uria (Google DeepMind), Nal Kalchbrenner (Google DeepMind), Alex Graves (Google DeepMind)
  • Asynchronous Methods for Deep Reinforcement Learning
    Volodymyr Mnih (Google DeepMind), Adria Puigdomenech Badia (Google DeepMind), Mehdi Mirza, Alex Graves (Google DeepMind), Timothy Lillicrap (Google DeepMind), Tim Harley (Google DeepMind), David Silver (Google DeepMind), Koray Kavukcuoglu (Google DeepMind)
  • Binary embeddings with structured hashed projections
    Anna Choromanska, Krzysztof Choromanski, Mariusz Bojarski, Tony Jebara, Sanjiv Kumar, Yann LeCun
  • Discrete Distribution Estimation Under Local Privacy
    Peter Kairouz, Keith Bonawitz, Daniel Ramage
  • Dueling Network Architectures for Deep Reinforcement Learning (Best Paper Award recipient)
    Ziyu Wang (Google DeepMind), Nando de Freitas (Google DeepMind), Tom Schaul (Google DeepMind), Matteo Hessel (Google DeepMind), Hado van Hasselt (Google DeepMind), Marc Lanctot (Google DeepMind)
  • Exploiting Cyclic Symmetry in Convolutional Neural Networks
    Sander Dieleman (Google DeepMind), Jeffrey De Fauw (Google DeepMind), Koray Kavukcuoglu (Google DeepMind)
  • Fast Constrained Submodular Maximization: Personalized Data Summarization
    Baharan Mirzasoleiman, Ashwinkumar Badanidiyuru, Amin Karbasi
  • Greedy Column Subset Selection: New Bounds and Distributed Algorithms
    Jason Altschuler, Aditya Bhaskara, Gang Fu, Vahab Mirrokni, Afshin Rostamizadeh, Morteza Zadimoghaddam
  • Horizontally Scalable Submodular Maximization
    Mario Lucic, Olivier Bachem, Mortez[...]

Announcing Google Research, Europe


Posted by Emmanuel Mogenet, Head of Google Research, Europe

Google’s ongoing research in Machine Intelligence is what powers many of the products being used by hundreds of millions of people a day - from Translate to Photo Search to Smart Reply for Inbox. One of the things that enables these advances is the extensive collaboration between the Google researchers in our offices across the world, all contributing their unique knowledge and disseminating ideas in state-of-the-art Machine Learning (ML) technologies and techniques in order to develop useful tools and products.

Today, we’re excited to announce a dedicated Machine Learning research group in Europe, based in our Zurich office. Google Research, Europe, will foster an environment where software engineers and researchers specialising in ML will have the opportunity to develop products and conduct research right here in Europe, as part of the wider efforts at Google.

Zurich is already the home of Google’s largest engineering office outside the US, and is responsible for developing the engine that powers Knowledge Graph, as well as the conversation engine that powers the Google Assistant in Allo. In addition to continued collaboration with Google’s various research teams, Google Research, Europe will be focused on three key areas:

  • Machine Intelligence
  • Natural Language Processing & Understanding
  • Machine Perception

In pursuit of these areas, the team will actively research ways in which to improve ML infrastructure, broadly facilitating research for the community, and enabling it to be put to practical use. Furthermore, researchers in the Zurich office will be uniquely able to work closely with team linguists, advancing Natural Language Understanding in collaboration with Google Research groups across the world, all while enjoying Mountain Views of a different kind.

Europe is home to some of the world’s premier technical universities, making it an ideal place to build a top-notch research team. We look forward to collaborating with all the excellent Computer Science research that is coming from the region, and hope to contribute towards the wider academic community through our publications and academic support. [...]

Quantum annealing with a digital twist


Posted by Rami Barends and Alireza Shabani, Quantum Electronics Engineers

One of the key benefits of quantum computing is that it has the potential to solve some of the most complex problems in nature, from physics to chemistry to biology. For example, when attempting to calculate protein folding, or when exploring reaction catalysts and “designer” molecules, one can look at computational challenges as optimization problems, and represent the different configurations of a molecule as an energy landscape in a quantum computer. By letting the system cool, or “anneal”, one finds the lowest energy state in the landscape - the most stable form of the molecule. Thanks to the peculiarities of quantum mechanics, the correct answer simply drops out at the end of the quantum computation. In fact, many tough problems can be dealt with this way, and this combination of simplicity and generality makes the approach appealing.

But finding the lowest energy state in a system is like being put in the Alps and being told to find the lowest elevation - it’s easy to get stuck in a “local” valley, and not know that there is an even lower point elsewhere. Therefore, we use a different approach: We start with a very simple energy landscape - a flat meadow - and initialize the system of quantum bits (qubits) to represent the known lowest energy point, or “ground state”, in that landscape. We then begin to adjust the simple landscape towards one that represents the problem we are trying to solve - from the smooth meadow to the highly uneven terrain of the Alps. Here’s the fun part: if one evolves the landscape very slowly, the ground state of the qubits also evolves, so that they stay in the ground state of the changing system. This is called “adiabatic quantum computing”, and qubits exploit quantum tunneling to ensure they always find the lowest energy "valley" in the changing system.

While this is great in theory, getting this to work in practice is challenging, as you have to set up the energy landscape using the available qubit interactions. Ideally you’d have multiple interactions going on between all of the qubits, but for a large-scale solver the requirements to accurately keep track of these interactions become enormous. Realistically, the connectivity has to be reduced, but this presents a major limitation for the computational possibilities.

In "Digitized adiabatic quantum computing with a superconducting circuit", published in Nature, we’ve overcome this obstacle by giving quantum annealing a digital twist. With a limited connectivity between qubits you can still construct any of the desired interactions: whether the interaction is ferromagnetic (the quantum bits prefer an aligned orientation), antiferromagnetic (an anti-aligned orientation), or even defined along an arbitrary different direction, you can make it happen using easy-to-combine discrete building blocks. In this case, the blocks we use are the logic gates that we've been developing with our superconducting architecture.

Superconducting quantum chip with nine qubits. Each qubit (cross-shaped structures in the center) is connected to its neighbors and individually controlled. Photo credit: Julian Kelly.

The key is controllability. Qubits, like other physical objects in nature, have a resonance frequency, and can be addressed individually with short voltage and current pulses. In our architecture we can steer this frequency, much l[...]
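
To make the "meadow to Alps" picture above a little more concrete, here is a toy classical sketch (our own illustration, not code from the paper): a two-qubit Hamiltonian is interpolated from a simple transverse-field term, whose ground state is easy to prepare, to a made-up Ising "problem" term, and the instantaneous ground-state energy is tracked along the way. Adiabatic quantum computing relies on changing s slowly enough that the real qubits stay in this instantaneous ground state.

    import numpy as np

    # Pauli matrices and a shorthand for building two-qubit operators.
    I = np.eye(2)
    X = np.array([[0.0, 1.0], [1.0, 0.0]])
    Z = np.diag([1.0, -1.0])
    kron = np.kron

    # H_B: transverse field (the "flat meadow"; its ground state is easy to prepare).
    H_B = -(kron(X, I) + kron(I, X))
    # H_P: a toy two-qubit Ising problem whose ground state encodes the "answer".
    H_P = -kron(Z, Z) + 0.5 * kron(Z, I)

    for s in np.linspace(0.0, 1.0, 6):
        H = (1 - s) * H_B + s * H_P                 # the slowly changing landscape H(s)
        ground_energy = np.linalg.eigvalsh(H)[0]    # instantaneous ground-state energy
        print(f"s = {s:.1f}, ground-state energy = {ground_energy:+.3f}")

In the digitized approach described in the paper, this continuous evolution is approximated by a sequence of discrete logic gates, which is what lets arbitrary interactions be built from a chip with limited connectivity.
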