Subscribe: O'Reilly Radar - Insight, analysis, and research about emerging technologies
Added By: Feedage Forager Feedage Grade B rated
Language: English

All - O'Reilly Media

All of our Ideas and Learning material from all of our topics.

Updated: 2017-07-24T06:37:44Z


Why product management isn’t a cookie cutter position



Blair Reeves discusses how the roles and responsibilities of a product manager vary from one company to the next.

Continue reading Why product management isn’t a cookie cutter position.


Four short links: 21 July 2017


Offline First, Security Tools, Learning Game Strategy, and Design Documentation

  1. Offline First -- how to build an offline-first site in Javascript.
  2. Blackhat Arsenal -- software being released or updated during the Blackhat Arsenal event (e.g., DefPloreX, a machine-learning toolkit for large-scale e-crime forensics; and CBM, the "Car Backdoor Maker").
  3. Learning Macromanagement in StarCraft from Replays using Deep Learning -- Neural networks are trained on 789,571 state-action pairs extracted from 2,005 replays of highly skilled players, achieving top-1 and top-3 error rates of 54.6% and 22.9% in predicting the next build action. By integrating the trained network into UAlbertaBot, an open source StarCraft bot, the system can significantly outperform the game’s built-in Terran bot and play competitively against UAlbertaBot with a fixed rush strategy. (via Mark Riedl)
  4. Making Engineering Team Communication Clearer, Faster, Better -- it’s very important to make sure you have a process that actually gets people to read the document. The write-only document fired off into the void is a common problem, and this talks about how to solve it (for design documents, but the principles translate).
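
The top-1 and top-3 error rates quoted in the StarCraft item above are a standard way to score a classifier that ranks candidate actions. As a minimal sketch, assuming each prediction is a list of per-action scores (the scores and labels below are invented for illustration and are not from the paper), the metric can be computed like this:

```python
def top_k_error(scores, labels, k):
    """Fraction of samples whose true label is not among the k highest-scoring classes."""
    misses = 0
    for row, label in zip(scores, labels):
        # Rank class indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(labels)


# Hypothetical scores for four game states over four possible build actions.
scores = [
    [0.10, 0.60, 0.20, 0.10],
    [0.50, 0.30, 0.15, 0.05],
    [0.40, 0.30, 0.20, 0.10],
    [0.05, 0.15, 0.20, 0.60],
]
labels = [1, 1, 3, 3]  # index of the action the player actually took

print(top_k_error(scores, labels, 1))
print(top_k_error(scores, labels, 3))
```

A top-3 error is never higher than the top-1 error on the same data, because every top-1 hit is also a top-3 hit.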

Continue reading Four short links: 21 July 2017.


How big data and AI will reshape the automotive industry



The O’Reilly Data Show Podcast: Evangelos Simoudis on next-generation mobility services.

In this episode of the Data Show, I spoke with Evangelos Simoudis, co-founder of Synapse Partners and a frequent contributor to O’Reilly. He recently published a book entitled The Big Data Opportunity in Our Driverless Future, and I wanted to get his thoughts on the transportation industry and the role of big data and analytics in its future. Simoudis is an entrepreneur, and he also advises and invests in many technology startups. He became interested in the automotive industry long before the current wave of autonomous vehicle startups was in the planning stages.

Continue reading How big data and AI will reshape the automotive industry.


Adopting AI in the Enterprise: Ford Motor Company


Dimitar Filev on bringing cutting-edge computational intelligence to cars and the factories that build them.

Driverless cars aren’t the only application for deep learning on the road: neural networks have begun to make their way into every corner of the automotive industry, from supply-chain management to engine controllers. In this installment of our ongoing series on artificial intelligence (AI) and machine learning (ML) in the enterprise, we speak with Dimitar Filev, executive technical leader at Ford Research & Advanced Engineering, who leads the team focused on control methods and computational intelligence.

What was the first application of AI and ML at Ford?

Ford research lab has been conducting systematic research on computational intelligence—one of the branches of AI—for more than 20 years. About 15 years ago, Ford Motor Company introduced one of the first large-scale industrial applications of neural networks. Ford researchers developed and implemented, in mass-produced cars, an innovative misfire detection system—a neural-net-based classifier of crankshaft acceleration patterns for diagnosing engine misfire (undesirable combustion failure that has a negative impact on performance and emissions). Multiple other AI applications to Ford product and manufacturing followed this success.

How do you leverage AI and ML today to create a better product?

We can think of two categories of ML and AI applications in our vehicles. In addition to the obvious applications in driverless cars, Ford has also developed AI-based technologies that enable different functions in support of vehicle engineering. These are not always visible to the driver. As I mentioned before, we used recurrent-neural-net-based classifiers for misfire detection in V10 engines; we also use them for intruder detection when the driver is away from the vehicle. We also use fuzzy logic-type rule-based gain scheduling controllers integrated with the battery control systems of hybrid-electric vehicles. In our supply chain, neural networks are the main drivers behind the inventory management system recommending specific vehicle configurations to dealers, and evolutionary computing algorithms (in conjunction with dynamic semantic network-based expert systems) are deployed in support of resource management in assembly plants.

Are there other use cases within Ford today?

Another group of AI applications is driven by the fact that current vehicles have evolved into complex mobile cyber systems with increasing computational power and resources generating gigabytes of information per hour, continuously connected, with information flowing to, from, and through the platform. Increased capability of vehicle systems, along with the growing customer demand for new features, product improvement, personalization, rich information utilization, etc., are some of the drivers for introducing machine learning techniques in modern vehicles.

The most common AI applications involve direct driver interaction, including advisory systems that monitor acceleration and braking patterns to provide on-board evaluations of a driver’s preferences and intentions for different purposes—characterization of the driver, advice for fuel-efficient driving and safe driving, auto-selecting the optimal suspension and steering modes, simplifying the human-machine interface by estimating the most likely next destination, and preferred settings of the climate control, etc. These systems use traditional AI methods—rule-based, Markov models, clustering; they do not require special hardware. One of their distinctive features is to be intelligent enough to identify the level of acceptance of provided recommendations, and avoid drivers’ annoyance.

Recent extensive development of autonomous vehicles is the driver for deep learning applications to vehicle localization, object detection, classification, and tracking. We can expect in t[...]
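
The interview mentions Markov models as one of the traditional methods used for tasks such as estimating the most likely next destination. The sketch below is only an illustration of that general idea, not Ford's system: it assumes a hypothetical trip history given as a list of place names and builds a first-order Markov model by counting which place tends to follow which.

```python
from collections import Counter, defaultdict


def fit_transitions(trips):
    """Count how often each destination follows each previous location."""
    counts = defaultdict(Counter)
    for origin, destination in zip(trips, trips[1:]):
        counts[origin][destination] += 1
    return counts


def most_likely_next(counts, current):
    """Return (destination, estimated probability), or None if the place is unseen."""
    successors = counts.get(current)
    if not successors:
        return None
    destination, n = successors.most_common(1)[0]
    return destination, n / sum(successors.values())


# Hypothetical history of consecutive stops (invented data).
history = ["home", "work", "gym", "home", "work", "home", "work", "gym", "home"]
counts = fit_transitions(history)
prediction = most_likely_next(counts, "work")
print(prediction)
```

Because the model is just a table of counts, it needs no special hardware, which matches the point made in the interview about these traditional methods.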

Chris Stetson on system migrations and defining a microservices reference architecture


The O’Reilly Podcast: Helping developers improve performance, security, and service discoverability.

In this podcast episode, O’Reilly’s Jeff Bleiel talks with Chris Stetson, chief architect and head of engineering at NGINX. They discuss Stetson’s experiences working on microservices-based systems and how a microservices reference architecture can ease a development team’s pain when shifting from a monolithic application to many individualized microservices.

Like most developers, Stetson started off writing monolithic applications before moving over to a service-oriented architecture, where he broke apart different components of the application. “So many developers will approach building an application as a monolith because they don’t have to build out the infrastructure, orchestration tools, networking capabilities, and contracts between the different components,” he said. “However, many developers and teams today will approach their application as a monolith with the idea that there will be a clear separation of concerns for different parts that can easily be broken out.”

According to Stetson, the benefits for developers in adopting microservices are similar to those of the Agile movement, in that you may only have a couple weeks to work on a feature or piece of functionality. “Microservices encapsulate a set of functions and services that are constrained to a single set of concerns,” he said. “As a result, developers need to optimize around those concerns, which will help them build a really powerful and complete system that is harder to accomplish with a large monolithic application.”

Stetson’s passion for optimizing systems around microservices led him to spearhead the creation of NGINX’s Microservices Reference Architecture. “This reference architecture was our attempt to understand how we could help our customers build a microservice application while helping them improve aspects of their architecture related to performance, security, service discovery, and circuit breaker pattern functionality within the environment,” he said. “The reference architecture is an actual photo-sharing application, similar to Flickr or Shutterfly. We chose that application idea since it’s one everyone is familiar with, and it showcases powerful asymmetric computing requirements.”

This reference architecture includes three different networking models: the Proxy Model, the Router Mesh, and the Fabric Model. Stetson mentioned how the Proxy Model is similar in function to, and complements, Kubernetes’ Ingress Controller. (Kubernetes is an open source orchestration tool for managing the deployment and instances of containerized applications.) “Kubernetes has a very powerful framework for organizing microservices, allowing effective communication between services, and providing network segmentation,” he said. “It offers a lot of great services for systems to take advantage of in order to perform traffic management within a microservice application.”

This post and podcast is a collaboration between O'Reilly and NGINX. See our statement of editorial independence.

Continue reading Chris Stetson on system migrations and defining a microservices reference architecture.[...]

Edward Callahan on reactive microservice deployments


The O’Reilly Podcast: Modify your existing pipeline to embrace failure in isolation.

In this podcast episode, I talk about reactive microservice deployments with Edward Callahan, a senior engineer with Lightbend. We discuss the difference between a normal deployment pipeline and one that’s fully reactive, as well as the impact reactive deployments have on software teams.

Callahan mentioned how a deployment platform must be developer and operator friendly in order to enable the highly productive, iterative development being sought by enterprises undergoing software-led transformations. However, it can be very easy for software teams to get frustrated with the operational tooling generally available. “Integration is often cumbersome on the development process,” he said. “Development and operations teams are demanding more from the operational machinery they depend on for the success of their applications and services.”

For enterprises already developing reactive applications, these development teams are starting to realize their applications should be deployed to an equally reactive deployment platform. “With the complexity of managing state in a distributed deployment being handled reactively, the deployment workflow becomes a simplified and reliable pipeline,” Callahan said. “This frees developers to address business needs instead of the many details of delivering clustered services.”

Callahan said the same core reactive principles that define an application’s design—responsiveness, resiliency, elasticity, and message driven—can also be applied to a reactive deployment pipeline. Characteristics of such a pipeline include:

  1. Developer and operations friendly—The deployment pipeline should support ease of testing, continuous delivery, cluster conveniences, and composability.
  2. Application-centric logging, telemetry, and monitoring—Meaningful, actionable data is far more valuable than petabytes of raw telemetry. How many messages are in a given queue and how long it is taking to service those events is far more indicative of the service response times that are tied to your service-level agreements.
  3. Application-centric process monitoring—A fundamental aspect of monitoring is that the supervisory system automatically restarts services if they terminate unexpectedly.
  4. Elastic and scalable—Scaling the number of instances of a service and scaling the resources of a cluster. Clusters need some amount of spare capacity or headroom.

According to Callahan, the main difference between a normal deployment pipeline and a reactive one is the ability for the system to embrace failure in isolation. “Failure cannot be avoided,” he said. “You must embrace failure and seek to keep your services available despite it, even if this requires operating in a degraded manner. Let it crash! Instead of attempting to repair nodes when they fail, you replace the failing resources with new ones.”

When making the move to a reactive deployment pipeline, software teams need to remain flexible in the face of change. They also must stay mindful of any potential entrapments resulting from vendor lock-in around a new platform. “Standards really do help here,” Callahan said. “Watch for the standards as you move through the journey of building your own reactive deployment pipeline.”

This post is a collaboration between O'Reilly and Lightbend. See our statement of editorial independence.

Continue reading Edward Callahan on reactive microservice deployments.[...]
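
The "let it crash" idea described above, where a supervisor replaces a failed service with a new instance instead of repairing it, can be sketched in a few lines. This is a toy, in-process illustration only: `supervise` and `make_flaky_factory` are hypothetical names invented here, and a real deployment platform would supervise processes or containers rather than Python callables.

```python
def supervise(make_service, max_restarts=3):
    """Run a service; on a crash, discard it and start a fresh instance."""
    restarts = 0
    while True:
        service = make_service()  # always a new instance, never a repaired one
        try:
            return service(), restarts
        except Exception:
            if restarts >= max_restarts:
                raise  # give up and surface the failure to the caller
            restarts += 1


def make_flaky_factory(crashes_before_success):
    """Hypothetical factory whose first few instances crash, for demonstration."""
    state = {"created": 0}

    def make_service():
        state["created"] += 1
        instance_number = state["created"]

        def service():
            if instance_number <= crashes_before_success:
                raise RuntimeError(f"instance {instance_number} crashed")
            return f"instance {instance_number} served the request"

        return service

    return make_service


result, restarts = supervise(make_flaky_factory(2))
print(result, restarts)
```

Capping the number of restarts is a design choice in this sketch: it keeps a permanently broken service from looping forever and lets the failure escalate instead.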

Four short links: 20 July 2017


SQL Equivalence, Streaming Royalties, Open Source Publishing, and Serial Entitlement

  1. Introducing Cosette -- a SQL solver for automatically checking semantic equivalences of SQL queries. With Cosette, one can easily verify the correctness of SQL rewrite rules, find errors in buggy SQL rewrites, build auto-graders for SQL assignments, develop SQL optimizers, bust “fake SQLs,” etc. Open source, from the University of Washington.
  2. Streaming Services Royalty Rates Compared (Information is Beautiful) -- the lesson is that it's more profitable to work for a streaming service than to be an artist hosted on it.
  3. Editoria -- open source web-based, end-to-end, authoring, editing, and workflow tool that presses and library publishers can leverage to create modern, format-flexible, standards-compliant, book-length works. Funded by the Mellon Foundation, Editoria is a project of the University of California Press and the California Digital Library.
  4. The Al Capone Theory of Sexual Harassment (Val Aurora) -- The U.S. government recognized a pattern in the Al Capone case: smuggling goods was a crime often paired with failing to pay taxes on the proceeds of the smuggling. We noticed a similar pattern in reports of sexual harassment and assault: often people who engage in sexually predatory behavior also faked expense reports, plagiarized writing, or stole credit for other people’s work.

Continue reading Four short links: 20 July 2017.


Katie Moussouris on how organizations should and shouldn’t respond to reported vulnerabilities



The O’Reilly Security Podcast: Why legal responses to bug reports are an unhealthy reflex, thinking through first steps for a vulnerability disclosure policy, and the value of learning by doing.

In this episode, O’Reilly’s Courtney Nash talks with Katie Moussouris, founder and CEO of Luta Security. They discuss why many organizations have a knee-jerk legal response to a bug report (and why your organization shouldn’t), the first steps organizations should take in formulating a vulnerability disclosure program, and how learning through experience and sharing knowledge benefits all.

Continue reading Katie Moussouris on how organizations should and shouldn’t respond to reported vulnerabilities.


Growth hacking in SEO



A look at the successes and failures of a company using experimental SEO practices.

Classic Growth Hackin'

This project was one of those during which we kept high-fiving constantly because the money was rolling in steadily. When the chemistry is right for both agency and client, magical things can happen. Strong communication between marketing and product development will only benefit sales. It behooves the growth hacker to stay as granular as possible with sales processes (to identify opportunities). The growth challenges we faced were related to the consumer's trust. Previous regimes at the company tried and failed at paid search, paid social, and old-fashioned PR (smiling and dialing).

This project was one of the fastest increases in growth for both program development and revenue that I've ever seen. We quadrupled the company's sales in less than two months by leveraging organic, social, and paid traffic channels. We were promoting a business-to-consumer widget that was a cool modern twist on an old established household product. They'd raised a plethora of money from a crowdfunding site. A slew of high-priced consultants and CMOs had been in and out of the organization, which was apparent by their use of four distinct analytics suites of tools. To reach for the stars is a noble goal, but to build momentum teams have to be unified.

Continue reading Growth hacking in SEO.


Four short links: 19 July 2017


Open Source Car Code, Glass for Business, Videogame Narrative Skills, and Gmail Leverage

  1. Apollo -- open source autonomous auto platform.
  2. Glass for Enterprise -- Google X has relaunched Glass for businesses. See blog post and Steven Levy. A HUD for assembly operators in factories with marked results. “We knew the value of wearable technology when we first put it on the floor,” Gulick says. “In our first test in quality, our numbers were so high in the value it was adding that we actually retested and retested and retested. Some of the numbers we couldn't even publish because the leadership said they looked way too high.” I've been telling people for years that the killer app is boring business. Interesting that G has a new sales model: they make the hardware but sell to partners who will create specific applications and sell to customers.
  3. Game Writing: Narrative Skills for Videogames -- as VR and AR introduce branching stories into our lives (never mind Westworld-like hotels), the art of flexible narrative is a useful one to study. This is a review of a 2008 book with contributions from greats in the field. Of the books on professional games writing I’ve encountered, this is possibly the best, and definitely in the top three.
  4. Google Hire -- recruitment tool, but notable because they're integrating their enterprise tools into Gmail (note: you can't integrate your enterprise tool into Gmail). Most workflows touch email at some point, and that's the precise point where systems like Salesforce chafe. (I still twitch remembering the Chrome plugin for Gmail-Salesforce integration that we used at a previous startup.) Owning the mail client gives them huge opportunity here.

Continue reading Four short links: 19 July 2017.


Four short links: 18 July 2017


Fooling Image Recognition, Electronics Text, Zero-Knowledge Proofs, and Massively Parallel Protein Design

  1. Robust Adversarial Inputs -- tricking deep learning image recognition models. We’ve created images that reliably fool neural network classifiers when viewed from varied scales and perspectives.
  2. CircuitLab Textbook -- free introductory electronics textbook, work in progress.
  3. The Hunting of the Snark -- a treasure hunt consisting of cryptographic challenges that will guide you through a zero-knowledge proof (ZKP) learning experience. As a reminder, zero-knowledge proofs, invented decades ago, allow verifiers to validate a computation on private data by allowing a prover to generate a cryptographic proof that asserts to the correctness of the computed output.
  4. Massively Parallel Protein Design -- We combined computational protein design, next-generation gene synthesis, and a high-throughput protease susceptibility assay to measure folding and stability for more than 15,000 de novo designed miniproteins, 1,000 natural proteins, 10,000 point mutants, and 30,000 negative control sequences. This analysis identified more than 2,500 stable designed proteins in four basic folds—a number sufficient to enable us to systematically examine how sequence determines folding and stability in uncharted protein space. Clever approach to understanding protein folding. (via Ian Haydon)

Continue reading Four short links: 18 July 2017.


Textual entailment with TensorFlow


Using neural networks to explore natural language.

Textual entailment is a simple exercise in logic that attempts to discern whether one sentence can be inferred from another. A computer program that takes on the task of textual entailment attempts to categorize an ordered pair of sentences into one of three categories. The first category, called “positive entailment,” occurs when you can use the first sentence to prove that a second sentence is true. The second category, “negative entailment,” is the inverse of positive entailment. This occurs when the first sentence can be used to disprove the second sentence. Finally, if the two sentences have no correlation, they are considered to have a “neutral entailment.”

Textual entailment is useful as a component in much larger applications. For example, question-answering systems may use textual entailment to verify an answer from stored information. Textual entailment may also enhance document summarization by filtering out sentences that don’t include new information. Other natural language processing (NLP) systems find similar uses for entailment.

This article will guide you through how to build a simple and fast-to-train neural network to perform textual entailment using TensorFlow.

Before we get started

In addition to installing TensorFlow version 1.0, make sure you’ve installed each of the following:

  1. Jupyter
  2. Numpy
  3. Matplotlib

To get a better sense of progress during network training, you're also welcome to install TQDM, but it's not required.

Please access the code and Jupyter Notebook for this article on GitHub. We’ll be using Stanford’s SNLI data set for our training, but we’ll download and extract the data we need using code from the Jupyter Notebook, so you don’t need to download it manually. If this is your first time working with TensorFlow, I’d encourage you to check out Aaron Schumacher’s article, “Hello, Tensorflow.”

We’ll start by doing all necessary imports, and we’ll let our Jupyter Notebook know it should display graphs and images in the notebook itself.

    %matplotlib inline
    import tensorflow as tf
    import numpy as np
    import matplotlib.pyplot as plt
    import matplotlib.ticker as ticker
    import urllib
    import sys
    import os
    import zipfile

The files we're about to use may take five minutes or more to download, so if you're following along by running the program in the corresponding notebook, feel free to start running the next few cells. In the meantime, let’s explore textual entailment in further detail.

Examples of textual entailment

In this section, we’ll walk through a few examples of textual entailment to illustrate what we mean by positive, negative, and neutral entailment. To begin, we’ll look at positive entailment—when you read, for example, that “Maurita and Jade both were at the scene of the car crash,” you can infer that “Multiple people saw the accident.” In this example sentence pair, we can prove the second sentence (also known as a “hypothesis”) from the first sentence (also called the “text”), meaning that this represents a positive entailment. Given that Maurita and Jade were both there to view the crash, multiple people must have seen it. Note: “car crash” and “accident” have similar meanings, but they aren’t the same word. In fact, entailment doesn’t always mean that the sentences share words, as can be seen in this sentence pair, which only shares the word “the.”

Let’s consider another sentence pair. How, if at all, does the sentence “Two dogs played in the park with the old man” entail “There was only one canine in the park that day”? If there are two dogs, there must be at least two canines. Since the second s[...]
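
The three categories described in the textual entailment excerpt above map naturally onto a three-class classification target. The sketch below is an illustration rather than the article's own code: it assumes the SNLI label strings `entailment`, `contradiction`, and `neutral`, picks an arbitrary class order, and encodes each (text, hypothesis) pair with a one-hot label of the kind a network could be trained against.

```python
# SNLI label strings mapped to the article's terminology.
SNLI_TO_ARTICLE = {
    "entailment": "positive entailment",
    "contradiction": "negative entailment",
    "neutral": "neutral entailment",
}

# The class order is an arbitrary choice made for this sketch.
CLASSES = ["entailment", "contradiction", "neutral"]


def one_hot(label):
    """Encode a label string as a one-hot list over CLASSES."""
    if label not in CLASSES:
        raise ValueError(f"unknown label: {label}")
    return [1.0 if label == c else 0.0 for c in CLASSES]


examples = [
    # Labeled positive entailment in the excerpt.
    ("Maurita and Jade both were at the scene of the car crash.",
     "Multiple people saw the accident.",
     "entailment"),
    # Label inferred from the reasoning in the excerpt (two dogs means at
    # least two canines); the source text is cut off before stating it.
    ("Two dogs played in the park with the old man.",
     "There was only one canine in the park that day.",
     "contradiction"),
]

encoded = [(text, hypothesis, one_hot(label)) for text, hypothesis, label in examples]
for text, hypothesis, target in encoded:
    print(target, "|", text, "->", hypothesis)
```

The pair is ordered: swapping text and hypothesis can change the label, which is why each example keeps the two sentences in separate positions.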

Four short links: 17 July 2017


Discarded GPUs, Go REPL, Learning Point Clouds, and 3D-Printing Nanopatterns

  1. Used GPUs Flood the Market as Ethereum Price Drops Below 150 -- On second-hand sales websites like eBay and Gumtree, we have seen a lot of new GPU listings appear in recent days, with plenty of used AMD RX series GPUs appearing over the weekend. More hardware is expected to hit these sites over the coming days as some miners wind down their operations, though many will simply move to a more profitable currency or to invest their computing power into an emerging cryptocurrency that has the prospect of high values in the future. That said, one HN commenter points out that in many areas with cheap power, it's still profitable to mine.
  2. go-pry -- An interactive REPL for Go that allows you to drop into your code at any point.
  3. Representation Learning and Adversarial Generation of 3D Point Clouds -- The expressive power of our learned embedding, obtained without human supervision, enables basic shape editing applications via simple algebraic manipulations, such as semantic part editing and shape interpolation. Figure 4 is the wow shot: interpolating between different tables, lounges, and chairs. (via Gene Kogan)
  4. Programming 2D/3D Shape-shifting with Hobbyist 3D Printers -- Here we present initially flat constructs that, upon triggering by high temperatures, change their shape to a pre-programmed 3D shape, thereby enabling the combination of surface-related functionalities with complex 3D shapes. Origami-like magic lets you print precisely controlled bio-nanopatterns, printed electronic components, and sensors/actuators.

Continue reading Four short links: 17 July 2017.


Four short links: 14 July 2017


Molecular Sensing, Faking Speech, Radical Technologies, and Bullshit Detection

  1. Scio -- handheld molecular sensing for $300.
  2. AI Can Fake Speech (IEEE) -- The research team had a neural net analyze millions of frames of video to determine how elements of Obama's face moved as he talked, such as his lips and teeth and wrinkles around his mouth and chin. [...] In the new study, the neural net learned what mouth shapes were linked to various sounds. The researchers took audio clips and dubbed them over the original sound files of a video. They next took mouth shapes that matched the new audio clips and grafted and blended them onto the video. Essentially, the researchers synthesized videos where Obama lip-synched words he said up to decades beforehand.
  3. Radical Technologies: The Design of Everyday Life (Adam Greenfield) -- none of our instincts will guide us in our approach to the next normal. If we want to understand the radical technologies all around us, and see just how they interact to produce the condition we recognize as everyday life, we'll need a manual. That is the project of this book.
  4. Introductory Bullshit Detection for Non-Technical Managers -- “I’m creating a framework to...” It means: I’m not interested in solving the actual problem, so I’m going to create something else so that the person who actually will solve the problem has to also fix the problems in my stuff on top of that.

Continue reading Four short links: 14 July 2017.


Neuroevolution: A different kind of deep learning


The quest to evolve neural networks through evolutionary algorithms.

Neuroevolution is making a comeback. Prominent artificial intelligence labs and researchers are experimenting with it, a string of new successes has bolstered enthusiasm, and new opportunities for impact in deep learning are emerging. Maybe you haven’t heard of neuroevolution in the midst of all the excitement over deep learning, but it’s been lurking just below the surface, the subject of study for a small, enthusiastic research community for decades. And it’s starting to gain more attention as people recognize its potential.

Put simply, neuroevolution is a subfield within artificial intelligence (AI) and machine learning (ML) that consists of trying to trigger an evolutionary process similar to the one that produced our brains, except inside a computer. In other words, neuroevolution seeks to develop the means of evolving neural networks through evolutionary algorithms.

When I first waded into AI research in the late 1990s, the idea that brains could be evolved inside computers resonated with my sense of adventure. At that time, it was an unusual, even obscure field, but I felt a deep curiosity and affinity. The result has been 20 years of my life thinking about this subject, and a slew of algorithms developed with outstanding colleagues over the years, such as NEAT, HyperNEAT, and novelty search. In this article, I hope to convey some of the excitement of neuroevolution as well as provide insight into its issues, but without the opaque technical jargon of scientific articles. I have also taken, in part, an autobiographical perspective, reflecting my own deep involvement within the field. I hope my story provides a window for a wider audience into the quest to evolve brains within computers.

The success of deep learning

If you've been following AI or ML recently, you've probably heard about deep learning. Thanks to deep learning, computers can accomplish tasks like recognizing images and controlling autonomous vehicles (or even video game characters) at close to or sometimes surpassing human performance. These achievements have helped deep learning and AI in general to emerge from the obscurity of academic journals into the popular press and news media, inspiring the public imagination. So, what is actually behind deep learning that has enabled its success?

In fact, underneath the hood in deep learning is the latest form of a decades-old technology called artificial neural networks (ANNs). Like many ideas in AI, ANNs are roughly inspired by biology; in this case, by the structure of the brain. We choose the brain as an inspiration for AI because the brain is the unequivocal seat of intelligence; while we're pursuing AI, it makes sense that, at some level, it should resemble the brain. And one of the key building blocks of brains is the neuron, a tiny cell that sends signals to other neurons over connections. When many neurons are connected to each other in a network (as happens in brains), we call that a neural network. So, an ANN is an attempt to simulate a collection of neuron-like components that send signals to each other. That's the underlying mechanism behind the "deep networks" in deep learning. Researchers in ANNs write a program that simulates these neurons and the signals that travel between them, yielding a process vaguely reminiscent of what happens in brains.

Of course, there are also many differences. The challenge is that simply connecting a bunch of neuron-like elements to each other and letting them share signals does not yield intelligence. Intelligence, instead, arises from precisely how the neurons are connec[...]
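
To make the idea of evolving neural networks through evolutionary algorithms concrete, here is a deliberately small sketch. It is plain weight evolution on a fixed 2-2-1 network for the XOR problem, not NEAT, HyperNEAT, or novelty search, and every name and hyperparameter in it (population size, mutation scale, the XOR task itself) is an assumption chosen for illustration.

```python
import math
import random

XOR_CASES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
N_WEIGHTS = 9  # 4 hidden weights + 2 hidden biases + 2 output weights + 1 output bias


def forward(w, x):
    """Evaluate a fixed 2-2-1 network whose parameters are the flat list w."""
    h0 = math.tanh(w[0] * x[0] + w[1] * x[1] + w[4])
    h1 = math.tanh(w[2] * x[0] + w[3] * x[1] + w[5])
    z = w[6] * h0 + w[7] * h1 + w[8]
    z = max(-60.0, min(60.0, z))  # clamp to keep exp() well within float range
    return 1.0 / (1.0 + math.exp(-z))


def fitness(w):
    """Negative squared error over the four XOR cases (higher is better)."""
    return -sum((forward(w, x) - y) ** 2 for x, y in XOR_CASES)


def evolve(generations=200, pop_size=50, sigma=0.5, seed=0):
    """Keep the fittest fifth each generation and refill with mutated copies."""
    rng = random.Random(seed)
    population = [[rng.gauss(0, 1) for _ in range(N_WEIGHTS)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 5]  # survivors are kept unchanged
        children = [
            [w + rng.gauss(0, sigma) for w in rng.choice(parents)]
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness), history


best, history = evolve()
print("best fitness per generation, first and last:", history[0], history[-1])
for x, y in XOR_CASES:
    print(x, y, round(forward(best, x), 3))
```

No gradients are computed anywhere: the only feedback is the fitness score, and because the best individuals survive unchanged, the best fitness can never get worse from one generation to the next.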

Building a simple GraphQL server with Neo4j



How to implement a GraphQL API that queries Neo4j for a simple movie app.

GraphQL is a powerful new tool for building APIs that allows clients to ask for only the data they need. Originally designed at Facebook to minimize data sent over the wire and reduce round-trip API requests for rendering views in native mobile apps, GraphQL has since been open sourced to a healthy community that is building developer tools. There are also a number of large companies and startups such as GitHub, Yelp, Coursera, Shopify, and Mattermark building public and internal GraphQL APIs.

Despite what the name seems to imply, GraphQL is not a query language for graph databases; it is an API query language and runtime for building APIs. The “Graph” component of the name comes from the graph data model that GraphQL uses in the frontend. GraphQL itself is simply a specification, and there are many great tools available for building GraphQL APIs in almost every language. In this post we'll make use of graphql-tools by Apollo to build a simple GraphQL API in JavaScript that queries a Neo4j graph database for movies and movie recommendations. We will follow a recipe approach: first exploring the problem in more detail, then developing our solution, and finally discussing our approach. Good resources for learning more about GraphQL are and the Apollo Dev Blog.
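To make "clients ask for only the data they need" concrete, here is a minimal Python sketch of the idea. It is not how graphql-tools or any GraphQL runtime works internally. The function name `select_fields`, the dictionary-based selection format, and the sample movie record are all assumptions made up for illustration.

```python
def select_fields(record, selection):
    """Return a copy of `record` holding only the requested fields.

    `selection` maps a field name to None (a leaf value) or to a nested
    selection that is applied to a child record or list of records.
    This is a hypothetical helper, not part of any GraphQL library.
    """
    result = {}
    for field, sub_selection in selection.items():
        value = record[field]
        if sub_selection is None:
            result[field] = value
        elif isinstance(value, list):
            result[field] = [select_fields(item, sub_selection) for item in value]
        else:
            result[field] = select_fields(value, sub_selection)
    return result


# Hypothetical record, standing in for what a backend might hold.
movie = {
    "title": "The Matrix",
    "year": 1999,
    "plot": "A hacker learns the truth about reality.",
    "similar": [
        {"title": "Inception", "year": 2010, "plot": "A thief enters dreams."},
    ],
}

# The client names exactly the fields it wants, including nested ones.
selection = {"title": None, "similar": {"title": None}}
shaped = select_fields(movie, selection)
print(shaped)
```

The response mirrors the shape of the request, so fields the client did not ask for (here `year` and `plot`) are never sent over the wire.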

Continue reading Building a simple GraphQL server with Neo4j.


Cheryl Platz on designing the Amazon Echo Look



The O'Reilly Design Podcast: Designing in secret, designing for voice, and why improv is an essential design skill.

In this week’s Design Podcast, I sit down with Cheryl Platz, senior designer at Microsoft for the Azure Portal and Marketplaces. We talk about the challenges of working on a top-secret design project, the research behind Amazon's Echo Look, the skills you need to start designing for voice, and how studying improv can make you a better designer.

Continue reading Cheryl Platz on designing the Amazon Echo Look.


Aaron Maxwell on the power of Python



The O’Reilly Programming Podcast: Using Python decorators, generators, and functions.

In this episode of the O’Reilly Programming Podcast, I talk all things Python with Aaron Maxwell, presenter of the live online training courses Python: Beyond The Basics, and Python: The Next Level. He is also the author of the book Powerful Python: The Most Impactful Patterns, Features and Development Strategies Modern Python Provides.

Continue reading Aaron Maxwell on the power of Python.


Introduction to reinforcement learning and OpenAI Gym


A demonstration of basic reinforcement learning problems.

Those interested in the world of machine learning are aware of the capabilities of reinforcement-learning-based AI. The past few years have seen many breakthroughs using reinforcement learning (RL). The company DeepMind combined deep learning with reinforcement learning to achieve above-human results on a multitude of Atari games and, in March 2016, defeated Go champion Lee Sedol four games to one. Though RL is currently excelling in many game environments, it is a novel way to solve problems that require optimal decisions and efficiency, and will surely play a part in machine intelligence to come.

OpenAI was founded in late 2015 as a non-profit with a mission to “build safe artificial general intelligence (AGI) and ensure AGI's benefits are as widely and evenly distributed as possible.” In addition to exploring many issues regarding AGI, one major contribution that OpenAI made to the machine learning world was developing both the Gym and Universe software platforms.

Gym is a collection of environments/problems designed for testing and developing reinforcement learning algorithms—it saves the user from having to create complicated environments. Gym is written in Python, and there are multiple environments such as robot simulations or Atari games. There is also an online leaderboard for people to compare results and code.

Reinforcement learning, explained simply, is a computational approach where an agent interacts with an environment by taking actions in which it tries to maximize an accumulated reward. Here is a simple graph, which I will be referring to often:

Figure 1. Reinforcement Learning: An Introduction 2nd Edition, Richard S. Sutton and Andrew G. Barto, used with permission.

An agent in a current state (St) takes an action (At) to which the environment reacts and responds, returning a new state (St+1) and reward (Rt+1) to the agent.
Given the updated state and reward, the agent chooses the next action, and the loop repeats until an environment is solved or terminated. OpenAI’s Gym is based upon these fundamentals, so let’s install Gym and see how it relates to this loop. We’ll get started by installing Gym using Python and the Ubuntu terminal. (You can also use Mac following the instructions on Gym’s GitHub.)

    sudo apt-get install -y python3-numpy python3-dev python3-pip cmake zlib1g-dev libjpeg-dev xvfb libav-tools xorg-dev python-opengl libboost-all-dev libsdl2-dev swig
    cd ~
    git clone
    cd gym
    sudo pip3 install -e '.[all]'

Next, we can open Python3 in our terminal and import Gym.

    python3
    import gym

First, we need an environment. For our first example, we will load the very basic taxi environment.

    env = gym.make("Taxi-v2")

To initialize the environment, we must reset it.

    env.reset()

You will notice that resetting the environment will return an integer. This number will be our initial state. All possible states in this environment are represented by an integer ranging from 0 to 499. We can determine the total number of possible states using the following command:

    env.observation_space.n

If you would like to visualize the current state, type the following:

    env.render()

In this environment the yellow square represents the taxi, the (“|”) represents a wall, the blue letter represents the pick-up location, and the purple letter is the drop-off location. The taxi will turn green when it has a passenger aboard. While we see colors and shapes that represent the environment, the algorithm does not think li[...]
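The agent-environment loop described in this excerpt can be sketched without installing Gym. The following is a minimal illustration in plain Python. `LineWorld`, `run_episode`, and the reward values are hypothetical names and numbers invented here. The `reset()` / `step(action)` shape is modeled on the Gym calls shown above, with `step` assumed to return a `(state, reward, done, info)` tuple.

```python
import random


class LineWorld:
    """Hypothetical toy environment: walk along a line until the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        # Start every episode at the left end and report the initial state.
        self.state = 0
        return self.state

    def step(self, action):
        # Action 1 moves right; any other action moves left.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        # Assumed reward scheme: -1 per step, +10 on reaching the goal.
        reward = 10 if done else -1
        return self.state, reward, done, {}


def run_episode(env, policy, max_steps=100):
    """Run one episode: observe state, choose action, receive reward, repeat."""
    state = env.reset()
    total_reward = 0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done, _info = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


rng = random.Random(0)
always_right = run_episode(LineWorld(), lambda state: 1)
random_walk = run_episode(LineWorld(), lambda state: rng.choice([0, 1]))
print(always_right, random_walk)
```

A policy that always moves right reaches the goal in the fewest steps, so no other policy can collect more reward in this toy setup. A learning algorithm would replace the fixed policy with one that improves from the rewards it sees.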

Smart home products need to fit into the experiences and rituals of our everyday lives


Learn how to "domesticate" smart products and understand why it’s essential to design relationships rather than just connectivity.

The house from which I wrote this report was built around 100 years ago. It was built by the French in the 1920s to be lent by the Shanghainese municipality to an official of a political party, and then it was given as a sign of respect to a famous opera singer, who decided to consign it to his mistress. It was sold—or more accurately, passed on—in the 1990s and subdivided into smaller apartments to accommodate up to eight families. Now, as a result of Shanghai’s housing boom, it’s rented out to just three families, at 20 times the price for which it was originally lent.

It was built when electricity was a luxury. It was later wired for telephone, and then eventually TV cables were installed. I guess there should also be a satellite cable somewhere, but I cannot recognize which plugs are which anymore. When I moved in, fiber-optic was quickly set up by a cable contractor, thrown out in the courtyard to be bundled up in the mess of wires. My home is connected, and it always was in some way. It’s not really owned by anyone, and it’s a very complex mix of old, new, East, West, rich, poor, and so on.

When I look around my apartment, I see only a few things I would consider “smart”: my laptop, maybe a couple of other things I managed to coax to work together via Bluetooth, and my dog. The rooms have a complex mix of new things, old things, things I brought with me from previous homes, and things I found here in the neighborhood. There are Italian lamps, Chinese unbranded appliances, and various devices that were manufactured for the American market (but produced in China). There are a few handmade objects I acquired for the love of craft as well as a lot of cheaply mass-produced items I bought due to their low prices.
There are things I use, things I forgot I had, things that were given to me, and things I bought by mistake (that last category boomed when I recently discovered Taobao Marketplace, Alibaba’s on-steroids answer to Amazon-style ecommerce platforms).

A home is not a “house”; a home is not only a set of problems that can be solved or tasks that can be automated. A home, as said by Joseph Grima, founder of the architecture and research studio Space Caviar, “is so much more than the sum of the functions it performs,” and it’s a very complex mix of people, architecture, history, memories, technology, and life. My home—or, more precisely, my apartment—answers as much to my functional needs as it functions as a representation of my own aspirations, or, most likely, my laziness.

The more I look around, the more I realize this is not what I imagined 2017 would look like. As a designer working in and with technology daily, I guess my home is the least “smart” that it can be, and it made me wonder, “Why?” Why am I so excited to design for the near future in which smartness will leak into our daily lives, while at the same time not allowing it into my own space? Am I just living the symptoms of my own version of a recurring analog dream? Or, maybe I just don’t see the right kind of “smartness” that I want or need?

“Smart” assumptions

Smartness has been pushed as a term to represent the ongoing aspiration toward a more controlled and more “ecologically viable solution” of today’s environments and devices. Smart cities, smart homes, and smart devices are being pushed in our lives to help us deal with our own limi[...]

Four short links: 13 July 2017


Conversational Data Science, L3 Autonomy, Human Computation, and Embedded Learning

  1. Iris: A Conversational Agent for Data Science -- a cross between R Notebook and Facebook Messenger. See also this description of the project and what they hope to achieve.
  2. Audi A8: First to Reach Level 3 Autonomy -- for those of you not up with your autonomous driving levels, the A8 features the “AI traffic jam pilot,” meaning the car can take control of the driving in slow-moving traffic at up to 60 kilometers per hour. The system is activated by a button on the center console, and it can take over acceleration, braking, steering, and starting from a dead-stop, all without the driver paying attention.
  3. The Complexity of Human Computation: A Concrete Model with Application to Passwords -- The intent of this paper is to apply the ideas and methods of theoretical computer science to better understand what humans can compute in their heads. For example, can a person compute a function in their head so that an eavesdropper with a powerful computer—who sees the responses to random inputs—still cannot infer responses to new inputs?
  4. ELL -- Microsoft's Embedded Learning Library, which allows you to build and deploy machine-learned pipelines onto embedded platforms, like Raspberry Pis, Arduinos, micro:bits, and other microcontrollers. The deployed machine learning model runs on the device, disconnected from the cloud. Our APIs can be used either from C++ or Python.

Continue reading Four short links: 13 July 2017.


Transforming text data in Java



Assign text snippets to a corresponding collection of vectors.

Data Operations

Now that we know how to input data into a useful data structure, we can operate on that data by using what we know about statistics and linear algebra. There are many operations we perform on data before we subject it to a learning algorithm. Often called preprocessing, this step comprises data cleaning, regularizing or scaling the data, reducing the data to a smaller size, encoding text values to numerical values, and splitting the data into parts for model training and testing. Often our data is already in one form or another (e.g., List or double[][]), and the learning routines we will use may take either or both of those formats. Additionally, a learning algorithm may need to know whether the labels are binary or multiclass or even encoded in some other way such as text. We need to account for this and prepare the data before it goes in the learning algorithm. The steps in this chapter can be part of an automated pipeline that takes raw data from the source and prepares it for either learning or prediction algorithms.

Many learning and prediction algorithms require numerical input. One of the simplest ways to achieve this is by creating a vector space model in which we define a vector of known length and then assign a collection of text snippets (or even words) to a corresponding collection of vectors. The general process of converting text to vectors has many options and variations. Here we will assume that there exists a large body of text (corpus) that can be divided into sentences or lines (documents) that can in turn be divided into words (tokens). Note that the definitions of corpus, document, and token are user-definable.
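One way to picture the vector space model described above is a simple word-count (bag-of-words) encoding. The sketch below is in Python and is only an illustration under stated assumptions: whitespace tokenization, lowercase folding, and the helper names `tokenize`, `build_vocabulary`, and `vectorize` are choices made here, not the article's Java implementation.

```python
from collections import Counter


def tokenize(document):
    # Assumed tokenizer: lowercase, then split on whitespace.
    return document.lower().split()


def build_vocabulary(corpus):
    """Assign each distinct token a fixed position in the vector."""
    vocabulary = {}
    for document in corpus:
        for token in tokenize(document):
            if token not in vocabulary:
                vocabulary[token] = len(vocabulary)
    return vocabulary


def vectorize(document, vocabulary):
    """Map a document to a vector of token counts of known length."""
    counts = Counter(tokenize(document))
    vector = [0] * len(vocabulary)
    for token, count in counts.items():
        index = vocabulary.get(token)
        if index is not None:  # tokens outside the vocabulary are ignored
            vector[index] = count
    return vector


# Hypothetical two-document corpus.
corpus = ["the cat sat", "the dog sat down"]
vocabulary = build_vocabulary(corpus)
vectors = [vectorize(document, vocabulary) for document in corpus]
print(vocabulary)
print(vectors)
```

Every document maps to a vector of the same length (the vocabulary size), which is the form most numerical learning routines expect. Swapping in a different `tokenize` changes what counts as a token, in line with the note that these definitions are user-definable.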

Continue reading Transforming text data in Java.


What if we build the internet we always wanted?


It's time to stop cursing the network we have and build the network we want.

On July 12, users of the internet united in the Battle for the Net, a protest against the FCC's plans to abandon the principle of network neutrality. O'Reilly joined a diverse group of organizations to support the cause. And it's not over: Submit your own comment by July 17 to make sure your voice is heard.

We don't believe the internet is perfect—not by a long shot. Remember when the internet was going to usher in an age of peace and understanding because humans would be able to communicate with each other? It didn't happen. The sludge of fake news is, in many corners of the internet, drowning out real news. Minorities and women who speak up on the internet can only do so if they're willing to accept a flood of hate-speak. Earlier this year, Congress (unsurprisingly) gave our ISP monopolies the right to sell us to the highest bidder. Now, the FCC is doing away with network neutrality, which allowed creativity and freedom of expression, the best of the internet, to prosper.

Network neutrality means that ISPs like Comcast and Verizon can't privilege one form of traffic over another: text must be treated the same as voice, the same as video, the same as whatever we invent in the future. (Virtual reality? Live high-bandwidth medical imaging?) They can't privilege one form of traffic for any reason—whether it's "we offer a video service, so we'll limit our competition's bandwidth," "we would rather our customers consume the news on our feed," or simply "your new service for streaming medical data looks like a success, and we'd like to give bigger bonuses to our executives."

Over the years, I've watched networks built with the best of intentions be co-opted: by trolls, by advertisers, by spammers, by malware makers, and now by the network providers themselves. The damage has "always already begun": trolling certainly existed on the ancient Usenet, as well as the original BBSs.
And while it's tempting to say this is a new crisis, it certainly isn't: remember when every newsgroup on Usenet was flooded by porn spam? Those were not the days. At the same time, our networks have always been the breeding ground for free expression, from the silly to the wonderful. So, we're not facing a new crisis; we're facing the same old crisis, the crisis we had back in the '80s, the crisis we pretended didn't exist in the '90s, the crisis we weren't interested in addressing in the '00s, and so on. And I'm tired of it. I won't say "I want my internet back," because that's a myth of an innocent past that was never all that innocent. But it's high time to build the internet that we wanted all along: a network designed to respect privacy, a network designed to be secure, and a network designed to impose reasonable controls on behavior. And a network with few barriers to entry—in particular, the certainty of ISP extortion as new services pay to get into the "fast lane." Is it time to start over from scratch, with new protocols that were designed with security, privacy, and maybe even accountability in mind? Is it time to pull the plug on the abusive old internet, with its entrenched monopolistic carriers, its pervasive advertising, and its spam? Could we start over again? That would be painful, but not impossible. Painful, but perhaps necessary as the FCC gives the entrenched carrier monopolies more power to control, monitor, and sell what you see. Painful, but perh[...]

Perform sentiment analysis with LSTMs, using TensorFlow



Explore a highly effective deep learning approach to sentiment analysis using TensorFlow and LSTM networks.

Adit Deshpande is an expert in performing sentiment analysis using TensorFlow in conjunction with a deep learning model called an LSTM (Long Short-Term Memory) network. In this Oriole, he takes you through a complete LSTM-based sentiment analysis process. He explains why LSTMs are useful for capturing long-term dependencies in natural language text and then walks you through a code example for performing sentiment analysis (using pre-trained word vectors), where he demonstrates the application of word vectors, the creation of the network architecture, and the training and testing with TensorFlow. He concludes by examining a pre-trained sentiment analysis model and summarizing the results.

What you will learn:

  • Gain hands-on experience performing sentiment analysis using TensorFlow and LSTMs
  • Review the purpose of word vectors and Word2Vec in relation to natural language processing (NLP)
  • Understand the differences between a feedforward neural network and a recurrent neural network (RNN)
  • Learn how to use the TensorFlow API to perform sentiment analysis on a natural language dataset
  • Understand how to define recurrent neural network (RNN) and LSTM graphs using TensorFlow
  • Develop the ability to solve a range of NLP tasks (like sentiment analysis) using RNNs and LSTMs

Continue reading Perform sentiment analysis with LSTMs, using TensorFlow.


Join the battle for the internet



It's time to rally in defense of the internet again.

On July 12, O'Reilly Media joins the Battle for the Net in defense of network neutrality, one of the fundamental principles of the internet. Network neutrality prevents internet providers such as Comcast and Verizon from playing favorites: for example, from slowing down Netflix because they offer a competing video service. They're required to treat all traffic equally.

We encourage everyone to tell their representatives in Congress, and the FCC, how important this is. We expect the FCC's server to be overwhelmed by the response, as it has been in the past. A more reliable way to submit your comment is to use the Battle for the Net site, which will log your comments and ensure they are submitted. The deadline for comments is July 17, 2017.

This isn't the first time the internet has faced a challenge. We took on the repressive "Stop Online Piracy Act" (SOPA) in 2012 and won. We defended network neutrality in 2014 with the Internet Slowdown Day and won. It's time to rally in defense of the internet again.

Continue reading Join the battle for the internet.


Implementing the quicksort algorithm


It’s pretty easy to grasp the concept, but it’s a tricky algorithm to implement.

100 Days of Algorithms is a series of Medium posts and Jupyter Notebooks by Tomáš Bouda that implement 100 interesting algorithms. They're a programming exercise that Bouda set for himself: can he implement 100 interesting algorithms, one per day? The answer was “yes.” The algorithms range from classics like Towers of Hanoi to Bloom filters and graph traversal. Over the coming weeks, we’ll be featuring selections from Bouda's 100 Days of Algorithms project here on O’Reilly.

Day 57, Quicksort

Quicksort is the sorting algorithm used in almost every programming library. It's very fast—probably the fastest general purpose sort out there. As Bouda points out in his post on Medium, it's more than a bit scary: getting it right is tricky, and there are a lot of unexpected worst cases. You'll probably never need to implement this, but it's worth seeing how it's done. Here's Bouda's Medium post, and you can access and clone the Jupyter Notebook here.
    import numpy as np

algorithm

    def swap(data, i, j):
        data[i], data[j] = data[j], data[i]

    def qsort3(data, left, right):
        # sorted
        if left >= right:
            return

        # select pivot
        i = np.random.randint(left, right + 1)
        swap(data, left, i)
        pivot = data[left]

        # i ~ points behind left partition
        # j ~ points ahead of right partition
        # k ~ current element
        i, j, k = left, right, left + 1

        # split to [left] + [pivot] + [right]
        while k <= j:
            if data[k] < pivot:
                swap(data, i, k)
                i += 1
            elif data[k] > pivot:
                swap(data, j, k)
                j -= 1
                k -= 1
            k += 1

        # recursion
        qsort3(data, left, i - 1)
        qsort3(data, j + 1, right)

    def qsort(data):
        qsort3(data, 0, len(data) - 1)

run

    data = np.random.randint(0, 10, 100)
    print(data)

    [6 1 7 9 9 9 9 8 7 6 9 9 6 3 5 4 1 8 1 7 0 1 9 3 1 0 3 2 4 3 1 7 6 0 2 7 0 7 9 1 0 4 9 2 3 4 5 9 5 8 9 1 8 2 0 5 4 9 5 3 1 0 1 1 2 3 8 1 4 2 2 4 7 9 3 0 0 4 9 3 0 7 0 8 5 8 3 5 9 6 7 6 5 9 3 4 0 1 0 7]

    qsort(data)
    print(data)

    [0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9]

Technical notes

The implementations work; the Jupyter Notebooks all run. Since this started off as a personal exercise, don't expect the implementations to be optimal, bullet-proof, or even necessarily correct (though we don't see anything wrong with them). And don't expect them to contain your favorite algorithms (or the ones you need for your homework assignments).

The easiest way to install Jupyter Notebooks is to use Anaconda. The second easiest (and most bulletproof) way is to install Docker and then use the scipy-notebook container.

If you're rolling your own Jupyter environment, you need:

  • Python 3.5 (a few of the “days” require 3.6; most will work with 3.4)
  • Jupyter
  • Matplotlib
  • NumPy
  • SciPy
  • Bokeh
  • NetworkX

Continue reading Implementing the quicksort algorithm.[...]

Building data-informed products



Stewart Rogers on building and managing products with embedded analytics.

Managers have embraced the idea that data can drive value in any industry, but that assumes the right people within their organizations have access to data. Dashboards, report builders, and embedded visualization interfaces have become standard tools for bringing data and analytics to a wide range of data users.

My guest in this podcast episode is Stewart Rogers, director of product management at Lambda Solutions, which offers an open source e-learning management system; its customers are organizations that deliver online learning and need to understand how their students are progressing. Rogers is thus doubly dependent on embedded analytics in his products: he uses analytics to manage his products, measuring customer response to new features, but he also needs to pass meaningful analytics through to his customers in order to let them assess their learners and their content.

“We’re about two years into our analytics journey,” says Rogers. “The old model was very much just pure report-based: ‘Here’s an opportunity to export some of your data.’ We’ve built it out using Jaspersoft to give [customers] richer, different visual insights that they weren’t able to get before—charts, dashboards, and embedded things.”

The richer reports have been a key selling point for Lambda, but data-savvy customers who have never used a learning management system before “almost expect it to work the way we present it now,” says Rogers. For them, embedded reporting and visualization is essential, he says: “they may have very specific requirements: ‘I need this field labeled this way and sorted this way,’” with highly specific distribution and access requirements as well.

Asked what his most sophisticated users are doing with his platform, and what that might suggest for the future, Rogers points to prediction. “A lot of our customers start with administration-type reports: who’s done what, when, that kind of thing,” he says. “Now they’re going to look more deeply at trends and potential outcomes and try to use say ‘if A, B, and C, then this person is likely to struggle.’”

This post and podcast is a collaboration between O’Reilly and TIBCO Jaspersoft. See our statement of editorial independence.

Continue reading Building data-informed products.


Prototyping with Figma



How to build apps, prototypes, and websites that you can collaborate on with others.

Continue reading Prototyping with Figma.


Serverless on Google with Cloud Functions and React



Combining serverless architecture with a React frontend means you can spin up applications with minimal administrative overhead.

Cloud Functions is Google's entry into the serverless computing craze. The promise of serverless architecture is that it dramatically reduces the cost and complexity of developing, scaling, and operating a range of applications. While there are still a host of issues to resolve, the potential benefits are enormous, especially for microservices, IoT, and mobile.

This article and its companion GitHub repo show how to use Cloud Functions as the backend for a React app that can be deployed as a static website. While frameworks like Serverless provide more features, such as built-in deployment and a variety of useful plugins, the approach presented here is a simple way to get a basic frontend and backend up and running quickly.

Continue reading Serverless on Google with Cloud Functions and React.


Four short links: 12 July 2017


Net Neutrality, Scaling with Erlang, Matrix Cookbook, API Security

  1. Internet-Wide Day of Action to Save Net Neutrality -- light up the internet to fight the slashing of net neutrality laws; O'Reilly is part of it.
  2. How Discord Scaled Elixir -- the endless search for hot paths and bottlenecks that is scale.
  3. The Matrix Cookbook -- a collection of facts (identities, approximations, inequalities, relations, ...) about matrices and matters relating to them. It is collected in this form for the convenience of anyone who wants a quick desktop reference.
  4. API Security Checklist for Developers -- Checklist of the most important security countermeasures when designing, testing, and releasing your API.

Continue reading Four short links: 12 July 2017.


Four short links: 11 July 2017


Brain Residency, Future of Work, Customer Feedback, and Verifying Distributed Systems

  1. The Google Brain Residency -- summary of outputs, with pointers to papers and GitHub repos, from the Google program that helps individuals from diverse educational backgrounds and experiences to dive into research in machine learning and deep learning.
  2. Findings of The Shift Commission on Work, Workers, and Technology -- their exploration of four scenarios. Rock-Paper-Scissors Economy: Less work, mostly tasks. Jump Rope Economy: More work, mostly tasks. King of the Castle Economy: Less work, mostly jobs. Go Economy: More work, mostly jobs. I like the two axis scenario framework: less/more work; jobs or tasks.
  3. Spotlight Feedback Framework -- depending on the questions your customers ask, you can categorize your problem as user experience ("how do I X?", "what happens when X?", "I tried to X"); product marketing ("can you/I X?", "how do you compare to X?", "how are you different than X?", "why should I use you for X?"); and positioning ("I'm probably not your target customer...", "I'm sure I'm wrong, but I thought..."). That distinction between "can I..." and "how do I..." is subtle; the former means you've not shared that your product can do the thing, while the latter means you've not made the method discoverable without explanation. (via Product Habits)
  4. Verdi -- Verdi is a framework from the University of Washington to implement and formally verify distributed systems. Verdi supports several different fault models, ranging from idealistic to realistic. Interesting on two fronts: proving system correctness may lead to useful tools and software development systems, and work on this to date has largely been around single-threaded programs. Anyone who has debugged multi-threaded code knows how much more difficult it is to reason about such systems.

Continue reading Four short links: 11 July 2017.


The SMACK stack


A new architecture for today’s data-rich modern applications.

Years ago, the forward thinkers in software engineering predicted a time when building applications would become less complex. It’s safe to say that future is still a distant mirage—if anything, complexity has only increased. Where we used to build applications for more efficient business processes, organizations now rely on software just to stay in business. Terms like “digital transformation” and “customer experience” drive the need for more data-driven applications simply to stay competitive and relevant. The explosion of tools and infrastructure products hasn’t made it easier for builders trying to make it work. However, a cluster of products has risen to be the dominant back-end stack when building data-driven applications. Thankfully, this stack has an easy acronym to remember: SMACK.

The SMACK name represents the individual parts of the collection: Spark, Mesos, Akka, Cassandra, and Kafka. Each has a separate job distinct from the others, but in combination they give you a well-rounded back-end infrastructure that holds up to today’s most demanding workloads. Each component is built on distributed system methodologies and each scales horizontally when needed.

To tackle large-scale data processing, you need to break down the responsibilities into three main areas: collect, process, and store. Collecting data at high speed is enough of a challenge, but you need order in the potentially chaotic data stream. Apache Kafka was purpose-built to decouple data pipelines and organize streams of data. Kafka guarantees that data from producers is seen at least once, in the order it was received, by a back-end consumer process. Using topic-based queues, you can further organize your data as it is collected.

It wasn’t long ago when just batch processing your large volumes of data was enough analytics. Today’s competitive landscape requires that you use your data immediately, or fall behind.
It requires a combination of processing techniques, such as discrete data points using Akka, stream micro-batch with Spark Streaming, and large-scale batch with Apache Spark. This makes up the processing layer we use at DataStax, with near real-time data processing in order to create immediate context and batch jobs to combine and enrich historical data.

To keep up with the volumes and velocity of data, you need a database designed to scale when you need it: that is where we rely on Apache Cassandra for storage. Conceived as a cloud-first database, it was designed with the workloads of data-rich applications in mind. Just like the other parts of the SMACK stack, Cassandra scales by simply adding more nodes, which means it’s ready when you need it. More importantly, the always-on, multi-datacenter (and multi-cloud) architecture means that your most important asset—data—is protected. The tight integration with Apache Spark and Akka combines processing and storage for new and old data.

Finally, with all that horizontally scaling infrastructure, you need to have it under control. That’s where Apache Mesos steps in to save the day. With the rapidly changing workloads, resource contention can be [...]

Four short links: 10 July 2017


Multics Emulated, Physics Explained, DRM Vilified, and Lisp in Convenient Notebook Form

  1. Running Multics on the DPS8M Emulator -- how-to guide that'll let you experience 1965 tech. I love that there's a new release of Multics (forerunner of Unix) decades after the last hardware capable of running it died. (via Slashdot)
  2. The Mechanical Universe (YouTube) -- a critically acclaimed series of 52 30-minute videos covering the basic topics of an introductory university physics course. So, like a MOOC or Khan Academy, but from 1985. (via Caltech)
  3. DRM is Toxic to Culture -- can't be said enough. Technology-enforced restrictions quantize and prejudge discretion.
  4. Darkmatter -- The notebook-style Common Lisp environment.

Continue reading Four short links: 10 July 2017.


Four short links: 7 July 2017


Nanotube Computing, Civil Service Code, Ghastly Management, and Eminence-Induced Warpage

  1. Carbon Nanotube Computing -- The new 3D architecture is based on novel devices including two million carbon nanotube transistors and over one million resistive RAM cells, all built on top of a layer of silicon using existing fabrication methods and connected by densely packed metal wiring between the layers. As a demonstration, the team built an electronic nose that can sense and identify several common vapors, including lemon juice, rubbing alcohol, vodka, wine, and beer.
  2. Why I'm Leaving 18F -- 18F (and the larger service we created for it and its sibling organizations, the Technology Transformation Service), is being reorganized via administrative order into the General Services Administration’s (GSA) Federal Acquisition Service. [...] We were subsequently told that the new Commissioner of the Federal Acquisition Service would suddenly and immediately become a political position, with a person appointed directly by the White House. In a single day, the White House took direct control of two of the most important shared service organizations in government.
  3. Antisocial Coding: My Year at GitHub -- For my first few pull requests, I was getting feedback from literally dozens of engineers (all of whom were male) on other teams, nitpicking the code I had written. One PR actually had over 200 comments from 24 different individuals. It got to the point where the VP of engineering had to intervene to get people to back off. From there to an HR-orchestrated firing, in a year (via a pile of undercutting nonsense). Grim, and a textbook lesson in how not to treat your employees and coworkers. Our industry needs an enema.
  4. Our Obsession With Eminence Warps Research (Nature) -- cf "10x engineers."

Continue reading Four short links: 7 July 2017.


Getting started with classes in Kotlin



Learn about public and private properties and how to work with mutable data and nullable types in Kotlin classes.

Continue reading Getting started with classes in Kotlin.


A framework for building and evaluating data products



The O’Reilly Data Show Podcast: Pinterest data scientist Grace Huang on lessons learned in the course of machine learning product launches.

In this episode of the Data Show, I spoke with Grace Huang, data science lead at Pinterest. With its combination of a large social graph, enthusiastic users, and multimedia data, I’ve long regarded Pinterest as a fascinating lab for data science. Huang described the challenge of building a sustainable content ecosystem and shared lessons from the front lines of machine learning product launches. We also discussed recommenders, the emergence of deep learning as a technique used within Pinterest, and the role of data science within the company.

Continue reading A framework for building and evaluating data products.


Four short links: 6 July 2017


Mobile Testing, Semantic Segmentation, Salary Gossip, and Word Analogies

  1. OWASP Mobile Testing Guide -- a work in progress, but useful.
  2. Semantic Segmentation Using Deep Learning -- review of the techniques that help you categorize pixels in an image by which object in the image they belong to.
  3. Salary Gossip -- Top 6% or so of engineers at Amazon, Oracle, Google, Facebook, Twitter are paid more than $1.3 million per year. Next 11% make $650,000 on average. [...] $600,000 to $2 million packages are similarly becoming common in the U.S. Software engineers with 10 years experience should be making ~$420,000 per year with ~$210,000 salary. [...] $240,000 to $470,000 packages are now common in China. [...] Fresh MS graduates in the U.S. are getting $220,000+ packages at Google, Amazon, eBay, Twitter, LinkedIn, Airbnb, Facebook, Snapchat, will add more. No idea of the source, so take it all with a pillar of salt, but ... holy crap. As a Kiwi, I approve of "Do not go to Singapore, Germany, or U.K. Go to Canada, Australia, France, New Zealand, or South Africa instead."
  4. Word Vectors and SAT Analogies -- clever approach using the SAT analogy questions to test how well the word vector technique ("king - man + woman = queen") holds up to relatively real-life situations. He got 49% accuracy but notes: These methods are not the best-performing non-human technique for these SAT analogy questions. Littman and Turney report several. Latent Relational Analysis comes in at 56% accuracy, against the average U.S. college applicant at 57%.
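
The word vector arithmetic mentioned in the last item ("king - man + woman = queen") can be shown with a toy sketch. The three-dimensional vectors below are made up for illustration (real word vectors are learned from text and have hundreds of dimensions), and `cosine` and `analogy` are hypothetical helper names, not part of any library.

```python
import math

# Hypothetical hand-made vectors; axes loosely mean (royalty, maleness, femaleness).
vectors = {
    "king": [0.9, 0.9, 0.1],
    "queen": [0.9, 0.1, 0.9],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [word for word in vectors if word not in (a, b, c)]
    return max(candidates, key=lambda word: cosine(target, vectors[word]))


result = analogy("king", "man", "woman", vectors)
print(result)  # queen
```

An SAT-style test like the one described would presumably score each multiple-choice answer pair by how well its difference vector matches the question pair's; the sketch above only shows the underlying nearest-neighbor step.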

Continue reading Four short links: 6 July 2017.


How can I have different headers and footers on pages in my Microsoft Word 2016 document?



Watch a demonstration that shows you how Microsoft Word 2016 lets you vary the footers and headers on different pages in your Word documents.

Continue reading How can I have different headers and footers on pages in my Microsoft Word 2016 document?.


How can I use the formula command to calculate a total in a Microsoft Word 2016 table?



Learn about the math operations a Microsoft Word 2016 table is capable of without having to use Excel.

Continue reading How can I use the formula command to calculate a total in a Microsoft Word 2016 table?.


How do I use Microsoft Word 2016 to create address labels from an Excel file?



Watch how Microsoft Word 2016 pulls contact information from Excel and uses the data to print a set of mailing labels.

Continue reading How do I use Microsoft Word 2016 to create address labels from an Excel file?.


Alex Pinto on the intersection of threat hunting and automation



The O’Reilly Security Podcast: Threat hunting’s role in improving security posture, measuring threat hunting success, and the potential for automating threat hunting for the sake of efficiency and consistency.

In this episode, I talk with Alex Pinto, chief data scientist at Niddel. We discuss the role of threat hunting in security, the necessity for well-defined process and documentation in threat hunting and other activities, and the potential for automating threat hunting using supervised machine learning.

Continue reading Alex Pinto on the intersection of threat hunting and automation.


From binoculars to big data: Citizen scientists use emerging technology in the wild


Colin Kingen, software engineer for Wildbook, explains the technology driving data capture and wildlife research.

For years, citizen scientists have trekked through local fields, rivers, and forests to observe, measure, and report on species and habitats with notebooks, binoculars, butterfly nets, and cameras in hand. It’s a slow process, and the gathered data isn’t easily shared. It’s a system that has worked to some degree, but one that’s in need of a technology and methodology overhaul.

Thanks to the team behind the Wildbook software, both citizen and professional scientists are becoming active participants in using AI, computer vision, and big data. Wildbook is working to transform the data collection process, and citizen scientists who use the software have more transparency into conservation research and the impact it’s making. As a result, engagement levels have increased; scientists can more easily share their work; and, most important, endangered species like the whale shark benefit.

In this interview, Colin Kingen, a software engineer for Wildbook (with assistance from his colleagues Jason Holmberg and Jon Van Oast), discusses Wildbook’s work, explains classic problems in field observation science, and shares how Wildbook is working to solve some of the big problems that have plagued wildlife research. He also addresses something I’ve wondered about: why isn’t there an “uberdatabase” to share the work of scientists across all global efforts?

The work Kingen and his team are doing exemplifies what can be accomplished when computer scientists with big hearts apply their talents to saving wildlife.

Imagine looking through the same 5,000 images every time you get a new one, and looking closely enough to identify a matching pattern of spots in seven of them so you can tag an image as a certain animal.

One of the exciting aspects of your work is your mission, which focuses on putting technology into the hands of citizen scientists to collect data on wildlife. What are some of the challenges and opportunities that inspired the creation of Wildbook?

Wildlife biology is a field observation science that relies heavily on a technique called “mark-recapture,” in which animals in a population are individually marked (e.g., ear tags on deer or leg bands on birds) and their presences and absences are recorded manually by observers. On-site research teams are generally poorly funded and must focus limited resources on narrow windows of observation; the small resulting data sets run the risk of reflecting project limitations rather than species behavior. Arriving at a critical mass of data for population analysis (especially for rare or endangered species) can take years for small teams of researchers. Long required observation periods and manual data processing (e.g., matching photos “by eye”) can create multi-year lags between study initialization and scientific results[...]

Four short links: 5 July 2017


Network Mathematics, Learning Math, Open Source Q&A, and Australian Privacy

  1. Network Mathematics and Rival Factions -- understanding how "the enemy of my enemy" plays out. (via Steven Strogatz)
  2. A Path Less Taken to the Peak of the Math World -- the ultimate in "fake it until you make it." Huh tried to use these lunches to ask Hironaka questions about himself, but the conversation kept coming back to math. When it did, Huh tried not to give away how little he knew. “Somehow I was very good at pretending to understand what he was saying,” Huh said. Indeed, Hironaka doesn’t remember ever being aware of his would-be pupil’s lack of formal training. “It’s not anything I have a strong memory of. He was quite impressive to me,” he said.
  3. Gitter Open Sourced -- source to GitLab's Q&A site about open source software. Useful, I guess, if you want to run your own Q&A site.
  4. Medicare Details of Every Australian Currently Up for Sale on the Dark Web -- this is fine.

Continue reading Four short links: 5 July 2017.


Four short links: 4 July 2017


Retro Fonts, AI Research, Mind Reading, and Prototyping Tool

  1. Old-School PC Fonts -- I code better when my font is retro. My setup: Print Char 21 font, green on black. (via Hervé Piton)
  2. Measuring the Progress of AI Research (EFF) -- This pilot project collects problems and metrics/data sets from the AI research literature, and tracks progress on them. Some astonishing exponential growth with even more astonishing x-axes of time measured in months. (via Chris Dixon)
  3. Mind Reading Comes a Step Closer -- The model was able to predict the features of the left-out sentence with 87% accuracy, despite never being exposed to its activation before. It was also able to work in the other direction: to predict the activation pattern of a previously unseen sentence, knowing only its semantic features.
  4. Pencil -- tool for making diagrams and GUI prototyping that everyone can use. Free and open source.

Continue reading Four short links: 4 July 2017.


Four short links: 3 July 2017


Robot Academy, Wikipedia Adventure, PR Science, and Complexity and Planning

  1. Robot Academy -- University level, short video lessons and full online courses to help you understand and prepare for this technology of the future. From the Queensland University of Technology.
  2. Wikipedia: The Text Adventure -- this is so clever!
  3. An Adversarial Review of Adversarial Generation of Natural Language -- while I agree that short publication cycles on arXiv can be better than the lengthy peer-review process we now have, there is also a rising trend of people using arXiv for flag-planting, and to circumvent the peer-review process. This is especially true for work coming from “strong” groups. Currently, there is practically no downside of posting your (often very preliminary, often incomplete) work to arXiv, only potential benefits.
  4. The Critical Difference Between Complex and Complicated -- In a complex environment, it is truly rare that a grand plan or strategy will work as intended. (via Stuart Candy)

Continue reading Four short links: 3 July 2017.


The mission of spreading the knowledge of innovators continues



Additional context on why we’re no longer selling books and videos on

Yesterday, we announced that O'Reilly is no longer selling books and videos on We heard from some of you that you're unhappy about that decision, especially because no other sellers offer DRM-free ebooks in multiple digital formats. You're right about that, but there's more to the story. And since we’ve always been transparent with our customers, here's some additional context about why we made those recent changes.

O’Reilly has always been a privately held, self-funded company, and it’s a distinction we wear with pride. We don’t have any investors but our customers, who fund us by buying our products and services. That keeps us attuned to what the market is really telling us. But it also means we have to make decisions as we grow and change while living within our means—decisions about investments, about markets, and about our customers and employees.

Continue reading The mission of spreading the knowledge of innovators continues.


Four short links: 30 June 2017


Jumping Robot, Exactly Once, Pocket Negotiator, and Science-Based Games

  1. Salto, the Jumping Robot (IEEE Spectrum) -- two little thrusters are able to control Salto-1P’s yaw and roll: When they’re thrusting in different directions, the robot yaws, and when they both thrust in the same direction, the robot rolls. Combined with the tail, that means Salto-1P (which only weighs 98 grams) can stabilize and control itself in three dimensions, even in mid-air, which is what allows it to chain together so many jumps.
  2. Delivering Billions of Messages Exactly Once -- using Kafka and RocksDB to build a "two-phase architecture" to give the commit and rollback needed.
  3. Pocket Negotiator -- TU Delft's software will collaborate with you in your upcoming negotiation to make it a pleasant experience ending in a good deal. The Pocket Negotiator can be used for preparation (training session), to support during actual negotiations and for mediating support.
  4. Science-Based Games List -- a collaborative notepad with educational/science games, i.e., games that are: capturing parts of real scientific phenomena (including social science, medicine, etc.), and actually playable (you can play them for fun, not ones "for classroom only").

Continue reading Four short links: 30 June 2017.


We're reinventing, too



We were investing in the future when we launched Safari back in 2001. Today, that future is here.

There’s been a lot written about the need for businesses to re-invent themselves. We’ve done our share of that writing. And it’s true: markets and competition are changing faster than ever, and a business that isn’t constantly trying new approaches and engaging with new ideas is bound to fail. That’s clear even among the largest, most stable corporations: a few years ago, a study predicted that 40% of today’s Fortune 500 companies won’t exist in a decade.

This week, O’Reilly Media stopped retailing books directly on our ecommerce store. You might say “what!?” Or you might say “what’s the big deal?” Before I explain our business strategy here, there are two important things to note:

Continue reading We're reinventing, too.


Evolve AI



Naveen Rao explains how Intel Nervana is evolving the AI stack from silicon to the cloud.

Continue reading Evolve AI.


Magenta: Machine learning and creativity



Doug Eck discusses Magenta, a Google Brain project aimed at developing new machine learning models for art and sound creation.

Continue reading Magenta: Machine learning and creativity.


Cars that coordinate with people



Anca Dragan introduces a mathematical formulation that accounts for cars responding to people and people responding to cars.

Continue reading Cars that coordinate with people.


Artificial intelligence in the software engineering workflow



The workflow of the AI researcher has been quite different from the workflow of the software developer. Peter Norvig explores how the two can come together.

Continue reading Artificial intelligence in the software engineering workflow.


Machine learning on Google Cloud Platform



Amy Unruh demonstrates Google Cloud machine learning APIs and highlights OSS TensorFlow models.

Continue reading Machine learning on Google Cloud Platform.


Superhuman AI for strategic reasoning: Beating top pros in heads-up no-limit Texas Hold’em



Tuomas Sandholm explains how domain-independent algorithms are being applied to a variety of imperfect-information games, like poker.

Continue reading Superhuman AI for strategic reasoning: Beating top pros in heads-up no-limit Texas Hold’em.


Building a next-generation platform for deep learning



The O’Reilly Data Show Podcast: Naveen Rao on emerging hardware and software infrastructure for AI.

In this episode of the Data Show, I speak with Naveen Rao, VP and GM of the Artificial Intelligence Products Group at Intel. In an earlier episode, we learned that scaling current deep learning models requires innovations in both software and hardware. Through his startup Nervana (since acquired by Intel), Rao has been at the forefront of building a next generation platform for deep learning and AI.

I wanted to get his thoughts on what the future infrastructure for machine learning would look like. At least for now, we’re seeing a variety of approaches, and many companies are using heterogeneous processors (even specialized ones) and proprietary interconnects for deep learning. Nvidia and Intel Nervana are set to release processors that excel at both training and inference, but as Rao pointed out, at large-scale there are many considerations—including utilization, power consumption, and convenience—that come into play.

Continue reading Building a next-generation platform for deep learning.


Assumptions have a powerful effect on a product’s outcome


How the Hypothesis Progression Framework and Customer-Driven Cadence can help mitigate assumptions and guide you through customer and product development.

In the summer of 2000, General Motors, an American car manufacturer, introduced the Pontiac Aztek, a radically new “crossover” vehicle: part sedan, part minivan, and part sports utility vehicle (see Figure 1). It was marketed as the do-it-all vehicle for 30-somethings. It was the car for people who enjoyed the outdoors, people with an “active lifestyle” and “none to one child.”

On paper, the Aztek appeared to be fully featured. It had a myriad of upgrades that included options for bike racks, a tent with an inflatable mattress, and an onboard air compressor. GM even included an option for an insulated cooler, to store beverages and cold items, between the passenger and driver seat. Their ideal customer was someone who would use the Aztek for everything from picking up groceries to camping out in the wilderness.

Figure 1. 2001–2005 Pontiac Aztek.

The Aztek had a polarizing visual aesthetic: many either loved or hated it (most hated it). Critics found its features, like the optional tent and cooler, awkward and downright gimmicky. GM insisted these were revolutionary ideas and suggested they were ahead of their time. They believed that, once customers took the Aztek for a test drive, they would quickly realize just what they were missing.

After a $30 million marketing push, it appeared the critics were right. The Aztek failed to make even a modest dent in the overall market. The year the Aztek was released, the American auto industry sold 17.4 million vehicles. The Aztek represented only 11,000 of those vehicles (a number that some believed was still generously padded).

To customers, the Aztek seemed to get in its own way. It was pushing an agenda by trying to convince customers how they should use their vehicles, rather than responding to how they wanted to use them.

It’s easy to point at this example in hindsight and ask, “How could GM spend so much time, money, and resources only to produce a car no one wanted?” Some suggested it was because the car was “designed by committee” or that it was a good idea with poor execution. Insiders blamed the “penny pinchers” for insisting on cost-saving measures that ultimately produced a hampered product that wasn’t at all consistent with the original vision. The lead designer of the Aztek, Tom Peters, went on to create many successful designs, like the C6 Chevy Corvette and 2014 Camaro Z/28, and eventual[...]

Sam Newman on moving from monolith systems to microservices



The O’Reilly Programming Podcast: Principles for the successful adoption of microservices.

In this episode of the O’Reilly Programming Podcast, I talk about microservices with Sam Newman, presenter of the O’Reilly video course The Principles of Microservices and the online training course From Monolith to Microservices. He is also the author of the book Building Microservices: Designing Fine-Grained Systems.

Continue reading Sam Newman on moving from monolith systems to microservices.


Four short links: 29 June 2017


Bug Chasing, Brain Videos, Collaborative Annotation, and TLD Scam

  1. A Bug Detective Story -- spoiler: the CPU did it.
  2. How to Capture Videos of Brains in Real Time -- The scientists first engineered the animals’ neurons to fluoresce (glow), using a method called optogenetics. The stronger the neural signal, the brighter the cells shine. To capture this activity, they used a technique known as “light-field microscopy,” in which an array of lenses generates views from a variety of perspectives. These images are then combined to create a three-dimensional rendering, using a new algorithm called “seeded iterative demixing” (SID) developed by the team.
  3. Lacuna -- open source collaborative annotation for tertiary classes, built at Stanford.
  4. The .feedback Scam -- wow. An entire top-level domain dedicated to the old "someone is talking about you, sign up to find out what they're saying" scam.

Continue reading Four short links: 29 June 2017.


How do I add a custom image to the Alexa skill displayed on the companion app?



Learn to make images that display on a user's smartphone or tablet as your Amazon Alexa skills activate.

Continue reading How do I add a custom image to the Alexa skill displayed on the companion app?.


How do I add sounds and music to my custom Amazon Alexa skill?



Learn how to add the sounds and music clips that can make your Amazon skill stand out in the already crowded Amazon skills marketplace.

Continue reading How do I add sounds and music to my custom Amazon Alexa skill?.