Subscribe: O'Reilly Radar - Insight, analysis, and research about emerging technologies
http://radar.oreilly.com/feed
Language: English

All - O'Reilly Media



All of our Ideas and Learning material from all of our topics.



Updated: 2017-02-21T10:29:13Z

 



How do I use the singleton pattern in C#?

2017-02-21T10:00:00Z


Learn how to create thread-safe instances with the singleton pattern in C#.
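
The linked walkthrough is in C#. Purely as a flavor of the same pattern, here is a minimal thread-safe singleton sketched in Python (the Config class and its fields are illustrative, not from the article):

    import threading

    class Config:
        """One shared instance, created lazily and safely under concurrency."""
        _instance = None
        _lock = threading.Lock()

        def __new__(cls):
            # Double-checked locking: take the lock only while no instance exists.
            if cls._instance is None:
                with cls._lock:
                    if cls._instance is None:
                        cls._instance = super().__new__(cls)
            return cls._instance

    assert Config() is Config()  # every caller gets the same object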

Continue reading How do I use the singleton pattern in C#?.


How do I use the repository pattern in C#?

2017-02-21T09:00:00Z


Learn how to correctly implement the repository pattern in C#.
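
The article's examples are in C#; the shape of the pattern carries over directly. A minimal sketch in Python with an invented in-memory backend (all names here are hypothetical):

    class User:
        def __init__(self, user_id, name):
            self.user_id, self.name = user_id, name

    class UserRepository:
        """Callers ask for domain objects; storage details stay hidden here.
        Swapping this dict for a real database changes no calling code."""
        def __init__(self):
            self._store = {}

        def add(self, user):
            self._store[user.user_id] = user

        def get(self, user_id):
            return self._store.get(user_id)

    repo = UserRepository()
    repo.add(User(1, "Ada"))
    assert repo.get(1).name == "Ada"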

Continue reading How do I use the repository pattern in C#?.


Four short links: 20 Feb 2017

2017-02-20T12:25:00Z

Car Security, Civ Math, Free Mindstorms, and Chinese AI Research

  1. Used Cars Still Controllable From Previous Owners' Phone -- “The car is really smart, but it’s not smart enough to know who its owner is, so it’s not smart enough to know it’s been resold,” Henderson told CNNTech. “There’s nothing on the dashboard that tells you ‘the following people have access to the car.'”
  2. Mathematics of Civilization V -- a beautiful obsession. (Theoretically beautiful. The page is full of LaTeX-rendered math and graphs, and is less than beautiful)
  3. Seymour Papert's Mindstorms, Free -- classic, available online as a PDF.
  4. China's AI Research (The Atlantic) -- Yet as the research matures in China, Ng says, it is also becoming its own distinct community. After a recent international meeting in Barcelona, he recalls seeing Chinese language write-ups of the talks circulate right away. He never found any in English. The language issue creates a kind of asymmetry: Chinese researchers usually speak English so they have the benefit of access to all the work disseminated in English. The English-speaking community, on the other hand, is much less likely to have access to work within the Chinese AI community.

Continue reading Four short links: 20 Feb 2017.


How to drive shareholder value with artificial intelligence

2017-02-20T12:00:00Z


Your company is probably already doing AI and machine learning, but it needs a road map.

Continue reading How to drive shareholder value with artificial intelligence.


Four short links: 17 February 2017

2017-02-17T12:05:00Z

Robot Governance, Emotional Labour, Predicting Personality, and Music History

  1. Who Should Own the Robots? (Tyler Cowen) -- what is government in a world where everything is done by the robots? [...] Say there are 50 people in the government, and they allocate the federal budget subject to electoral constraints. Even a very small percentage of skim makes them fantastically wealthy, and gives them all sorts of screwy incentives to hold on to power. If they can, they will manipulate robot software toward that end. Designing governance for The Robot Future is definitely a Two Beer Problem.
  2. Emotional Labour for Gmail -- Automate emotional labor in Gmail messages.
  3. Beyond the Words: Predicting User Personality from Heterogeneous Information -- we propose a Heterogeneous Information Ensemble framework, called HIE, to predict users’ personality traits by integrating heterogeneous information, including self-language usage, avatar, emoticon, and responsive patterns. In our framework, to improve the performance of personality prediction, we have designed different strategies extracting semantic representations to fully leverage heterogeneous information on social media. (via Adrian Colyer)
  4. Theft: A History of Music -- a graphic novel laying out a 2,000-year-long history of music, from Plato to rap. The comic is by James Boyle, Jennifer Jenkins, and the late Keith Aoki. You can buy print, or download for free.

Continue reading Four short links: 17 February 2017.


Amir Shevat on workplace communication

2017-02-16T12:20:00Z


The O’Reilly Bots Podcast: Slack’s head of developer relations talks about what bots can bring to Slack channels.

In this episode of the O’Reilly Bots Podcast, Pete Skomoroch and I speak with Amir Shevat, head of developer relations at Slack and the author of the forthcoming O’Reilly book Designing Bots: Creating Conversational Experiences.

Continue reading Amir Shevat on workplace communication.


Simon Endres on designing in an arms race of high-tech materials

2017-02-16T12:00:00Z


The O’Reilly Design Podcast: The guiding light of strategy, designing Allbirds, and what makes the magic of a brand identity.

In this week’s Design Podcast, I sit down with Simon Endres, creative director and partner at Red Antler. We talk about working from a single idea, how Red Antler is helping transform product categories, and the importance of having a point of view.

Continue reading Simon Endres on designing in an arms race of high-tech materials.


Four short links: 16 February 2017

2017-02-16T11:40:00Z

Memory-Busting Javascript, Taobao Villages, Drone Simulation, and Bio Bots

  1. ASLR-Busting Javascript (Ars Technica) -- modern operating systems randomize where your programs actually live in memory (ASLR), to make it harder for someone to overwrite your code. This clever hack (in Javascript!) uses CPU cache timing (faster returns on cached addresses) to reveal where your code is. I'm in awe.
  2. China's Taobao Villages (Quartz) -- Today, the township and its surrounding area are China’s domestic capital for one rather specific category of products: acting and dance costumes. Half of the township’s 45,000 residents produce or sell costumes—ranging from movie-villain attire to cute versions of snakes, alligators, and monkeys—that are sold on Alibaba-owned Taobao, the nation’s largest e-commerce platform.
  3. Aerial Informatics and Robotics platform (Microsoft) -- open source drone simulator.
  4. How to Build Your Own Bio Bot (Ray Kurzweil) -- researchers are sharing a protocol with engineering details for their current generation of millimeter-scale soft robotic bio-bots.

Continue reading Four short links: 16 February 2017.


How Python syntax works beneath the surface

2017-02-16T11:00:00Z


Use Python's magic methods to amplify your code.
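
As a taste of what the article covers, a couple of the "magic" (dunder) methods that back everyday syntax; the Money class here is just an illustration:

    class Money:
        def __init__(self, cents):
            self.cents = cents

        def __add__(self, other):   # backs the + operator
            return Money(self.cents + other.cents)

        def __eq__(self, other):    # backs ==
            return self.cents == other.cents

        def __repr__(self):         # backs repr() and REPL display
            return f"Money({self.cents})"

    assert Money(150) + Money(50) == Money(200)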

Continue reading How Python syntax works beneath the surface.


Four short links: 15 Feb 2017

2017-02-15T12:50:00Z

Docker Data, Smart Broadcasting, Open Source, and Cellphone Spy Tools

  1. Docker Data Kit -- Connect processes into powerful data pipelines with a simple git-like filesystem interface.
  2. RedQueen: An online algorithm for smart broadcasting in social networks (Adrian Colyer) -- This paper starts out with a simple question “when’s the best time to tweet if you want to get noticed?,” detours through some math involving “solving a novel optimal control problem for a system of jump stochastic differential equations (SDEs),” and pops out again on the other side with a simple online algorithm called RedQueen.
  3. Open Source Guides -- GitHub's guide to making and contributing to open source. GitHub's is nicely packaged into visual and consumable chunks, but I still prefer (newly updated) Producing Open Source Software. The more people know how to do open source, the better.
  4. Cellphone Spy Tools Flood Local Police Departments -- caught my eye because I'm pondering visiting the U.S. this year, and I'm not a fan of surrendering devices for search. My current line of thought is: if CBP/popo are going to take a device from me and plug it into their software, hardware, and network ... it just has to look like a phone. Next challenge: making a large capacitor look like an unlocked iPhone.

Continue reading Four short links: 15 Feb 2017.


Doug Barth and Evan Gilman on Zero Trust networks

2017-02-15T11:00:00Z


The O’Reilly Security Podcast: The problem with perimeter security, rethinking trust in a networked world, and automation as an enabler.

In this episode, I talk with Doug Barth, site reliability engineer at Stripe, and Evan Gilman, Doug’s former colleague from PagerDuty who is now working independently on Zero Trust networking. They are also co-authoring a book for O’Reilly on Zero Trust networks. They discuss the problems with traditional perimeter security models, rethinking trust in a networked world, and automation as an enabler.

Continue reading Doug Barth and Evan Gilman on Zero Trust networks.


Why you need a data strategy, and what happens if you don’t have one

2017-02-14T12:00:00Z


How to map out a plan for finding value in data.

Continue reading Why you need a data strategy, and what happens if you don’t have one.


Four short links: 14 Feb 2017

2017-02-14T11:55:00Z

Rapping Neural Network, H1B Research, Amazon Chime, Quantifying Controversy, and Social Media Research Tools

  1. Rapping Neural Network -- It's a neural network that has been trained on rap songs, and can use any lyrics you feed it and write a new song (it now writes word by word as opposed to line by line) that rhymes and has a flow (to an extent). With examples.
  2. H1B Research -- H1B holders are paid less and are often weaker in skills than their American counterparts.
  3. Amazon Chime -- interesting to see a business service from Amazon, not an operations service. This is (they claim) better meeting software: move between devices, with screen-sharing, video, chat, and file-sharing.
  4. Quantifying Controversy in Social Media -- The research is carried out in the context of Twitter, but in theory can be applied to any social graph structure. A topic is simply defined as a query, often a hashtag. Given a query, we can build a conversation graph with vertices representing users, and edges representing activity and interactions between users. Using a graph partitioning algorithm, we can then try to partition the graph in two. If the partitions separate cleanly, then we have a good indication that the topic is controversial and has polarized opinions. (A toy version of this partition test is sketched after the list.)
  5. Social Media Research Toolkit -- a list of 50+ social media research tools curated by researchers at the Social Media Lab at Ted Rogers School of Management, Ryerson University. The kit features tools that have been used in peer-reviewed academic studies. Many tools are free to use and require little or no programming. Some are simple data collectors such as tweepy, a Python library for collecting Tweets, and others are a bit more robust, such as Netlytic, a multi-platform (Twitter, Facebook, and Instagram) data collector and analyzer, developed by our lab. All of the tools are confirmed available and operational.
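
The partition test in item 4 is easy to prototype. A toy sketch with NetworkX (the graph below is invented; the paper's actual controversy measures are more involved):

    import networkx as nx
    from networkx.algorithms.community import kernighan_lin_bisection

    # Toy conversation graph: vertices are users, edges are interactions.
    G = nx.Graph([("a", "b"), ("b", "c"), ("a", "c"),   # one camp
                  ("x", "y"), ("y", "z"), ("x", "z"),   # another camp
                  ("c", "x")])                          # a single bridge

    side1, side2 = kernighan_lin_bisection(G)  # partition the graph in two
    cut = nx.cut_size(G, side1, side2)         # edges crossing the partition
    print(f"{cut} of {G.number_of_edges()} edges cross the cut")
    # A small fraction of crossing edges means the partitions separate
    # cleanly, which the paper reads as a signal of a polarized topic.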

Continue reading Four short links: 14 Feb 2017.


Four short links: 13 Feb 2017

2017-02-13T12:00:00Z

Urban Attractors, Millimetre-Scale Computing, Ship Small Code, and C++ Big Data

  1. Urban Attractors: Discovering Patterns in Regions of Attraction in Cities -- We use a hierarchical clustering algorithm to classify all places in the city by their features of attraction. We detect three types of Urban Attractors in Riyadh during the morning period: Global, which are significant places in the city; Downtown, which is the central business district; and Residential attractors. In addition, we uncover what makes these places different in terms of attraction patterns. We used a statistical significance testing approach to rigorously quantify the relationship between Points of Interests (POIs) types (services) and the three patterns of Urban Attractors we detected.
  2. Millimetre-Scale Deep Learning -- Another micro mote they presented at the ISSCC incorporates a deep-learning processor that can operate a neural network while using just 288 microwatts.
  3. Ship Small Diffs (Dan McKinley) -- your deploys should be measured in dozens of lines of code rather than hundreds. [...] In online systems, you have to ship code to prove that it works. [...] Your real problem is releasing frequently. So quotable, so good.
  4. Thrill -- distributed big data batch computations on a cluster of machines ... in C++. (via Harris Brakmic)

Continue reading Four short links: 13 Feb 2017.


The dirty secret of machine learning

2017-02-13T12:00:00Z


David Beyer talks about AI adoption challenges, who stands to benefit most from the technology, and what's missing from the conversation.

Continue reading The dirty secret of machine learning.


Four short links: 10 Feb 2017

2017-02-10T12:00:00Z

Microsoft Graph Engine, Data Exploration, Gödel Escher Bach, and Docker Secrets

  1. Microsoft Graph Engine -- open source (Windows now, Unix coming) graph data engine. It's the open source implementation of Trinity: A Distributed Graph Engine on a Memory Cloud.
  2. Superset -- Airbnb's data exploration platform, designed to be visual, intuitive, and interactive, now with a better SQL IDE.
  3. MIT Gödel, Escher, Bach Lectures -- not Hofstadter himself, but a thorough walkthrough of the premises and ideas in the book.
  4. Docker Secrets Management -- interesting to see etcd getting some competition here.

Continue reading Four short links: 10 Feb 2017.


Hacker quantified security

2017-02-10T11:00:00Z


Alex Rice on the importance of inviting hackers to find vulnerabilities in your system, and how to measure the results of incorporating their feedback.

Continue reading Hacker quantified security.


Tom Davenport on mitigating AI's impact on jobs and business

2017-02-09T12:20:00Z


The O'Reilly Radar Podcast: The value humans bring to AI, guaranteed job programs, and the lack of AI productivity.

This week, I sit down with Tom Davenport. Davenport is a professor of Information Technology and Management at Babson College, the co-founder of the International Institute for Analytics, a fellow at the MIT Center for Digital Business, and a senior advisor for Deloitte Analytics. He also pioneered the concept of “competing on analytics.” We talk about how his ideas have evolved since writing the seminal work on that topic, Competing on Analytics: The New Science of Winning; his new book Only Humans Need Apply: Winners and Losers in the Age of Smart Machines, which looks at how AI is impacting businesses; and we talk more broadly about how AI is impacting society and what we need to do to keep ourselves on a utopian path.

Continue reading Tom Davenport on mitigating AI's impact on jobs and business.


Deep learning for Apache Spark

2017-02-09T12:00:00Z


The O’Reilly Data Show Podcast: Jason Dai on BigDL, a library for deep learning on existing data frameworks.

In this episode of the Data Show, I spoke with Jason Dai, CTO of big data technologies at Intel, and co-chair of Strata + Hadoop World Beijing. Dai and his team are prolific and longstanding contributors to the Apache Spark project. Their early contributions to Spark tended to be on the systems side and included Netty-based shuffle, a fair-scheduler, and the “yarn-client” mode. Recently, they have been contributing tools for advanced analytics. In partnership with major cloud providers in China, they’ve written implementations of algorithmic building blocks and machine learning models that let Apache Spark users scale to extremely high-dimensional models and large data sets. They achieve scalability by taking advantage of things like data sparsity and Intel’s MKL software. Along the way, they’ve gained valuable experience and insight into how companies deploy machine learning models in real-world applications.

Continue reading Deep learning for Apache Spark.


How to build—and grow—a strong design practice

2017-02-09T12:00:00Z

5 questions for Aarron Walter: Shaping products, growing teams, and managing through change.

I recently asked Aarron Walter, VP of design education at InVision and author of Designing for Emotion, to discuss what he has learned through his years of building and managing design teams. At the O’Reilly Design Conference, Aarron will be presenting a session, Hard-learned lessons in leading design.

Your talk at the upcoming O'Reilly Design Conference is titled Hard-learned lessons in leading design. Tell me what attendees should expect.

I had the unique opportunity of watching a company grow from just a handful of people to more than 550 over the course of eight years at MailChimp. When I started we had a few thousand customers, but when I left in February of 2016, there were more than 10 million worldwide. We saw tremendous growth, and I learned so much in my time there. In my talk, I'll be sharing the most salient lessons I learned along the way—how to shape a product, grow a team, how a company changes and how it changes people's careers, and a lot more.

What are some of the challenges that come along with building and leading a design team in a strong growth period?

As a company grows, the people who run it have to grow, too. There's a steep learning curve. When you're a small team it's easy to make decisions and get things done. But when a company grows, clear processes are needed, more people need to be brought into the planning process, and rapport has to be developed between teams and key individuals. The trick is you never really know what stage the company is in, so there's always uncertainty about whether you're doing the right thing. Everyone has to adapt and change with each new stage, and that can be hard for some people.

What are some of the more memorable lessons you learned along the way?

Early on as the director of UX, I thought my most important job was designing a great product. That was true but only until we needed to start building teams. Then my most important job was hiring great people. That remained my top priority for years to come, and I see it as my lasting legacy within the company. There are so many smart, talented people at MailChimp. I'm proud to have played a part in hiring and mentoring a number of people who've gone on to lead their own teams.

In the early years of the product, we were focused on the future, toward new features and new ideas. But as the product and company matured, we had to master the art of refinement. Feature production is a treadmill: there will always be something else you can build. But if those features are half-baked or unrefined, you can end up with a robust product that is too complicated or too broken to use. Phil Libin said it best, "The best product companies in the world have figured out how to make constant quality improvements part of their essential DNA."

You will be speaking about the importance of building a strong design practice. Can you explain what this looks like?

A strong design practice has these things going for it:

  1. A product vision that makes it clear to everyone how the product fits into the lives of the audience.
  2. A rigorous process for understanding the problem through research, customer interaction, and debate.
  3. A culture of feedback where designers can continue to grow and the work gets pushed to its potential.
  4. Strong relationships with other teams. Design is a continuum, not just a step in the process. You have to work with everyone in the process to produce great products.

You're speaki[...]



Four short links: 9 February 2017

2017-02-09T11:50:00Z

In-Memory Malware, Machine Ethics, Open Source Maintainer's Dashboard, and Cards Against Silicon Valley

  1. In-Memory Malware Infesting Banks (Ars Technica) -- According to research Kaspersky Lab plans to publish Wednesday, networks belonging to at least 140 banks and other enterprises have been infected by malware that relies on the same in-memory design [as Stuxnet] to remain nearly invisible. (via Boing Boing)
  2. Technical Challenges in Machine Ethics (Robohub) -- interesting interview with a researcher who is attempting to implement ethics in software. Fascinating to read about the approach and challenges.
  3. Scope -- nifty tool to help busy open source maintainers stay on top of their GitHub-hosted projects...dashboard for critical issues, PRs, etc.
  4. Cards Against Silicon Valley -- spot on tragicomedy.

Continue reading Four short links: 9 February 2017.


Four short links: 8 February 2017

2017-02-08T11:40:00Z

Becoming a Troll, Magic Paper, HTTPS Interception, and Deep NLP

  1. Anyone Can Become a Troll (PDF) -- A predictive model of trolling behavior shows that mood and discussion context together can explain trolling behavior better than an individual’s history of trolling. These results combine to suggest that ordinary people can, under the right circumstances, behave like trolls. (via Marginal Revolution)
  2. Magic Paper -- printed with light, erased with heat, and reusable up to 80 times. (via Slashdot)
  3. The Security Implication of HTTPS Interception (PDF) -- We find more than an order of magnitude more interception than previously estimated and with dramatic impact on connection security. To understand why security suffers, we investigate popular middleboxes and client-side security software, finding that nearly all reduce connection security and many introduce severe vulnerabilities. Drawing on our measurements, we conclude with a discussion on recent proposals to safely monitor HTTPS and recommendations for the security community.
  4. Deep Natural Language Processing Course -- This repository contains the lecture slides and course description for the Deep Natural Language Processing course offered in Hilary Term 2017 at the University of Oxford.

Continue reading Four short links: 8 February 2017.


What is the new reduce algorithm in C++17?

2017-02-08T10:00:00Z


Learn how to allow for parallelization using the reduce algorithm, new in C++17.
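
Conceptually, a reduction folds a range down to a single value, as in Python's functools.reduce below. What std::reduce adds over the older std::accumulate is permission to reorder and regroup the operations, which is what lets an execution policy split the work across threads (a cross-language sketch, not the C++ API itself):

    from functools import reduce
    import operator

    values = [1, 2, 3, 4, 5]
    total = reduce(operator.add, values, 0)  # ((((0+1)+2)+3)+4)+5 = 15
    # Because + is associative, a parallel reduce may instead compute
    # (1+2+3) and (4+5) on separate workers, then combine: same answer.
    assert total == 15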

Continue reading What is the new reduce algorithm in C++17?.


Four short links: 7 February 2017

2017-02-07T12:10:00Z

Game Theory, Algorithms and Robotics, High School Not Enough, and RethinkDB Rises

  1. Game Theory in Practice (The Economist) -- various firms around the world offering simulations/models of scenarios like negotiations, auctions, regulation, to figure out strategies and likely courses of action from other players.
  2. Videos from the 12th Workshop on Algorithmic Foundations of Robotics -- there are plenty with titles like "non-Gaussian belief spaces" (possibly a description of modern America) but also keynotes with titles like Replicators, Transformers, and Robot Swarms.
  3. No Jobs for High School Grads (NYT) -- “In our factories, there’s a computer about every 20 or 30 feet,” said Eric Spiegel, who recently retired as president and chief executive of Siemens USA. “People on the plant floor need to be much more skilled than they were in the past. There are no jobs for high school graduates at Siemens today.”
  4. The Liberation of RethinkDB -- The Linux Foundation bought the IP after the startup wound up; it's now run as an open source project via the Cloud Native Computing Foundation, all with the support of the founder and community. Happy story for everyone but the investors in RethinkDB. Also worth noting: RethinkDB is in a competitive space ("NoSQL stuff") and stands out so much that real money went to rescuing it from the startup deadpool.

Continue reading Four short links: 7 February 2017.


Staying out of trouble with big data

2017-02-07T12:00:00Z


Understanding the FTC’s role in policing analytics.

Continue reading Staying out of trouble with big data.


How do I use the slice notation in Python?

2017-02-07T10:00:00Z


Learn how to extract data from a structure correctly and efficiently using Python's slice notation.

In this tutorial, we will review the Python slice notation, and you will learn how to effectively use it. Slicing is used to retrieve a subset of values.

The basic slicing technique is to define the starting point, the stopping point, and the step size, also known as the stride.
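
All three parts are optional, and each may be negative; a few illustrative cases:

    data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

    data[2:7]    # [2, 3, 4, 5, 6]    start at 2, stop before 7
    data[:4]     # [0, 1, 2, 3]       start defaults to 0
    data[::2]    # [0, 2, 4, 6, 8]    every second element (stride 2)
    data[-3:]    # [7, 8, 9]          negative start counts from the end
    data[::-1]   # reversed copy      negative stride walks backward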

Continue reading How do I use the slice notation in Python?.


Four short links: 6 February 2017

2017-02-06T12:30:00Z

NPC AI, Deep Learning Math Proofs, Amazon Antitrust, and Code is Law

  1. Building Character AI Through Machine Learning -- NPCs that learn from/imitate humans. (via Greg Borenstein)
  2. Network-Guided Proof Search -- We give experimental evidence that with a hybrid, two-phase approach, deep-learning-based guidance can significantly reduce the average number of proof search steps while increasing the number of theorems proved.
  3. Amazon's Antitrust Paradox -- This Note maps out facets of Amazon’s dominance. Doing so enables us to make sense of its business strategy, illuminates anticompetitive aspects of Amazon’s structure and conduct, and underscores deficiencies in current doctrine. The Note closes by considering two potential regimes for addressing Amazon’s power: restoring traditional antitrust and competition policy principles or applying common carrier obligations and duties. Fascinating overview of the American conception of antitrust.
  4. FBI's Rap Back Program -- software encodes "guilty before trial." Employers enrolled in federal and state Rap Back programs receive ongoing, real-time notifications and updates about their employees’ run-ins with law enforcement, including arrests at protests and charges that do not end up in convictions.

Continue reading Four short links: 6 February 2017.


How do you customize packages in a Kickstart installation?

2017-02-06T09:00:00Z


Learn how to set up your configuration file to indicate the types of packages you want to install by using the “yum” command.

Continue reading How do you customize packages in a Kickstart installation?.


Four short links: 3 February 2017

2017-02-03T11:40:00Z

Stream Alerting, Probabilistic Cognition, Migrations at Scale, and Interactive Machine Learning

  1. StreamAlert -- a serverless, real-time data analysis framework that empowers you to ingest, analyze, and alert on data from any environment, using data sources and alerting logic you define. Open source from Airbnb.
  2. Probabilistic Models of Cognition -- we explore the probabilistic approach to cognitive science, which models learning and reasoning as inference in complex probabilistic models. In particular, we examine how a broad range of empirical phenomena in cognitive science (including intuitive physics, concept learning, causal reasoning, social cognition, and language understanding) can be modeled using a functional probabilistic programming language called Church.
  3. Online Migrations at Scale -- In this post, we’ll explain how we safely did one large migration of our hundreds of millions of Subscriptions objects. This is a solid process.
  4. Interactive Machine Learning (Greg Borenstein) -- intro to, and overview of, the field of Interactive Machine Learning, elucidating the principles for designing systems that let humans use these learning systems to do things they care about. In Greg's words, Machine learning has the potential to be a powerful tool for human empowerment, touching everything from how we shop to how we diagnose disease to how we communicate. To build these next thousand projects in a way that capitalizes on this potential, we need to learn not just how to teach the machines to learn but how to put the results of that learning into the hands of people.

Continue reading Four short links: 3 February 2017.


Personalization's big question: Why am I seeing this?

2017-02-03T11:00:00Z


Sara M. Watson from Digital Asia Hub discusses the state of personalization and how it can become more useful for consumers.

Continue reading Personalization's big question: Why am I seeing this?.


How do you create a Kickstart file?

2017-02-03T10:00:00Z


Learn how to create and make changes to a Kickstart configuration file using the anaconda-ks.cfg.

Continue reading How do you create a Kickstart file?.


How do I use the set_intersection algorithm in C++?

2017-02-03T09:00:00Z


Learn how to handle array comparisons using the set_intersection algorithm in C++.
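
The article covers C++'s std::set_intersection, which merges two sorted ranges. For flavor, the same operation in Python is a one-liner:

    a = [1, 3, 5, 7, 9]
    b = [3, 4, 5, 6, 7]
    common = sorted(set(a) & set(b))  # std::set_intersection keeps sorted order
    assert common == [3, 5, 7]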

Continue reading How do I use the set_intersection algorithm in C++?.


Mike Vladimer on IoT connectivity

2017-02-02T12:50:00Z


The O’Reilly Hardware Podcast: Powering connected devices with low-power networks.

In this episode of the O’Reilly Hardware Podcast, Brian Jepson and I speak with Mike Vladimer, co-founder of the Orange IoT Studio at Orange Silicon Valley. Vladimer discusses how Internet of Things devices could benefit from connectivity options other than those provided by well-known technologies (including cellular, Wi-Fi, and Bluetooth), and explains the LoRa wireless protocol, which supports long-range, low-power applications.

Continue reading Mike Vladimer on IoT connectivity.


Extend Spark ML for your own model/transformer types

2017-02-02T12:00:00Z


How to use the wordcount example as a starting point (and you thought you’d escape the wordcount example).

While Spark ML pipelines have a wide variety of algorithms, you may find yourself wanting additional functionality without having to leave the pipeline model. In Spark MLlib, this isn't much of a problem—you can manually implement your algorithm with RDD transformations and keep going from there. For Spark ML pipelines, the same approach can work, but we lose some of the nicely integrated properties of the pipeline, including the ability to automatically run meta-algorithms, such as cross-validation parameter search. In this article, you will learn how to extend the Spark ML pipeline model using the standard wordcount example as a starting point (one can never really escape the intro to big data wordcount example).
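
In PySpark terms, the extension point is a Transformer subclass that implements _transform. A compressed sketch with a hypothetical WordCounter stage (a production stage would also wire up proper Params for its column names):

    from pyspark.ml import Transformer
    from pyspark.sql import SparkSession, functions as F

    class WordCounter(Transformer):
        """Custom pipeline stage: adds a word count column for inputCol."""
        def __init__(self, inputCol="text", outputCol="words"):
            super().__init__()
            self.inputCol, self.outputCol = inputCol, outputCol

        def _transform(self, df):
            # _transform is the one hook a Transformer must provide.
            return df.withColumn(
                self.outputCol,
                F.size(F.split(F.col(self.inputCol), r"\s+")))

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("one more wordcount example",)], ["text"])
    WordCounter().transform(df).show()  # usable alone or inside a Pipeline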

Continue reading Extend Spark ML for your own model/transformer types.


Kat Holmes on Microsoft’s human-led approach to tackling society’s challenges

2017-02-02T12:00:00Z


The O’Reilly Design Podcast: Building bridges across disciplines, universal vs. inclusive design, and what playground design can teach us about inclusion.

In this week’s Design Podcast, I sit down with Kat Holmes, principal design director, inclusive design at Microsoft. We talk about what she looks for in designers, working on the right problems to solve, and why both inclusive and universal design are important but not the same.

Continue reading Kat Holmes on Microsoft’s human-led approach to tackling society’s challenges.


Four short links: 2 February 2017

2017-02-02T11:50:00Z

Physical Authentication, Crappy Robots, Immigration Game, and NN Flashcards

  1. Pervasive, Dynamic Authentication of Physical Items (ACM Queue) -- Silicon PUF circuits generate output response bits based on a silicon device's manufacturing variation. This is cute!
  2. Hebocon -- crappy robot competition.
  3. Martian Immigration Nightmare -- a game can make a point.
  4. TraiNNing Cards -- flashcards for neural networks. Hilarious!

Continue reading Four short links: 2 February 2017.


Should you containerize your Go code?

2017-02-02T11:00:00Z

Containers help you distribute, deploy, run, and test your Golang projects.

I’m a huge fan of Go, and I’m also really interested in containers, and how they make it easier to deploy code, especially at scale. But not all Go programmers use containers. In this article I’ll explore some reasons why you really should consider them for your Go code — and then we’ll look at some cases where containers wouldn’t add any benefit at all. First, let’s just make sure we’re all on the same page about what we mean by “containers.”

What is a container?

There are probably about as many different definitions of what a container is as there are people using them. For many, the word “container” is synonymous with Docker, although containers have been around a lot longer than either the Docker open-source project or Docker the company. If you're new to containers, Docker is probably your best starting point, with its developer-friendly command line support, but there are other implementations available:

  1. Linux Containers - container implementations including LXC and LXD
  2. rkt - pod-native container engine from CoreOS
  3. runc - running containers per the OCI specification
  4. Windows Containers - Windows Server containers and Hyper-V containers

Containers are a virtualization technology — they let you isolate an application so that it’s under the impression it’s running in its own physical machine. In that sense a container is similar to a virtual machine, except that it uses the operating system on the host rather than having its own operating system. You start a container from a container image, which bundles up everything that the application needs to run, including all its runtime dependencies. These images make for a convenient distribution package.

Containers make it easy to distribute your code

Because the dependencies are part of the container image, you’ll get exactly the same versions of all the dependencies when you run the image on your development machine, in test, or in production. No more bugs that turn out to be caused by a library version mismatch between your laptop and the machines in the data center. But one of the joys of Go is that it compiles into a single binary executable. You have to deal with dependencies at build time, but there are no runtime dependencies and no libraries to manage. If you’ve ever worked in, say, Python, JavaScript, Ruby, or Java, you’ll find this aspect of Go to be a breath of fresh air: you can get a single executable file out of the Go compilation process, and it’s literally all you need to move to any machine where you want it to run. You don’t need to worry about making sure the target machine has the right version libraries or execution environment installed alongside your program.

Err, so, if you have a single binary, what’s the point of packaging up that binary inside a container? The answer is that there might be other things you want to package up alongside your binary. If you’re building a web site, or if you have configuration files that accompany your program, you[...]



What is a Kickstart installation and why would you use it?

2017-02-02T10:00:00Z


Learn to use Kickstart to get the same look on multiple Red Hat Enterprise Linux system installations.

Continue reading What is a Kickstart installation and why would you use it?.


How do you use standard algorithms with object methods in C++?

2017-02-02T09:00:00Z


Learn how to write shorter, better-performing, and easier-to-read code using standard algorithms with object methods in C++.
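
The C++ article pairs standard algorithms with member functions (std::mem_fn and friends); Python's closest analog is operator.methodcaller with map, shown here only for flavor:

    from operator import methodcaller

    names = ["  ada ", "grace ", " barbara"]
    cleaned = list(map(methodcaller("strip"), names))  # calls s.strip() on each
    assert cleaned == ["ada", "grace", "barbara"]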

Continue reading How do you use standard algorithms with object methods in C++?.


Four short links: 1 February 2017

2017-02-01T18:25:00Z

Unhappy Developers, Incident Report, Compliance as Code, AI Ethics

  1. Unhappy Developers -- paper authors surveyed 181 developers and built a framework of consequences: Internal Consequences, such as low cognitive performance, mental unease or disorder, low motivation; External Consequences, which might be Process-related (low productivity, delayed code, variation from the process) or Artefact-related (low-quality code, rage rm-ing the codebase). Hoping to set the ground for future research into how developer happiness affects software production.
  2. GitLab Database Incident Report -- YP thinks that perhaps pg_basebackup is being super pedantic about there being an empty data directory, decides to remove the directory. After a second or two he notices he ran it on db1.cluster.gitlab.com, instead of db2.cluster.gitlab.com. YP died for your sins.
  3. Compliance as Code -- instead of relying on checklists and procedures and meetings, the policies and rules are enforced (and tracked) through automated controls, which are wired into configuration management tools and the Continuous Delivery pipeline. Every change ties back to version control and a ticketing system like Jira for traceability and auditability: all changes must be made under a ticket, and the ticket is automatically updated along the pipeline, from the initial request for work all the way to deployment.
  4. Ethical Considerations in AI Courses -- In this article, we provide practical case studies and links to resources for use by AI educators. We also provide concrete suggestions on how to integrate AI ethics into a general artificial intelligence course and how to teach a stand-alone artificial intelligence ethics course.

Continue reading Four short links: 1 February 2017.


Susan Sons on maintaining and securing the internet’s infrastructure

2017-02-01T12:15:00Z


The O’Reilly Security Podcast: Saving the Network Time Protocol, recruiting and building future open source maintainers, and how speed and security aren’t at odds with each other.

In this episode, O’Reilly’s Mac Slocum talks with Susan Sons, senior systems analyst for the Center for Applied Cybersecurity Research (CACR) at Indiana University. They discuss how she initially got involved with fixing the open source Network Time Protocol (NTP) project, recruiting and training new people to help maintain open source projects like NTP, and how security needn’t be an impediment to organizations moving quickly.

Continue reading Susan Sons on maintaining and securing the internet’s infrastructure.


Build a super fast deep learning machine for under $1,000

2017-02-01T12:00:00Z

The adventures in deep learning and cheap hardware continue!

Yes, you can run TensorFlow on a $39 Raspberry Pi, and yes, you can run TensorFlow on a GPU powered EC2 node for about $1 per hour. And yes, those options probably make more practical sense than building your own computer. But if you’re like me, you’re dying to build your own fast deep learning machine. OK, a thousand bucks is way too much to spend on a DIY project, but once you have your machine set up, you can build hundreds of deep learning applications, from augmented robot brains to art projects (or at least, that’s how I justify it to myself). At the very least, this setup will easily outperform a $2,800 Macbook Pro on every metric other than power consumption and, because it’s easily upgraded, stay ahead of it for a few years to come.

I hadn’t built a computer since the ’80s, and I was pretty intimidated by dropping hundreds of dollars on something I might not be able to build (and might not really use), but I’m here to tell you it can be done! Also, it’s really fun, and you will end up with a great general-purpose computer that will generally do inference and learning 20 times faster than your laptop. Here’s what you need to buy and some specific recommendations:

Motherboard

Motherboards come in different sizes. Since I didn’t want to use multiple GPUs, the cheapest and smallest standard size is called mini-ITX, which will be fine for this sort of project. My minimum requirements were a PCIe slot to plug the GPU into and two DDR4 slots to plug RAM into, and the board I went with was an ASUS Mini ITX DDR4 LGA 1151 B150I PRO GAMING/WIFI/AURA Motherboard for $125 on Amazon. It comes with a WiFi antenna, which is actually super useful in my basement.

Case

Cases don’t matter much, but they’re pretty cheap, and since this market for DIY computers is dominated by gamers, they come in all kinds of fun shapes and colors. The size should match the motherboard, so it needs to have mini-ITX in the name. I bought a Thermaltake Core V1 Mini ITX Cube on Amazon for $50.

RAM

I can’t believe how cheap RAM has gotten! You need to buy DDR4 RAM to match the motherboard (that’s most of what you will find online) and the prices are all about the same. I bought two 8GB of Corsair Vengeance for $129. I spent the extra $5 because of the Amazon review that stated, “For those who just cannot get enough LEDs crammed into their system, these are the perfect choice.” If you build a computer in your basement and you don’t embrace your inner Burning Man/teenager aesthetic, you are going to have a really hard time finding components.

CPU

I looked at speed comparison CPU tests online, and I think I would have been fine with a slower CPU, as very few things I do are CPU-limited (except training neural nets and I’m going to use the GPU for that). But I couldn’t bring myself to build a whole computer with a CPU three gen[...]



What’s a CDO to do?

2017-01-31T12:00:00Z


Data governance is straightforward; data strategy is not.

Continue reading What’s a CDO to do?.


Four short links: 31 January 2017

2017-01-31T11:50:00Z

Historic Language, Activist Security, Microcode Assembler, and PDP-10 ITS Source

  1. Computer Language We Get From the Mark I -- loop, patch, library, bug...all illustrated.
  2. Twitter Activist Security (The Grugq) -- This guide hopes to help reduce the personal risks to individuals while empowering their ability to act safely.
  3. mcasm -- microcode assembler.
  4. PDP-10 ITS -- This repository contains source code, tools, and scripts to build an ITS system from scratch. ITS is the Incompatible Timesharing System. Trivia: it's the OS that the original EMACS was written for, and the original Jargon File was written on.

Continue reading Four short links: 31 January 2017.


Playing with processes in Elixir

2017-01-31T11:00:00Z


Elixir’s key organizational concept, the process, is an independent component built from functions that sends and receives messages.

Elixir is a functional language, but Elixir programs are rarely structured around simple functions. Instead, Elixir’s key organizational concept is the process, an independent component (built from functions) that sends and receives messages. Programs are deployed as sets of processes that communicate with each other. This approach makes it much easier to distribute work across multiple processors or computers, and also makes it possible to do things like upgrade programs in place without shutting down the whole system.

Taking advantage of those features, though, means learning how to create (and end) processes, how to send messages among them, and how to apply the power of pattern matching to incoming messages.

Continue reading Playing with processes in Elixir.


Prototyping and deploying IoT in the enterprise

2017-01-31T11:00:00Z


Toward a virtuous cycle between people, devices, and cloud.

You have a lot of options available when you’re building a smart, connected device. For example, in recent years, your hardware options have multiplied massively. Even the humble Raspberry Pi, originally designed as an educational tool for youth, is getting into the game with NEC’s announcement of Raspberry Pi Compute Module support in their commercial/industrial display panels.

And there have long been plenty of choices for those who want to roll their own devices from scratch. Every embedded hardware platform has some kind of evaluation board available that works as a starting point for your own designs. For example, you can prototype with an inexpensive reference module like MediaTek’s LinkIt ONE, and then design your own module that has only the parts you need.

Continue reading Prototyping and deploying IoT in the enterprise.


Four short links: 30 January 2017

2017-01-30T14:10:00Z

Liquid Lenses, SRE Book, MEGA Source, and Founder Game

  1. Liquid Lens Glasses -- eyeglass lenses made of glycerin, a thick colorless liquid, enclosed by flexible rubber-like membranes in the front and back. The rear membrane in each lens is connected to a series of three mechanical actuators that push the membrane back and forth like a transparent piston, changing the curvature of the liquid lens and therefore the focal length between the lens and the eye.
  2. Site Reliability Engineering -- Google SRE book CC-licensed, free for download or purchase from O'Reilly in convenient dead-tree form.
  3. MEGA Source Code -- on GitHub from Mega itself.
  4. The Founder -- a dystopian business simulator.

Continue reading Four short links: 30 January 2017.


A new visualization to beautifully explore correlations

2017-01-30T12:00:00Z

Introducing the solar correlation map, and how to easily create your own.

An ancient curse haunts data analysis. The more variables we use to improve our model, the exponentially more data we need. By focusing on the variables that matter, however, we can avoid underfitting, and the need to collect a huge pile of data points. One way of narrowing input variables is to identify their influence on the output variable. Here correlation helps—if the correlation is strong, then a significant change in the input variable results in an equally strong change in the output variable. Rather than using all available variables, we want to pick input variables strongly correlated to the output variable for our model.

There's a catch though—and it arises when the input variables have a strong correlation among themselves. As an example, suppose we want to predict parental education, and we find a strong correlation with country club membership, the number of household cars, and costs of vacations in our data set. All of these luxuries grow from the same root: the family is rich. The true underlying correlation is that highly educated parents usually have a higher income. We can either use the household income to predict parental education, or use the array of variables above. We call this type of correlation “intercorrelation.” Intercorrelation is the correlation between explanatory variables. Adding many variables, where one suffices, conjures up the curse of dimensionality, and requires large amounts of data. It is sometimes beneficial, therefore, to elect just one representative for a group of intercorrelated input variables.

In this article, we’ll explore both correlation and intercorrelation with a “solar correlation map”—a new type of visualization created for this purpose, and we’ll show you how to simply create a solar correlation for yourself.

Using the solar correlation map on housing price data

We can use covariance and coefficient matrices to apply the solar correlation map to housing price data. As efficient as these tools are, however, they are hard to read. Thankfully, there are visualizations that can beautifully and succinctly represent the matrices to explore the correlations. The solar correlation map is designed for a dual purpose—it addresses the visual representation of the correlation of each input variable to the output variable, and the intercorrelation of the input variables.

Let's generate the solar correlation map for a standard data set and explore it. Carnegie Mellon University has collected data on Boston Housing prices in the 1990s; it is one of the freely accessible data sets from the UCI (University of California Irvine) Machine Learning repository. Our goal in this data set is to predict the output vari[...]
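
The raw material behind any such map is an ordinary correlation matrix. A minimal look at both kinds of correlation with pandas, using synthetic stand-in data rather than the Boston set:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    income = rng.normal(50, 10, 500)
    df = pd.DataFrame({
        "income": income,
        "club_membership": income + rng.normal(0, 2, 500),  # intercorrelated
        "household_cars": income + rng.normal(0, 2, 500),   # with each other
        "parental_education": 0.5 * income + rng.normal(0, 5, 500),
    })

    corr = df.corr()
    print(corr["parental_education"])                     # input-to-output
    print(corr.loc["club_membership", "household_cars"])  # intercorrelation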



Four short links: 27 January 2017

2017-01-27T11:55:00Z

Ethics of AI, Vertically Integrated Internet, Assessing Empirical Observations, Battery Teardown

  1. Virginia Dignum: Ethics of AI -- see these notes for the type of material she covers.
  2. Google's Vertically Integrated Internet -- a Hacker News comment worth reading.
  3. A Guide to Assessing Empirical Evaluations (Adrian Colyer) -- here are five questions to help you avoid [traps]: Has all of the data from the evaluation been considered, and not just the data that supports the claim? Have any assumptions been made in forming the claim that are not justified by the evaluation? If the experimental evaluation is compared to prior work, is it an apples-to-oranges comparison? Has everything essential for getting the results been described? Has the claim been unambiguously and clearly stated?
  4. Inside the Tesla 100kWh Battery Pack -- 516 cells per module. That's 8,256 cells per pack, a ~16% increase vs the 85/90 packs. [...] As for real capacity, the BMS reports usable capacity at a whopping 98.4 kWh. It also reports a 4 kWh unusable bottom charge, so that's 102.4 kWh total pack capacity!

Continue reading Four short links: 27 January 2017.


Pitfalls of HTTP/2

2017-01-27T11:00:00Z


HTTP/2 is still new and, although deploying it is relatively easy, there are a few things to be on the lookout for when enabling it.

HTTP/1.x (h1) was standardized in 1999. We've had years of experience deploying it, we understand how browsers and servers behave with it, and we've learned how to optimize for it too. In contrast, it has been just 18 months since HTTP/2 (h2) was standardized, and there’s already widespread support of it in browsers, servers, and CDNs.

So what makes h2 different from h1, and what should you watch out for when enabling h2 support for a site? Here are five things to look out for along the way.

Continue reading Pitfalls of HTTP/2.


Compliance as code

2017-01-27T11:00:00Z

Build regulatory compliance into development and operations, and wire compliance policies, checks, and auditing into continuous delivery, so it becomes an integral part of how your DevOps team works.

DevOps can be followed to achieve what Justin Arbuckle at Chef calls “Compliance as Code”: building compliance into development and operations, and wiring compliance policies and checks and auditing into Continuous Delivery so that regulatory compliance becomes an integral part of how DevOps teams work on a day-to-day basis.

Chef Compliance

Chef Compliance is a tool from Chef that scans infrastructure and reports on compliance issues, security risks, and outdated software. It provides a centrally managed way to continuously and automatically check and enforce security and compliance policies. Compliance profiles are defined in code to validate that systems are configured correctly, using InSpec, an open source testing framework for specifying compliance, security, and policy requirements. You can use InSpec to write high-level, documented tests/assertions to check things such as password complexity rules, database configuration, whether packages are installed, and so on. Chef Compliance comes with a set of predefined profiles for Linux and Windows environments as well as common packages like Apache, MySQL, and Postgres. When variances are detected, they are reported to a central dashboard and can be automatically remediated using Chef.

A way to achieve Compliance as Code is described in the “DevOps Audit Defense Toolkit”, a free, community-built process framework written by James DeLuccia, IV, Jeff Gallimore, Gene Kim, and Byron Miller. The Toolkit builds on real-life examples of how DevOps is being followed successfully in regulated environments, on the Security as Code practices that we’ve just looked at, and on disciplined Continuous Delivery. It’s written in case-study format, describing compliance at a fictional organization, laying out common operational risks and control strategies, and showing how to automate the required controls.

Defining Policies Upfront

Compliance as Code brings management, compliance, internal audit, the PMO, and infosec to the table, together with development and operations. Compliance policies and rules and control workflows need to be defined upfront by all of these stakeholders working together. Management needs to understand how operational risks and other risks will be controlled and managed through the pipeline. Any changes to these policies or rules or workflows need to be formally approved and documented; for example, in a Change Advisory Board (CAB) meeting. But instead of relying on checklists and procedures and meetings, the policies [...]
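
Real InSpec controls are written in a Ruby DSL; purely to show the shape of a policy check that can run in a pipeline, here is an analogous assertion sketched in Python (the file path and rule are invented for illustration):

    # Toy compliance check: fail the build if SSH permits root login.
    def test_sshd_forbids_root_login():
        options = {}
        with open("/etc/ssh/sshd_config") as f:   # hypothetical target file
            for line in f:
                parts = line.split(None, 1)
                if len(parts) == 2 and not line.lstrip().startswith("#"):
                    options[parts[0]] = parts[1].strip()
        assert options.get("PermitRootLogin") == "no"

Run under pytest on every deploy, a failing check blocks the pipeline, which is the "enforced and tracked through automated controls" idea described above.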



The state of Jupyter

2017-01-26T17:05:00Z

How Project Jupyter got here and where we are headed.

In this post, we’ll look at Project Jupyter and answer three questions: Why does the project exist? That is, what are our motivations, goals, and vision? How did we get here? Where are things headed next, in terms of both Jupyter itself and the context of data and computation it exists in?

Project Jupyter aims to create an ecosystem of open source tools for interactive computation and data analysis, where the direct participation of humans in the computational loop—executing code to understand a problem and iteratively refine their approach—is the primary consideration. Anchoring Jupyter around humans is key to the project; it helps us both narrow our scope in some directions (e.g., we are not building generic frameworks for graphical user interfaces) and generalize in others (e.g., our tools are language agnostic despite our team’s strong Python heritage). In service of this goal, we:

  1. Explore ideas and develop open standards that try to capture the essence of what humans do when using the computer as a companion to reasoning about data, models, or algorithms. This is what the Jupyter messaging protocol or the Notebook format provide for their respective problems, for example.
  2. Build libraries that support the development of an ecosystem, where tools interoperate cleanly without everyone having to reinvent the most basic building blocks. Examples of this include tools for creating new Jupyter kernels (the components that execute the user’s code) or converting Jupyter notebooks to a variety of formats.
  3. Develop end-user applications that apply these ideas to common workflows that recur in research, education, and industry. This includes tools ranging from the now-venerable IPython command-line shell (which continues to evolve and improve) and our widely used Jupyter Notebook to new tools like JupyterHub for organizations and our next-generation JupyterLab modular and extensible interface. We strive to build highly usable, very high-quality applications, but we focus on specific usage patterns: for example, the architecture of JupyterLab is optimized for a web-first approach, while other projects in our ecosystem target desktop usage, like the open source nteract client or the support for Jupyter Notebooks in the commercial PyCharm IDE.
  4. Host a few services that facilitate the adoption and usage of Jupyter tools. Examples include NBViewer, our online notebook sharing system, or the free demonstration service try.jupyter.org. These services are themselves fully open source, enabling others to either deploy them in custom environments or build new te[...]



Genevieve Bell on moving from human-computer interactions to human-computer relationships

2017-01-26T13:15:00Z


The O’Reilly Radar Podcast: AI on the hype curve, imagining nurturing technology, and gaps in the AI conversation.

This week, I sit down with anthropologist, futurist, Intel Fellow, and director of interaction and experience research at Intel, Genevieve Bell. We talk about what she’s learning from current AI research, why the resurgence of AI is different this time, and five things that are missing from the AI conversation.

Continue reading Genevieve Bell on moving from human-computer interactions to human-computer relationships.


The key to building deep learning solutions for large enterprises

2017-01-26T13:12:00Z


The O’Reilly Data Show Podcast: Adam Gibson on the importance of ROI, integration, and the JVM.

As data scientists add deep learning to their arsenals, they need tools that integrate with existing platforms and frameworks. This is particularly important for those who work in large enterprises. In this episode of the Data Show, I spoke with Adam Gibson, co-founder and CTO of Skymind, and co-creator of Deeplearning4J (DL4J). Gibson has spent the last few years developing the DL4J library and community, while simultaneously building deep learning solutions and products for large enterprises.

Continue reading The key to building deep learning solutions for large enterprises.


Chris Messina on conversational commerce

2017-01-26T13:10:00Z


The O’Reilly Bots Podcast: The 2017 bot outlook with one of the field’s early adopters.

In this episode of the O’Reilly Bots Podcast, Pete Skomoroch and I speak with Chris Messina, bot evangelist, creator of the hashtag, and, until recently, developer experience lead at Uber. We talk about the origins of MessinaBot, ruminate on the need for bots that truly exploit their medium rather than imitating older apps, and take a look at what’s ahead for bots in 2017.

Continue reading Chris Messina on conversational commerce.

(image)



AI building blocks: The eggs, the chicken, and the bacon

2017-01-26T12:05:00Z

Data, algorithms, and better business results are key to developing AI.

As I read this post from the World Economic Forum, This is why China has the edge in Artificial Intelligence, what struck me wasn't whether China has an edge in AI, or even if I care. What struck me is the proposed five building blocks required for AI development:

  1. Massive data
  2. Automatic data tagging systems
  3. Top scientists
  4. Defined industry requirements
  5. Highly efficient computing power

It made me wonder: are these factors essential to building a solid foundation for AI? Does high performance in these areas give an edge to AI projects? Overall, my answer was: somewhat, but the list is misleading. Let me explain, block by block:

Massive data. IMHO, this is the red herring of AI. Too many believe "s/he who has the most data wins." Data is absolutely valuable, but volume alone does not bring value. Within a large volume, you can have data that is generic or redundant. Therefore, massive amounts of data only help you if they can be used for differentiation; specifically, if you're able to drive better results from that data. And three other V's define big data: variety, velocity, and veracity. Variety and velocity do not require "massive-ness." As for veracity, you know the value of massive amounts of garbage data. Finally, I'd add that massive data can quickly lead to a tyranny of popularity (i.e., the instances with the most data win). We all have examples of when one nugget of information was the key; sometimes the small data should win. Bottom line: big data is a building block—check; massive data—misleading.

Automatic data tagging systems. Automated tagging systems are themselves AI, so we get caught in an infinite loop if we take them as a building block. Bottom line: automatic data tagging systems are sub-assemblies, not building blocks.

Top scientists. First, none of this is possible without research. None. HT to Bengio(s), LeCun, Ng, Hinton, et al. The WEF article calls out a combination of scientists and engineers, but with more of a waterfall approach versus one based on requirements. The question must be what you are trying to build and how important it is for you to create the algorithms versus use algorithms conceived or created by others. You need to decide this for your business—where is science important and where is implementation important? The two are different blocks, and both are critical. And you might have different answers for different parts of your problem. Bottom line: top scientists and/or experienced engineers create the building bl[...]



Guaranteed successful design

2017-01-26T12:00:00Z

5 questions for Noah Iliinsky: Solving real problems, measuring success, and adopting holistic thinking.

I recently asked Noah Iliinsky, senior UX architect at Amazon Web Services, co-editor of Beautiful Visualization, and co-author of Designing Data Visualizations, to discuss the principles for successful design, common missteps designers make, and why holistic thinking is an important skill for all designers. At the O'Reilly Design Conference, Noah will be presenting a session, Guaranteed successful design.

You're presenting a talk at the O'Reilly Design Conference called Guaranteed successful design. Tell me more about what folks can expect.

It's a survey of design techniques, approaches, and tenets that are either not-well-known-enough (Wardley mapping, design for human inaction) or are understood but not sufficiently practiced (draw the map or diagram). This talk originated as a lightning talk, where each topic was mostly just a headline and a single line of description. I'll be walking through them in the same order as before, but giving more depth and background for each technique.

You are covering 17 principles that can improve the odds of success. How do you measure success?

Great question. Success can be measured by subjective user experience (frustrating, easy, delightful, confusing, etc.) as well as by metrics around task completion rate, number of errors, etc. There's also the greater question of solving the right problem in the first place. Even if your design is perfect, it can't be a success if it isn't solving a real problem. Each of these topics is designed to guide the right sort of inquiry to increase the likelihood of solving the right problem in a satisfying manner.

Conversely, what are some of the major missteps designers make when approaching their work?

There are two major classes of design process error I see frequently. The first is people providing solutions for problems that don't actually exist, or that only exist for a small subset of people (who are often similar types of people to the solution-makers). The second class of error is problem solvers falling in love with a particular implementation of a solution, rather than understanding that each implementation is one of many that can satisfy a particular requirement, and that each has different strengths and weaknesses. Not coincidentally, these topics are both heavily addressed in my talk.

Why do you think it's so difficult for designers to think more holistically about the [...]



A guide to improving data integrity

2017-01-26T12:00:00Z

Validating your data requires asking the right questions and using the right data.

Almost 10 years ago, I started as an intern on a data engineering team, working my way up to senior developer after working on dozens of projects and processes, including simple queries of data, data warehousing, parsing raw logs, translations, aggregations, and creating products for final reports and analysis. After working with data for many years to create reliable reports and analysis, I've seen many people join our team who are processing and analyzing data for the first time (and sometimes even after years of working with only pristine, reportable data), and who struggle at first to understand how to reliably maintain and use data that is more raw and random. Often, they don't know how to ensure only the right data is used, and the question, "How would you test that?" has been difficult for them to even begin to answer. However, after working with real data sets, this becomes more obvious and much easier.

To help answer this question, it's helpful to focus on boundaries and hard expectations within the data, specifically on the format and validity of the values being observed. Ask yourself these questions:

  1. Do I have all the data I started with?
  2. Are there nulls in the data that should have values?
  3. Are there duplicates in the data?

Other key things to look at when evaluating data's validity include trends and how different components of the data relate. For example, if you're testing a set of data that represents a shopping experience with users, products, purchases, and carts, some key questions to answer may include:

  1. Do all purchases relate to valid products?
  2. Does every cart and purchase have a valid user?
  3. Is the total number of carts less than the total number of users? (Assuming each user should have at most one cart.)

(A minimal sketch of these checks in code appears after this excerpt.)

Recently, I worked on a large project that encompassed almost all of the data processes and testing practices we've developed over the past decade or so, so I decided to use this project as a case study to create a guide for how I test data and think about the process. This case study also included the larger question, "Can I trust the data I'm using?", which goes beyond verifying the accuracy of data transformations and processes to ensuring the right data sets are used for analysis. This project started with processing and parsing raw data from log files and finished with optimized data tables for reporting in business intelligence tools. These[...]
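To make the shopping-cart checks above concrete, here is a minimal sketch using pandas. All table and column names (users, products, purchases, carts, user_id, product_id) are illustrative assumptions, not taken from the article:

    # A minimal, illustrative sketch of the validity checks described
    # above. All DataFrame and column names are assumptions.
    import pandas as pd

    def check_integrity(users: pd.DataFrame, products: pd.DataFrame,
                        purchases: pd.DataFrame, carts: pd.DataFrame) -> dict:
        """Run basic data-integrity checks and return named results."""
        return {
            # Are there nulls in columns that should always have values?
            "null_user_ids": int(purchases["user_id"].isnull().sum()),
            # Are there duplicates in the data?
            "duplicate_purchases": int(purchases.duplicated().sum()),
            # Do all purchases relate to valid products?
            "orphan_purchase_products": not set(purchases["product_id"]).issubset(set(products["product_id"])),
            # Does every cart have a valid user?
            "orphan_cart_users": not set(carts["user_id"]).issubset(set(users["user_id"])),
            # Is the number of carts at most the number of users?
            "too_many_carts": len(carts) > users["user_id"].nunique(),
        }

In a real pipeline, checks like these would run after each transformation step, with row counts compared back to the source to answer "Do I have all the data I started with?"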



Four short links: 26 January 2017

2017-01-26T11:30:00Z

Soda Locker, Building Fabricator, Familied Traveler Advice, and Technically Competent Bosses

  1. The Soda Locker Vending Machine (Instructables) -- genius creation from a high schooler!
  2. Robotic Fabricator for Buildings (MIT TR) -- The In Situ Fabricator1 is designed from the bottom up to be practical. It can build stuff using a range of tools with a precision of less than five millimeters, it is designed to operate semi-autonomously in a complex changing environment, it can reach the height of a standard wall, and it can fit through ordinary doorways. And it is dust- and waterproof, runs off standard electricity, and has battery backup. On top of all this, it must be internet connected so that an architect can make real-time changes to any plans if necessary.
  3. Lessons Learned From a Million Miles and 5 Kids (Bryce Roberts) -- golden advice for travelers with families at home.
  4. You're More Likely to be Happy at Work if Your Boss is Technically Competent (HBR) -- technical competence was measured by whether the supervisor could, if necessary, do the employee's job; whether the supervisor worked his or her way up inside the company; and the supervisor's level of technical competence as assessed by a worker, across 35,000 randomly sampled employees at different workplaces. When we look closely at the data, a striking pattern emerges. The benefit of having a highly competent boss is easily the largest positive influence on a typical worker's level of job satisfaction. Even we were surprised by the size of the measured effect. For instance, among American workers, having a technically competent boss is considerably more important for employee job satisfaction than their salary (even when pay is really high).

Continue reading Four short links: 26 January 2017.

(image)



Inside the Washington Post’s popularity prediction experiment

2017-01-25T13:00:00Z

A peek into the clickstream analysis and production pipeline for processing tens of millions of daily clicks, for thousands of articles.

In the distributed age, news organizations are likely to see their stories shared more widely, potentially reaching thousands of readers in a short amount of time. At the Washington Post, we asked ourselves if it was possible to predict which stories will become popular. For the Post newsroom, this would be an invaluable tool, allowing editors to more efficiently allocate resources to support a better reading experience and richer story package, adding photos, videos, links to related content, and more, in order to more deeply engage the new and occasional readers clicking through to a popular story. Here's a behind-the-scenes look at how we approached article popularity prediction.

Data science application: Article popularity prediction

There has not been much formal work on article popularity prediction in the news domain, which made this an open challenge. For our first approach to this task, Washington Post data scientists identified the most-viewed articles on five randomly selected dates, and then monitored the number of clicks they received within 30 minutes after being published. These clicks were used to predict how popular the articles would be in 24 hours. Using the clicks 30 minutes after publishing yielded poor results. As an example, here are five very popular articles:

[Figures 1-5: click histories for five very popular articles. Credit: Shuguang Wang and Eui-Hong (Sam) Han, used with permission.]

Table 1 lists the actual number of clicks these five articles received 30 minutes and 24 hours after being published. The takeaway: looking at how many clicks a story gets in the first 30 minutes is not an accurate way to measure its potential for popularity.

Table 1. Five popular articles.

    Article           # clicks @ 30 mins   # clicks @ 24 hours
    9/11 Flag               6,245                 67,028
    Trump Policy            2,015                128,217
    North Carolina          1,952                 [...]
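As a hedged illustration, here is a tiny numeric sketch of the kind of early-clicks baseline the team found inadequate, built only from the two complete rows recoverable from Table 1. The log-log linear fit is my assumption for demonstration purposes, not the Post's actual method:

    # Illustrative only: a two-point "baseline" from Table 1's complete
    # rows. The log-log linear fit is an assumption for demonstration,
    # not the Post's actual model.
    import numpy as np

    # (clicks @ 30 min, clicks @ 24 hours): 9/11 Flag and Trump Policy.
    clicks_30min = np.array([6245.0, 2015.0])
    clicks_24h = np.array([67028.0, 128217.0])

    # Fit a line on log-scaled counts, a common choice for click data.
    slope, intercept = np.polyfit(np.log(clicks_30min), np.log(clicks_24h), 1)

    print(f"slope: {slope:.2f}")  # ~ -0.57

The slope comes out negative: the article with fewer early clicks earned roughly twice as many clicks by 24 hours, which is exactly the failure mode the table illustrates, and why a 30-minute click count alone is a poor popularity signal.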