Subscribe: O'Reilly Radar - Insight, analysis, and research about emerging technologies
http://radar.oreilly.com/atom.xml

All - O'Reilly Media



All of our Ideas and Learning material from all of our topics.



Updated: 2017-06-23T01:32:52Z

 



Community and development

2017-06-22T21:00:00Z

Aria Stewart discusses the role community plays in the web world and how we can take care of what we’ve built.

Continue reading Community and development.



The evolving role of product management

2017-06-22T17:14:00Z

What product management is and why it’s so relevant today.

As Marty Cagan, founding partner of Silicon Valley Product Group and a 30-year veteran of product management, puts it, “The job of a product manager is to discover a product that is valuable, usable, and feasible.” Similarly, co-author Martin Eriksson, in his oft-quoted definition of product management, calls it the intersection between business, user experience, and technology (see Figure 1; only a product manager would define themselves in a Venn diagram!). A good product manager must be experienced in at least one, passionate about all three, and conversant with practitioners of all three.

Figure 1. Product management has been called the intersection between business, technology, and user experience (source: Martin Eriksson, 2011).

Business: Product management is above all else a business function, focused on maximizing business value from a product. Product managers should be primarily focused on optimizing a product to achieve business goals while maximizing return on investment.

User experience (UX): Perhaps most importantly, the product manager is the voice of the customer inside the business, and thus must be passionate about customers and the specific problems they’re trying to solve. This doesn’t mean the product manager should become a full-time researcher or a full-time designer, but they do need to make time for this important work. Getting out to talk to customers, testing the product, and getting feedback firsthand, as well as working closely with internal and external UX designers and researchers, are all part of this process.

Technology: There’s no point in defining what to build if you don’t know how it will get built. This doesn’t mean a product manager needs to be able to code, but understanding the technology stack—and most importantly, the level of effort involved—is crucial to making the right decisions. This is key in an Agile world where product managers spend more time with the development team than with anyone else inside the business, and need a shared language and understanding with their engineers.

Other functions, like team development, marketing, and strategic planning, play a part too, but business, UX, and technology form the core of what product managers do every day.

The product management role

Earlier we said that good managers are not necessarily good leaders. Good leaders, however, must be good managers, so in this section we will discuss the role of product manager and how it overlaps with the role of product leader. Why does a product manager need skills in areas like business, UX, and technology? Primarily because the role itself is incredibly broad and varied. Nilan Peiris, VP of product and growth at TransferWise, says that product managers need to “do whatever needs to be done.” Tanya Cordrey, former chief digital officer at the Guardian, adds, “One of the really fantastic things about product management, but also one of the real stresses of it, is that it is a very broad role. You have to be able to be really good at strategy, be inspirational, and understand the long-term picture. At the same time, you have to be really good at the operational side and making things happen.”

This starts with setting a vision for the product. The product manager should research their market, their customer, and the problem the customer is trying to solve. They have to assimilate huge amounts of information—including qualitative feedback from customers, quantitative data from analytics tools and statistics, research reports, and market trends, to name but a few. They need to know everything that can be known and then mix all that information with a healthy dose of creativity to define a vision for their product.

Once the vision is in place, the product manager must spread the word throughout their business. They have to believe in the product and get almost evangelical about the utopia that the product vision represents. If they can’t get e[...]



A scalable time-series database that supports SQL

2017-06-22T12:50:00Z


The O’Reilly Data Show Podcast: Michael Freedman on TimescaleDB and scaling SQL for time-series.

In this episode of the Data Show, I spoke with Michael Freedman, CTO of Timescale and professor of computer science at Princeton University. When I first heard that Freedman and his collaborators were building a time-series database, my immediate reaction was: “Don’t we have enough options already?” The early incarnation of Timescale was a startup focused on IoT, and it was while building tools for the IoT problem space that Freedman and the rest of the Timescale team came to realize that the database they needed wasn’t available (at least not in open source). Specifically, they wanted a database that could easily support complex queries and the sort of real-time applications many have come to associate with streaming platforms. Based on early reactions to TimescaleDB, many users concur.
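Since TimescaleDB presents itself as ordinary PostgreSQL, the kind of complex time-series query Freedman describes can be issued from any Postgres client. Here is a minimal sketch in Python with psycopg2; the connection string, table, and column names are hypothetical, not from the episode:

import psycopg2

# Hypothetical connection string and schema, for illustration only
conn = psycopg2.connect("dbname=metrics user=postgres host=localhost")
cur = conn.cursor()

# Average CPU usage per 5-minute bucket over the last day, using
# TimescaleDB's time_bucket() on a hypertable named 'cpu'
cur.execute("""
    SELECT time_bucket('5 minutes', time) AS bucket,
           device_id,
           avg(usage) AS avg_usage
    FROM cpu
    WHERE time > now() - interval '1 day'
    GROUP BY bucket, device_id
    ORDER BY bucket
""")
for bucket, device_id, avg_usage in cur.fetchall():
    print(bucket, device_id, avg_usage)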

Continue reading A scalable time-series database that supports SQL.




Probabilistic programming from scratch

2017-06-22T11:00:00Z

Working with uncertainty in real-world data.

Real-world data is almost always incomplete or inaccurate in some way. This means that the uncertain conclusions we draw from it are only meaningful if we can answer the question: how uncertain? One way to do this is using Bayesian inference. But, while Bayesian inference is conceptually simple, it can be analytically and computationally difficult in practice. Probabilistic programming is a paradigm that abstracts away some of this complexity.

There are many probabilistic programming systems. Perhaps the most advanced is Stan, and the most accessible to non-statistician programmers is PyMC3. At Fast Forward Labs, we recently shared with our clients a detailed report on the technology and uses of probabilistic programming in startups and enterprises. But in this article, rather than use either of these advanced comprehensive systems, we're going to build our own extremely simple system from scratch. We'll write clear, functional Python 3. We'll use generators to build up a pipeline that will allow us to answer concrete questions. We won't use any libraries (except for random number generation and plotting). And I'll go easy on the mathematics. The code will be slow compared to Stan and PyMC3, but hopefully you'll understand every line.

This "from scratch" approach follows in the footsteps of Joel Grus's book Data Science from Scratch, and Jake VanderPlas's PyCon talk Statistics for Hackers. I recommend both. In his talk, Jake said, "if you can write a for loop, you can do statistical analysis." That isn't always true (good luck implementing ADVI with a for loop). But when it is true, we can focus on the big, fundamental ideas without getting lost in algebraic or computational details.

An A/B test

Let's take a specific data analysis problem: a simple A/B test for a website. Suppose our site has two layouts. During our test, 4% of visitors to layout A convert (i.e., buy something, sign up for the mailing list, whatever), and 5% of visitors to layout B convert. Clearly, layout B is better, so we should use that layout, right? But what if I tell you it was a very small test?

n_visitors_a = 100  # number of visitors shown layout A
n_conv_a = 4        # number of visitors shown layout A who converted

n_visitors_b = 40
n_conv_b = 2

Are you still sure B is better? And what if it's going to cost us $1 million to change the layout if we get this decision wrong? Are you sure enough? If not, how much more data would you need? To answer these questions, we need to quantify exactly how confident we are that layout B is better, given the slice of data we do have.

A simple algorithm for Bayesian inference

We can do that using Bayesian inference. Bayesian inference is a method for updating your knowledge about the world with the information you learn during an experiment. It derives from a simple equation called Bayes's Rule. In its most advanced and efficient forms, it can be used to solve huge problems. But we're going to use a specific, simple inference algorithm called Approximate Bayesian Computation (ABC), which is barely a couple of lines of Python:

def posterior_sampler(data, prior_sampler, simulate):
    '''Yield samples from the posterior by Approximate Bayesian Computation.'''
    for p in prior_sampler:
        if simulate(p) == data:
            yield p

This function turns the prior distribution into the posterior. What does that mean?
I talk about these distributions in more detail in the Orioles, but for this article, the rough idea is sufficient: samples from the prior distribution are our best guesses of the values of the unknown parameter of our system. In the case of an A/B test, this is the conversion fraction of a layout. These guesses are made before we do the experiment. Samples from the posterior distribution, meanwhile, are guesses of the same parameters made after the experiment, in the light of the data we gathered. Once you have the posterior, y[...]
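To make the pipeline concrete, here is a minimal sketch of how posterior_sampler can be driven for layout A. The uniform prior and coin-flip simulator are illustrative assumptions of mine, not the article's exact code:

import random
from itertools import islice

def uniform_prior_sampler():
    # Best guess before the experiment: any conversion fraction
    # between 0 and 1 is equally plausible (an assumed prior)
    while True:
        yield random.random()

def simulate_conversions(p, n_visitors):
    # Simulate n_visitors, each converting with probability p
    return sum(random.random() < p for _ in range(n_visitors))

def posterior_sampler(data, prior_sampler, simulate):
    '''Yield samples from the posterior by Approximate Bayesian Computation.'''
    for p in prior_sampler:
        if simulate(p) == data:
            yield p

# Posterior for layout A: 4 conversions observed among 100 visitors
posterior_a = posterior_sampler(
    data=4,
    prior_sampler=uniform_prior_sampler(),
    simulate=lambda p: simulate_conversions(p, 100),
)
samples = list(islice(posterior_a, 1000))
print(sum(samples) / len(samples))  # rough posterior mean, around 0.05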



Four short links: 22 June 2017

2017-06-22T10:30:00Z

Video Segmentation, Password Resets, Hackpocalypse Now, and Google Glass Updates

  1. Machine Learning and Coresets for Automated Real-Time Video Segmentation of Laparoscopic and Robot-Assisted Surgery -- they automatically split video into segments and identify representative frames for each segment, using coresets. (Google helpfully corrected my "mit coresets" search to "with corsets," but I'll let you find your own interesting links there.)
  2. The Password Reset MitM Attack -- has a great checklist at the end, which will help you get your password reset process right. (via Adrian Colyer)
  3. Extent of Ukrainian Hacks (Wired) -- A hacker army has systematically undermined practically every sector of Ukraine: media, finance, transportation, military, politics, energy. Wave after wave of intrusions have deleted data, destroyed computers, and in some cases paralyzed organizations’ most basic functions. “You can’t really find a space in Ukraine where there hasn’t been an attack,” says Kenneth Geers, a NATO ambassador who focuses on cybersecurity. In a public statement in December, Ukraine’s president, Petro Poroshenko, reported that there had been 6,500 cyberattacks on 36 Ukrainian targets in just the previous two months. You know how most intrusions in your country aren't reported? Imagine if those intrusions were used to shut down the organization in question: that's Ukraine.
  4. Mysterious Google Glass Updates -- "XE23" is the new firmware version, the first such update in nearly three years. In addition to the usual "bug fixes and performance improvements," Glass can now make use of paired Bluetooth input devices, like keyboards and mice. Android Police dusted off a unit and got the new firmware up and running, discovering that you'll actually get a mouse cursor on the unit if you pair a mouse.

Continue reading Four short links: 22 June 2017.




Will HTTP/2 make my site faster?

2017-06-22T10:00:00Z

How latency, packet loss, content type, and third-party content affect performance.

“Will HTTP/2 make my site faster?” is a question often asked by companies focused on fast and reliable websites.

The general answer is “most likely, yes.” However, the mileage varies considerably from site to site because there are a few things that affect how much of a performance boost HTTP/2 will provide. Those are:

Continue reading Will HTTP/2 make my site faster?



The end of "personal" computing (and the beginning of personal computing)

2017-06-21T21:00:00Z

John Allsopp looks toward the next age of personal computing—one where humans will interact with machines in more seamless and futuristic ways.

Continue reading The end of "personal" computing (and the beginning of personal computing).



Data’s journey to predictive analytics

2017-06-21T21:00:00Z

Leo Vasiliou walks through the evolution of analytics and how analytics relates to a larger monitoring strategy.

Continue reading Data’s journey to predictive analytics.



Focusing on what matters

2017-06-21T21:00:00Z

Tim Kadlec says we must focus on the fundamentals that make or break the web for people around the globe, and down the street.

Continue reading Focusing on what matters.



Innovating with accessibility in mind

2017-06-21T21:00:00Z

Marcy Sutton explores how the work we do with technology can have a monumental impact on the lives of people with disabilities.

Continue reading Innovating with accessibility in mind.



Building the culture and collaboration layer for DevOps

2017-06-21T21:00:00Z

Sean Regan says DevOps requires more than tools. It also needs high-performing people and teams.

Continue reading Building the culture and collaboration layer for DevOps.



Highlights from the O'Reilly Fluent Conference in San Jose 2017

2017-06-21T21:00:00Z

Watch highlights covering frontend tools and techniques, performance, web apps, and more, from the O'Reilly Fluent Conference in San Jose 2017.

Experts from across the web world are coming together in San Jose, Calif., for the O'Reilly Fluent Conference. Below you'll find links to highlights from the event.

The end of "personal" computing (and the beginning of personal computing)
John Allsopp looks toward the next age of personal computing—one where humans will interact with machines in more seamless and futuristic ways. Watch "The end of "personal" computing (and the beginning of personal computing)."

Building the culture and collaboration layer for DevOps
Sean Regan says DevOps requires more than tools. It also needs high-performing people and teams. Watch "Building the culture and collaboration layer for DevOps."

Data’s journey to predictive analytics
Leo Vasiliou walks through the evolution of analytics and how analytics relates to a larger monitoring strategy. Watch "Data’s journey to predictive analytics."

Focusing on what matters
Tim Kadlec says we must focus on the fundamentals that make or break the web for people around the globe, and down the street. Watch "Focusing on what matters."

Innovating with accessibility in mind
Marcy Sutton explores how the work we do with technology can have a monumental impact on the lives of people with disabilities. Watch "Innovating with accessibility in mind."

Continue reading Highlights from the O'Reilly Fluent Conference in San Jose 2017.



Perception and bias and metrics, oh my!

2017-06-21T20:00:00Z

Dawn Parzych shows how understanding assumptions and biases can help your organization.

Continue reading Perception and bias and metrics, oh my!



Orchestrating chaos: Applying database research in the wild

2017-06-21T20:00:00Z

Peter Alvaro explores lineage-driven fault injection (LDFI), a novel approach to automating failure testing.

Continue reading Orchestrating chaos: Applying database research in the wild.



The role of being technical in technical leadership

2017-06-21T20:00:00Z

How can you be an effective noncoding technical leader? Camille Fournier explores solutions to this ongoing issue.

Continue reading The role of being technical in technical leadership.



DevOps and incident management: A recipe for success

2017-06-21T20:00:00Z

David Hayes explains why DevOps is now a requirement for success and he outlines challenges all DevOps teams will face over the next five years.

Continue reading DevOps and incident management: A recipe for success.



Future history

2017-06-21T20:00:00Z

Artur Bergman looks back at the last decade of DevOps and explores shifting patterns in operations, development, and systems.

Continue reading Future history.



The future works like people

2017-06-21T20:00:00Z

Adam Jacob says we need to design organizations that can cope with what’s new and what’s next.

Continue reading The future works like people.



What we learned moving 65,000 Microsofties to DevOps on the public cloud

2017-06-21T20:00:00Z

Martin Woodward tells the story of transforming Microsoft’s internal engineering systems from a collection of disparate in-house tools to One Engineering System.

Continue reading What we learned moving 65,000 Microsofties to DevOps on the public cloud.



Achieve predictable performance

2017-06-21T20:00:00Z

Alex Grbic explains how a single field-programmable gate array (FPGA) can deliver acceleration for multiple workloads.

Continue reading Achieve predictable performance.



Internet traffic growth: Why platforms are critical for developers

2017-06-21T20:00:00Z

Corey Scobie explains why the compartmentalization of Internet technology and application development will not sustain our aspirations.

Continue reading Internet traffic growth: Why platforms are critical for developers.



Highlights from the O'Reilly Velocity Conference in San Jose 2017

2017-06-21T20:00:00Z

Watch highlights covering distributed systems, DevOps, resiliency, and more, from the O'Reilly Velocity Conference in San Jose 2017.

Systems and site reliability engineers, architects, and application developers are coming together in San Jose, Calif., for the O'Reilly Velocity Conference. Below you'll find links to highlights from the event.

Future history
Artur Bergman looks back at the last decade of DevOps and explores shifting patterns in operations, development, and systems. Watch "Future history."

What we learned moving 65,000 Microsofties to DevOps on the public cloud
Martin Woodward tells the story of transforming Microsoft’s internal engineering systems from a collection of disparate in-house tools to One Engineering System. Watch "What we learned moving 65,000 Microsofties to DevOps on the public cloud."

Internet traffic growth: Why platforms are critical for developers
Corey Scobie explains why the compartmentalization of Internet technology and application development will not sustain our aspirations. Watch "Internet traffic growth: Why platforms are critical for developers."

Orchestrating chaos: Applying database research in the wild
Peter Alvaro explores lineage-driven fault injection (LDFI), a novel approach to automating failure testing. Watch "Orchestrating chaos: Applying database research in the wild."

Perception and bias and metrics, oh my!
Dawn Parzych shows how understanding assumptions and biases can help your organization. Watch "Perception and bias and metrics, oh my!"

Achieve predictable performance
Alex Grbic explains how a single field-programmable gate array (FPGA) can deliver acceleration for multiple workloads. Watch "Achieve predictable performance."

The future works like people
Adam Jacob says we need to design organizations that can cope with what’s new and what’s next. Watch "The future works like people."

Resiliency in a service provider world
Kristopher Beevers asks: What does resiliency mean when service providers are critical components of nearly every application? Watch "Resiliency in a service provider world."

DevOps and incident management: A recipe for success
David Hayes explains why DevOps is now a requirement for success and he outlines challenges all DevOps teams will face over the next five years. Watch "DevOps and incident management: A recipe for success."

The role of being technical in technical leadership
How can you be an effective noncoding technical leader? Camille Fournier explores solutions to this ongoing issue. Watch "The role of being technical in technical leadership."

Continue reading Highlights from the O'Reilly Velocity Conference in San Jose 2017.



Resiliency in a service provider world

2017-06-21T20:00:00Z

Kristopher Beevers asks: What does resiliency mean when service providers are critical components of nearly every application?

Continue reading Resiliency in a service provider world.



Amanda Berlin on defensive security fundamentals

2017-06-21T14:10:00Z

The O’Reilly Security Podcast: How to approach asset management, improve user education, and strengthen your organization’s defensive security with limited time and resources.

In this episode, I talk with Amanda Berlin, security architect at Hurricane Labs. We discuss how to assess and develop defensive security policies when you’re new to the task, how to approach core security fundamentals like asset management, and generally how you can successfully improve your organization’s defensive security with limited time and resources.

Continue reading Amanda Berlin on defensive security fundamentals.



An elegant solution to the convex hull problem

2017-06-21T13:59:00Z

An algorithm for rubber-banding random points.

100 Days of Algorithms is a series of Medium posts and Jupyter Notebooks by Tomáš Bouda that implements 100 interesting algorithms. They're a programming exercise that Bouda set for himself: can he implement 100 interesting algorithms, one per day? The answer was “yes.” The algorithms range from classics like Towers of Hanoi to Bloom filters and graph traversal. Over the coming weeks, we’ll be featuring selections from Bouda's 100 Days of Algorithms project here on O’Reilly.

Day 28, convex hull

Imagine putting a bunch of nails in a wooden board, then stretching a rubber band around all the nails. That's the convex hull problem: given a group of points in a plane, what's the smallest polygon that contains all those points? Here's Bouda's Medium post, and you can access and clone the Jupyter Notebook here.

import numpy as np
from bokeh.plotting import figure, output_notebook, show

algorithm

def split(u, v, points):
    # return points on left side of UV
    return [p for p in points if np.cross(p - u, v - u) < 0]

def extend(u, v, points):
    if not points:
        return []
    # find furthest point W, and split search to WV, UW
    w = min(points, key=lambda p: np.cross(p - u, v - u))
    p1, p2 = split(w, v, points), split(u, w, points)
    return extend(w, v, p1) + [w] + extend(u, w, p2)

def convex_hull(points):
    # find two hull points, U, V, and split to left and right search
    u = min(points, key=lambda p: p[0])
    v = max(points, key=lambda p: p[0])
    left, right = split(u, v, points), split(v, u, points)
    # find convex hull on each side
    return [v] + extend(u, v, left) + [u] + extend(v, u, right) + [v]

run

points = np.random.rand(100, 2)
hull = np.array(convex_hull(points))
hull

array([[ 0.9991102 ,  0.74387573],
       [ 0.98512754,  0.91822047],
       [ 0.79267953,  0.95080755],
       [ 0.11250518,  0.98983246],
       [ 0.04098604,  0.9784821 ],
       [ 0.01458786,  0.89852061],
       [ 0.00210623,  0.23655309],
       [ 0.05913608,  0.12453548],
       [ 0.19229802,  0.0073965 ],
       [ 0.3678626 ,  0.01986249],
       [ 0.74089924,  0.0571285 ],
       [ 0.93004227,  0.08858407],
       [ 0.99371365,  0.52807472],
       [ 0.9991102 ,  0.74387573]])

output_notebook()
plot = figure()
plot.scatter(x=points[:, 0], y=points[:, 1])
plot.line(x=hull[:, 0], y=hull[:, 1], color='red')
show(plot)

Technical notes

The implementations work; the Jupyter Notebooks all run. Since this started off as a personal exercise, don't expect the implementations to be optimal, bullet-proof, or even necessarily correct (though we don't see anything wrong with them). And don't expect them to contain your favorite algorithms (or the ones you need for your homework assignments).

The easiest way to install Jupyter Notebooks is to use Anaconda. The second easiest (and most bulletproof) way is to install Docker and then use the scipy-notebook container. If you're rolling your own Jupyter environment, you need:

Python 3.5 (a few of the “days” require 3.6; most will work with 3.4)
Jupyter
Matplotlib
[...]



Jupyter Insights: Marius Tulbure, a developer and JavaScript enthusiast at Figshare

2017-06-21T11:00:00Z

Jupyter for sharing and prototyping, Jupyter in academia, and FAIR principles.

Marius Tulbure is a developer and JavaScript enthusiast at Figshare. He will be speaking at JupyterCon, August 22-25, 2017, in New York City. Below, Tulbure shares his thoughts on the current and future state of Jupyter.

1. How has Jupyter changed the way you work?

Our platform's back end is implemented using Python; therefore, we adopted Jupyter as a default tool for sharing executable snippets of code and for prototyping small parts of our system.

2. How does Jupyter change the way your team works? How does it alter the dynamics of collaboration?

As a team, we started using Jupyter almost everywhere we needed to agree on an implementation technique. We started collaborating on these small pieces of code until we had the plan done; after that, the actual implementation was straightforward. We are also using the notebooks for the demo and the documentation of our third-party API.

3. How do you expect Jupyter to be extended in the coming year?

Jupyter will definitely become one of the main technologies that will change the face of collaboration in big remote teams. Faster and dynamic kernels will become easier and easier to implement. Storage services will be selling cheap servers with preset kernels for Jupyter (or at least that's where I would start investing if I were them). In the coming year, besides more and more plugins for interacting with the UI part of the notebooks, I suspect services that host version control systems will integrate Jupyter as part of their UI and as part of team collaboration. Another expectation would be the use of Jupyter in academic research. Having shareable, executable, and reproducible data is one of the key points in "pushing" science further.

4. What will you be talking about at JupyterCon?

Mark and I will be talking about the FAIR (findable, accessible, interoperable, and reusable) principles that every scientific paper should respect; this will be showcased using Jupyter Notebooks. We will explain what our users wanted to see in our platform, how the process of enabling previewable .ipynb files opened a new door for the world of academia—and how we've stepped through that door by showcasing executable notebooks in what is soon to be a fully fledged digital lab for reproducible data, built on top of Jupyter.

5. What sessions are you looking forward to seeing at JupyterCon?

I would go to almost all of them, but I am very eager to check out the following:

Citing the Jupyter Notebook in the scientific publication process
Hosting Jupyter at scale
Xeus: A framework for writing native Jupyter kernels
Using Jupyter at the intersection of robots and industrial biology
Cloud Datalab: Jupyter with the power of BigQuery and TensorFlow

Continue reading Jupyter Insights: Marius Tulbure, a developer and JavaScript enthusiast at Figshare.



2017 Ops Salary Survey

2017-06-21T10:00:00Z

Get a clear picture of what operations professionals do, what they're paid, how they’re seen within their companies, and how they rate different aspects of their jobs.

The operations (Ops) required to keep an organization’s increasingly important technical infrastructure up and running is a key part of any company. The roles and duties performed by those working in the Ops space vary widely by company, industry, geography, and infrastructure type. This report looks into what operations professionals do, how much they are compensated, how they are seen within their companies, and how they rate different aspects of their jobs.

Continue reading 2017 Ops Salary Survey.



Four short links: 21 June 2017

2017-06-21T09:55:00Z

CTO Advice, Slurping Citations, Distrust Your Network, Encrypted Yet Insecure Databases

  1. CTO Advice -- When hiring candidates, ask for their operating manual. Tell candidates: “Imagine you're a robot. What does your manual say under 'ideal operating conditions'?” Once they answer, follow up with this question: “What does the 'warning label' say?” You're likely to get insightful, unpredictable, and humorous answers in this very low-lift way of gauging self-awareness and revealing personality. Lots of really good advice.
  2. pdfx -- a script that pulls citations and references out of a PDF, downloads those references, even pulls the text out of the paper.
  3. Google Releases New BeyondCorp Paper -- their corporate identity and access system, which lets them distrust even their internal network. Nice.
  4. Why Your Encrypted Database Is Not Secure -- Encrypted databases, a popular approach to protecting data from compromised database management systems (DBMS’s), use abstract threat models that capture neither realistic databases, nor realistic attack scenarios.

Continue reading Four short links: 21 June 2017.




Under the hood of machine learning

2017-06-20T19:58:00Z

Exploring a reference architecture solution.

The idea of “rational” machines has always been part of the human imagination. As the field of artificial intelligence (AI) advanced, computational tools became more sophisticated, and specific applications of AI, such as machine learning, evolved.

Machine learning transforms business

Machine learning revolves around the idea that we should be able to give machines access to data and let them learn for themselves. It arose within the interesting confluence of the emergence of big data, cheap and powerful computational processing, and more efficient data storage. From banking to health care to retail, machine learning is revolutionizing the way we do business. Whether it is used for detecting fraud, identifying patterns in trading, or recommending a new product based on real-time information processing, the potential for this burgeoning field is vast. Many industries recognize that real-time insights into big data make them more efficient and differentiate them from their competitors. And ignoring the data carries a hefty price tag: PayPal reported losing $10 million a month to hackers until it implemented machine learning to detect fraudulent patterns. It is no surprise that machine learning is at the top of every IT department’s priority list for long-term investment. IT departments quickly realized, however, that while machine learning as a field has exploded, the landscape of tools, technology, and infrastructure to power these applications is confusing and fragmented. It is not easy to manage all the servers and connect services in a way that can be scaled when needed.

The right tools for the job

Companies that want to deliver new services with data insights often find it difficult to capture and process their big data. For instance, the machine learning tools must integrate easily with the software platforms that support existing business processes, users, and diverse projects. The tools must also interface with many different data platforms and handle structured, semi-structured, and unstructured data. Lastly, the tools must integrate with the company’s preferred technology stack. In the past few years alone, a plethora of tools has emerged to facilitate machine learning, including a broad set of container and big data technologies, such as distributed databases, message queues, and real-time analytics engines. Analysts might require access to Hadoop for batch processing analytics, Spark for processing data in real time, Kafka for near real-time messaging, and Cassandra as a fast, scalable data store for high-volume web applications. Each of these systems and services is complicated in its own right. And within each category, there are many options: various solutions and features, each with their own merits and suited for a different purpose. Yet, all of the technologies involved must be able to work together and cooperate when needed. IT departments find themselves orchestrating data processing tools, data stores, integration, distributed computing primitives, cluster managers and task schedulers, deployment, configuration management, data analytics, and machine learning tools.

A reference architecture solution

What does a reference architecture [...]
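To make one slice of that stack concrete: the piece mentions Kafka for near real-time messaging, and a minimal produce/consume round trip with the kafka-python client looks roughly like this (the broker address and topic name are hypothetical, and a broker must already be running):

from kafka import KafkaProducer, KafkaConsumer

# Hypothetical broker address and topic, for illustration only
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("clickstream", b'{"user": 42, "action": "view"}')
producer.flush()

consumer = KafkaConsumer(
    "clickstream",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=1000,  # stop iterating once no new messages arrive
)
for message in consumer:
    print(message.value)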



What is Jupyter?

2017-06-20T14:30:00Z

To succeed in digital transformation, businesses need to adopt tools that enable collaboration, sharing, and rapid deployment. Jupyter fits that bill.

What is Jupyter, and why do you care? After all, Jupyter has never become a buzzword like data science, artificial intelligence, or Web 2.0. Unlike those big abstractions, Jupyter is very concrete. It’s an open source project, a piece of software, that does specific things. But without attracting the hype, Jupyter Notebooks are revolutionizing the way engineers and data scientists work together. If all important work is collaborative, the most important tools we have are tools for collaboration, tools that make working together more productive.

That's what Jupyter is, in a nutshell: it's a tool for collaborating. It’s built for writing and sharing code and text, within the context of a web page. The code runs on a server, and the results are turned into HTML and incorporated into the page you're writing. That server can be anywhere: on your laptop, behind your firewall, or on the public internet. Your page contains your thoughts, your code, and the results of running the code.

Code is never just code. It's part of a thought process, an argument, even an experiment. This is particularly true for data analysis, but it's true for almost any application. Jupyter lets you build a "lab notebook" that shows your work: the code, the data, the results, along with your explanation and reasoning. As IBM puts it, Jupyter lets you build a "computational narrative that distills data into insights." Data means nothing if you can't turn it into insight, if you can't explore it, share it, and discuss it. Data analysis means little if you can't explore and experiment with someone else's results. Jupyter is a tool for exploring, sharing, and discussing.

A notebook is easily shareable. You can save it and send it as an attachment, so someone else can open the notebook with Jupyter. You can put the notebook in a GitHub repository and let others read it there; GitHub automatically renders the notebook to a static web page. GitHub users can download (clone) their own copy of the notebook and any supporting files so they can expand on your work: they can inspect the results, modify the code, and see what happens. It's a lot easier to maintain an up-to-date archive on GitHub than to hand distribute your code, data, supporting files, and results. You can go further by using container technology, such as Docker, to package your notebook, a notebook server, any libraries you need, your data, and a stripped-down operating system, into a single downloadable object.

Sharing can be as public as you want. You can run a Jupyter server on your laptop, largely inaccessible to anyone else. You can run a multi-user Jupyter server, JupyterHub, behind your corporate firewall. You can even push Jupyter Notebooks into the cloud. GitHub and GitLab (a host-it-yourself git server) automatically convert notebooks into static HTML for access over the web, and platforms like Binder allow others to run your code in the cloud. They can experiment with it and modify it, all within the context of a private instance.

While Jupyter's roots are in Python (it evolved from IPython Notebooks), it is now multi-lingual. The name itsel[...]
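Since a notebook is ultimately just a structured document of prose and code cells, it is easy to work with programmatically. A small sketch using the nbformat library (the filename here is hypothetical):

import nbformat

# Read a shared notebook file (hypothetical name) as a version-4 notebook
nb = nbformat.read("analysis.ipynb", as_version=4)

# A notebook is a sequence of cells: markdown prose interleaved with code
for cell in nb.cells:
    first_line = cell.source.split("\n")[0]
    print(f"{cell.cell_type:10} {first_line}")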



Four short links: 20 June 2017

2017-06-20T10:20:00Z

Dynamic Processes, Hardware Upgrades, Social Cooling, and RNC Data

  1. Close-Up View of DNA Replication Yields Surprises -- Conventional wisdom is that the polymerases on the leading and lagging strands are somehow coordinated so that one does not get ahead of the other. If that did happen, it would create stretches of single-stranded DNA that are highly susceptible to damaging mutations. Instead, what looks like coordination is actually the outcome of a random process of starting, stopping, and variable speeds. Over time, any one DNA polymerase will move at an average speed; look at a number of DNA polymerases synthesizing DNA strands over time, and they will have the same average speed.
  2. Hardware Is the New Software -- Microsoft researcher hypothesizes that Intel is releasing new features in chips at a faster rate because the end of Moore's Law means the end of reasons to keep upgrading CPUs. The graphs are great. (via Adrian Colyer)
  3. Social Cooling -- People are changing their behavior to get better scores. [...] Social Cooling describes the long-term negative side effects of living in a reputation economy.
  4. Inside the RNC Data Leak (Upguard) -- anyone with an internet connection could have accessed the Republican data operation used to power Donald Trump’s presidential victory, simply by navigating to a six-character Amazon subdomain: “dra-dw”. Interesting not just for this, but also for the glimpse at the CSV files.

Continue reading Four short links: 20 June 2017.




Special snowflakes in SEO

2017-06-20T10:00:00Z


A story about SEO that wouldn't work and how social media saved the project.

Once we worked with a client that had the opportunity to create a new name for their flagship product after an acquisition. The only stipulation was that they had to change their company name completely; they were allowed to use virtually any name they wanted. This meant we were working with a new website that stood alone. We needed this page to stand and make sense on its own, but also to appear as part of a corporation (with their other brands). There were almost too many directives for us to parse, and lots of confusion at the top: the site simultaneously needed to stand apart, yet be recognized as part of a bigger brand. An attitude of entitlement doesn't achieve winning results; it drains resources. This project was an interesting case of woeful ignorance, with many folks on the marketing team out of touch. Ambition is generally a positive force for a program, but not when the champagne wishes exceed SEO dreams.

The client chose a single word for their new name, which was also being used by three other companies. When we were discussing domain names, I warned that the SERPs themselves were confused about what to serve up for this particular branded query. The SERPs showed all three of the aforementioned companies first. The name was also a common word used in normal day-to-day speech.

Continue reading Special snowflakes in SEO.




Four short links: 19 June 2017

2017-06-19T10:25:00Z

Telco Exploits, Property-Based Testing, Open Textbooks, and Energy Futures Fiction

  1. SigPloit -- a signaling security testing framework dedicated to telecom security professionals and researchers to pentest and exploit vulnerabilities in the signaling protocols used in mobile operators. It's not comforting to think of telcos as being run on a bunch of insecure protocols for which there are exploits everywhere. Then again, if that thought disturbs you, don't read up on BGP.
  2. Hypothesis -- lets you write tests which instead look like this: For all data matching some specification, perform some operations on the data; assert something about the result. This is [...] property-based testing.
  3. University of Minnesota's Open Textbook Library -- great collection across many subject areas. For nerds like me who like to curl up with a textbook in front of the fire.
  4. Telling Tomorrows: Science Fiction as Energy Futures Research Tool -- This paper makes a case for the utility of prose science fiction both as a methodological tool of representation and portrayal for energy futures research that meets these criteria, and as a storehouse of tools and strategies for the critique of energy futures. Because if someone can read a bunch of science fiction for their day job and get a publication of their own out of it, I'm going to link to that paper.

Continue reading Four short links: 19 June 2017.




Intelligent Bits: 16 June 2017

2017-06-16T15:00:00Z


AI fighting extremism, intuitive physics, and schema networks.

  1. Facebook fighting extremism with AI — “The problem, as usual, is determining what is extremist, and what isn’t, and it goes further than just jihadists,” he said. “Are they just talking about ISIS and Al Qaeda, or are they going to go further to deal with white nationalism and neo-Nazi movements?”
  2. AI is big business — Element AI raises a whopping $102 million to bridge the gap between the haves and have-nots of AI.
  3. “Intuitive physics” — DeepMind claims progress towards AI with a better sense of context and “intuitive physics” via relational reasoning and visual prediction, but obstacles to human-like intelligence remain.
  4. Alternative schema — While deep reinforcement learning (DRL) is all the rage right now, some organizations like Vicarious have taken alternative approaches such as their Schema Networks, which have outperformed some DRL nets albeit with some debate and controversy.

Continue reading Intelligent Bits: 16 June 2017.




Design context for the bot revolution

2017-06-16T11:00:00Z

Designers will need to explore use cases—bots are a great hammer, but not everything is a nail.

Bots are going to disrupt the software industry in the same way the web and mobile revolutions did. History has taught us that great opportunities arise in these revolutions: we’ve seen how successful companies like Uber, Airbnb, and Salesforce were created as a result of new technology, user experience, and distribution channels. At the end of this book, I hope you will be better prepared to grab these opportunities and design a great product for this bot revolution.

Our lives have become full of bots in 2017—I wake up in the morning and ask Amazon’s Alexa (a voice bot by Amazon) to play my favorite bossa nova, Amy (an email bot by x.ai) emails me about today’s meetings, and Slackbot (a bot powered by Slack) sends me a notification to remind me to buy airline tickets to NYC today. Bots are everywhere! There is a lot of talk about bots these days, and a lot of misconceptions. In order to demystify these misconceptions, let’s start by providing some of the history of bots and defining bots—what they do and why they are important.

I wrote my first bot 16 years ago. I was an engineer at a company that provided SMS infrastructure that was about to be deployed in Europe. You can imagine that testing whether texting works in a network of one (as the system was not online yet, I was the only one on the network) is a very lonely experience. So, I created a small program to answer my texts. It started as a bot that repeated everything I said—I would text “hello” and get a “hello” back—but that became boring really fast. I started adding a persona to the bot, adding funny sentences I heard in the office. In the end I had two personas I was chatting with constantly, “Bob” and “Samantha.” I kept growing their vocabulary and skills and found it extremely therapeutic to converse with them via text.

But bots go way back to the 1950s, when computer scientist Alan Turing contemplated the concept of computers communicating like humans. Turing developed the Turing Test to test a computer’s ability to display intelligent behavior equivalent to that of a human. A user had to distinguish a conversation with a human from a conversation with a computer, and if they failed to do so, the computer passed the Turing Test. Alan Turing was one of the fathers of computer science, and we still refer to the Turing Test when we talk about intelligent bots. One of the best-known bots from the past was Eliza. Developed by Joseph Weizenbaum in 1964 for the IBM 7094, Eliza was a psychotherapist bot that talked to users about their problems, invoking strong emotional reactions in many users even though it was clear they were interacting with a bot and not a human.

So, what are bots? At a very basic level, bots are a new user interface. This new user interface lets users interact with services and brands using their favorite messaging apps. Bots are a new way to expose software services through a conversational interface. Bots are also referred to as cha[...]
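As a toy reconstruction of that first echo-and-persona bot (my own sketch, not the author's original program; the persona lines are invented):

import random

PERSONA_LINES = [
    "Tell me more.",
    "That's what Bob always says.",
    "Samantha thinks you should take a break.",
]

def reply(text):
    # A pure echo bot gets boring fast...
    if text.lower() == "hello":
        return "hello"
    # ...so sprinkle in persona lines heard around the office
    return random.choice(PERSONA_LINES)

while True:
    text = input("you> ")
    if text == "bye":
        break
    print("bot>", reply(text))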



Four short links: 16 June 2017

2017-06-16T10:05:00Z

Maciej Interview, GPU Visualization, Games Replacing Jobs, and History of Privacy

  1. Maciej Ceglowski on Why Fandom is Good for Business -- I didn't realize that it was Britta Gustafson, the former community manager of Delicious (from its glory days), who made him appreciate fandom. He's returned the favour: she now has the keys to the Delicious social media accounts.
  2. Stardust -- GPU-based Visualization Library.
  3. Young Men Are Playing Video Games Instead of Getting Jobs -- Even as the unemployment rate has dropped, labor force participation—the number of people who either work or want to work—has dwindled. In particular, young men without college degrees have become increasingly detached from the labor market. And what they appear to be doing instead is playing video games. [...] A young life spent playing video games can lead to a middle age without marketable skills or connections. "There is some evidence," Hurst pointed out, "that these young, lower-skilled men who are happy in their 20s become much less happy in their 30s or 40s."
  4. History of Privacy in 50 Images -- fascinating! Despite some high-profile opposition, the first American Census was posted publicly, for logistics reasons, more than anything else. Transparency was the best way to ensure every citizen could inspect it for accuracy.

Continue reading Four short links: 16 June 2017.




Animating movement with translate3d

2017-06-16T10:00:00Z

Reliably smooth animations with help from your GPU.

When animating elements to make them move around the screen, you want the animation to look smooth. You want it to run at 60 frames per second and run without showing any jittering or tearing. All of that seems straightforward, but it becomes less so when you think about all the various devices your content will be viewed on:

Continue reading Animating movement with translate3d.



The evolution of scalable microservices

2017-06-15T19:30:00Z


From building microliths to designing reactive microsystems.

Today’s enterprise applications are deployed to everything from mobile devices to cloud-based clusters running thousands of multi-core processors. Users have come to expect millisecond response times and close to 100% uptime. And by “user” I mean both humans and machines. Traditional architectures, tools, and products simply won’t cut it anymore. To paraphrase Henry Ford’s classic quote: we can’t make the horse any faster; we need cars for where we are going.

In this article, we will look at microservices not as a tool to scale the organization, development, and release process (even though that's one of the main reasons for adopting microservices), but from an architecture and design perspective, and put them in their true context: distributed systems. In particular, we will discuss how to leverage Events-first Domain Driven Design and Reactive principles to build scalable microservices, working our way through the evolution of a scalable microservices-based system.
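As a rough flavor of the events-first idea the article builds on (a sketch of my own under assumed names, not the article's design), current state is derived by folding immutable domain events rather than by mutating shared state in place:

from dataclasses import dataclass

# Events are immutable facts, named in the language of the domain
@dataclass(frozen=True)
class OrderPlaced:
    order_id: str
    amount: float

@dataclass(frozen=True)
class OrderShipped:
    order_id: str

def apply_event(state, event):
    # State is a fold over the event log, never mutated in place
    if isinstance(event, OrderPlaced):
        return {**state, event.order_id: "placed"}
    if isinstance(event, OrderShipped):
        return {**state, event.order_id: "shipped"}
    return state

log = [OrderPlaced("o-1", 99.0), OrderShipped("o-1")]
state = {}
for event in log:
    state = apply_event(state, event)
print(state)  # {'o-1': 'shipped'}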

Continue reading The evolution of scalable microservices.




Adopting AI in the enterprise: General Electric

2017-06-15T16:25:00Z

Karley Yoder on what GE Healthcare has learned as it embraces artificial intelligence.

As artificial intelligence technology has advanced, nearly every company has begun to respond to the promise—or threat—of AI in its industry. This post is the start of a series of interviews with executives from companies outside of the traditional boundaries of Silicon Valley. We'll talk to them about the ways they're approaching AI strategies and how they are leveraging the latest approaches in machine learning to provide the best products for their customers. In our first interview, we hear from Karley Yoder, senior product manager, advanced analytics at GE Healthcare. Over the last seven or so years, GE has undergone a digital transformation—repositioning itself from a traditional manufacturing company into a software, services, and manufacturing company. For companies that aim to transform themselves through software and analytics, an understanding of machine learning and artificial intelligence will be essential. Our interview has been lightly edited for clarity.

How do you leverage AI and ML to create a better product?

At GE Healthcare, we work closely and consistently with customers to understand their greatest pain points and ensure our products address those needs. It’s important to note that AI by itself is not a product but a powerful enabler to create better products. We leverage the technology to create products that benefit from the constant learning and improvement inherent in AI, turning the data into insights that have the potential to improve the quality and efficiency of care.

What steps have you needed to take in order to build a team that could grasp and apply recent advances in AI and ML?

GE has been a leader in AI for decades through the accolades and products produced by our Global Research Center. Our current team leverages these capabilities and resources, but we also aggressively recruited fresh, bright data science minds to complement this legacy knowledge. As a traditional hardware company undergoing a massive digital transformation, we have committed billions of dollars to building out our digital and AI competencies through our industrial IoT platform called Predix.

Are there other use cases within GE today?

GE is using AI to solve the world’s most pressing problems across all our industrial business lines. In health care, we are partnering with top medical institutions (UC San Francisco, Boston Children’s Hospital, Massachusetts General Hospital, and Brigham and Women’s Hospital) to create a library of deep learning algorithms that have the potential to improve quality, increase access, and reduce the cost of care around the world. The library will initially focus on diagnostic imaging applications, such as one that could identify pneumothorax—a critical condition of a collapsed lung—and prioritize that case in a clinician’s worklist, so the patient can receive more timely intervention. Over time, the library will include applications that address mul[...]



Well-designed customer conversations will fuel a successful business

2017-06-15T13:14:00Z

The way we build and develop digital products and services needs to change.

Continue reading Well-designed customer conversations will fuel a successful business.



Ben Evans on Java 9

2017-06-15T11:30:00Z

The O’Reilly Programming Podcast: Thoughts on performance, modularity, and what’s next for Java.

In this episode of the O’Reilly Programming Podcast, I talk with Ben Evans, co-founder and technology fellow at JClarity, and co-author of the forthcoming O’Reilly book Optimizing Java: Practical Techniques for Improved Performance Tuning. We discuss the upcoming release of Java 9, Java performance issues, and Evans’ experience as an organizer for the London Java Community.

Continue reading Ben Evans on Java 9.



Programming collective intelligence for financial trading

2017-06-15T11:05:00Z


The O’Reilly Data Show Podcast: Geoffrey Bradway on building a trading system that synthesizes many different models.

In this episode of the Data Show, I spoke with Geoffrey Bradway, VP of engineering at Numerai, a new hedge fund that relies on contributions of external data scientists. The company hosts regular competitions where data scientists submit machine learning models for classification tasks. The most promising submissions are then added to an ensemble of models that the company uses to trade in real-world financial markets.
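The ensembling step itself is simple to picture. Here is a generic sketch (not Numerai's actual system; the stand-in models and averaging rule are invented for illustration):

import statistics

# Hypothetical stand-ins for submitted machine learning models: each maps
# a feature vector to a probability that the target goes up
def model_a(features):
    return 0.7 if features["momentum"] > 0 else 0.3

def model_b(features):
    return 0.5 + 0.4 * features["value_score"]

ensemble = [model_a, model_b]

def ensemble_predict(features):
    # Combine submissions by averaging their predictions; a promising
    # new submission is added by appending it to the list
    return statistics.mean(m(features) for m in ensemble)

print(ensemble_predict({"momentum": 1.2, "value_score": 0.25}))  # 0.65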

Continue reading Programming collective intelligence for financial trading.




Four short links: 15 June 2017

2017-06-15T10:50:00Z

Positive Design Fiction, Gray Failure, OMGLOLWTF Blockchain, and AI Negotiations

  1. Various Sci Fi Projects Allegedly Creating a Better Future (Bruce Sterling) -- he's written for a lot of "imagine a better future" attempts counter to what seems to be a world lurching toward dystopia. The “better future” thing is jam-tomorrow and jam-yesterday talk, so it tends to become the enemy of jam today. You’re better off reading history and realizing that public aspirations that do seem great, and that even meet with tremendous innovative success, can change the tenor of society and easily become curses a generation later. Not because they were ever bad ideas or bad things to aspire to or do, but because that’s the nature of historical causality. Tomorrow composts today. (via Cory Doctorow)
  2. Gray Failure (PDF) -- component failures, whose manifestations are fairly subtle and thus defy quick and definitive detection. Examples of gray failure are severe performance degradation, random packet loss, flaky I/O, memory thrashing, capacity pressure, and non-fatal exceptions. [...] Our first-hand experience with production cloud systems reveals that gray failure is behind most cloud incidents. (via Adrian Colyer)
  3. Daisy: A Private Blockchain Where Blocks Are SQLite Databases, in Go -- as one Hacker News commenter described it: Everything about this feels like the most terrible idea ever, but in such a fascinating way. It's beautiful.
  4. Facebook's Negotiating AIs -- The FAIR researchers' key technical innovation in building such long-term planning dialog agents is an idea called dialog rollouts. Build a tree of possible conversation paths, and pick the one that has the greatest chance of success by simulating all those possible conversations. There were cases where agents initially feigned interest in a valueless item, only to later “compromise” by conceding it—an effective negotiating tactic that people use regularly. This behavior was not programmed by the researchers but was discovered by the bot as a method for trying to achieve its goals.

Continue reading Four short links: 15 June 2017.




Privacy and threat in practice

2017-06-15T10:00:00Z

Exploring the disconnect between security wisdom and user realities.

Continue reading Privacy and threat in practice.



The evolution of data center networks

2017-06-15T10:00:00Z


Five Questions for Dinesh Dutt on the changing relationship between network and computer.

I recently sat down with Dinesh Dutt, chief scientist at Cumulus Networks, to discuss how data centers have changed in recent years, new tools and techniques for network engineers, and what the future may hold for data center networking. Here are some highlights from our talk.

How have data centers evolved over the past few years?

Modern data centers have come a long way from when I first began working on them in 2007. Pioneers such as Google and Amazon started a trend that many others now try to emulate.

Continue reading The evolution of data center networks.




When will automation reach your industry?

2017-06-14T11:45:00Z

How to understand machine learning adoption in the enterprise.

Continue reading When will automation reach your industry?



Towers of Hanoi: Every CS student’s introduction to recursion

2017-06-14T11:00:00Z

Ring stacking games. With computers.

100 Days of Algorithms is a series of Medium posts and Jupyter Notebooks by Tomáš Bouda that implements 100 interesting algorithms. They're a programming exercise that Tomáš set for himself: can he implement 100 interesting algorithms, one per day? The answer was “yes.” The algorithms range from classics like Towers of Hanoi to Bloom filters and graph traversal. Over the coming weeks, we’ll be featuring selections from Tomáš’ 100 Days of Algorithms project here on O’Reilly.

Towers of Hanoi (Day 1)

Everybody knows the towers of Hanoi: it's a staple in introductory programming courses. You have three poles, one with a set of rings stacked on it, from the largest to the smallest. (Think of the common ring stacking toy.) Move the rings one at a time from the left pole to the right one without ever placing a larger ring on top of a smaller ring. The solution is shockingly short and elegant. Here's Tomáš’ Medium post on the towers of Hanoi. Below you’ll find a static version of his Jupyter Notebook. The notebook can be accessed and cloned here.

algorithm

def hanoi(height, left='left', right='right', middle='middle'):
    if height:
        hanoi(height - 1, left, middle, right)
        print(left, '=>', right)
        hanoi(height - 1, middle, right, left)

run

hanoi(1)
left => right

hanoi(2)
left => middle
left => right
middle => right

hanoi(3)
left => right
left => middle
right => middle
left => right
middle => left
middle => right
left => right

Technical notes

The implementations work; the Jupyter Notebooks all run. Since this started off as a personal exercise, don't expect the implementations to be optimal, bullet-proof, or even necessarily correct (though we don't see anything wrong with them). And don't expect them to contain your favorite algorithms (or the ones you need for your homework assignments).

The easiest way to install Jupyter Notebooks is to use Anaconda. The second easiest (and most bulletproof) way is to install Docker and then use the scipy-notebook container. If you're rolling your own Jupyter environment, you need:

Python 3.5 (a few of the “days” require 3.6; most will work with 3.4)
Jupyter
Matplotlib
NumPy
SciPy
Bokeh
NetworkX

Continue reading Towers of Hanoi: Every CS student’s introduction to recursion.



Jupyter Insights: Paco Nathan, leader of the Learning Group at O'Reilly Media

2017-06-14T11:00:00Z

Giving context to code, human-in-the-loop design pattern, and collaborative documents.

Paco Nathan leads the Learning group at O'Reilly Media. Known as a "player/coach" data scientist, Nathan led innovative data teams building ML apps at scale for several years and more recently was an evangelist for Apache Spark, Apache Mesos, and Cascading. Below, Nathan shares his thoughts on the current and future state of Jupyter. He will also be speaking at JupyterCon, August 22-25, 2017, in New York City.

1. How has Jupyter changed the way you work?

Having repeatable work that can be packaged and shared with others provides an enormous boost for how we work together. Jupyter Notebooks give context to the code. When you share your work, you're not just sharing some source files that have to be deciphered; you're sharing a whole train of thought, which makes it much easier to see what's going on. This isn't just true for your team members; it's equally true if you drop something and come back six months later.

2. How does Jupyter change the way your team works? How does it alter the dynamics of collaboration?

I think the human-in-the-loop design pattern for how we manage a large set of ML pipelines at O'Reilly makes possible some of our uses of AI applications that wouldn't be manageable otherwise. Using the nbtransom package, we've essentially made the machine another collaborator on a set of notebooks. We use several different algorithms to score documents; when the algorithms give different results, we send the results to a human for resolution. Keeping the process within Jupyter Notebooks makes it much more convenient and efficient. The dynamics are about people sharing this work within a team, but then maybe machines are doing 80-90% of the work, and those machines are also collaborating on documents via Jupyter.

3. How do you expect Jupyter to be extended in the coming year?

Collaborative documents is the big area I'm looking forward to. There have already been experiments with integrating notebooks and Google Docs, and we're looking forward to having full collaboration (multiple authors working on a notebook simultaneously) in an upcoming version of JupyterHub. That would make our human-in-the-loop process even more efficient.

4. What will you be talking about at JupyterCon?

My talk is on the general theme of Jupyter as a front end for AI—in a couple of ways. One, mentioned above, is where we have an "active learning" design pattern for human-in-the-loop ML pipelines. Another is where we're starting to build out conversational interfaces that leverage the Jupyter network protocol.

What sessions are you looking forward to seeing at JupyterCon?

I especially want to lea[...]
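
The routing idea Nathan describes (several models label each document; disagreements escalate to a person) is easy to sketch in plain Python. This is our illustration of the pattern, not the nbtransom API; route_documents and queue_for_human are hypothetical names:

    # Hedged sketch of the human-in-the-loop routing pattern.
    def route_documents(documents, models, queue_for_human):
        """Accept a label when all models agree; otherwise ask a human."""
        resolved = {}
        for doc_id, text in documents.items():
            labels = {model(text) for model in models}  # one label per model
            if len(labels) == 1:
                resolved[doc_id] = labels.pop()         # unanimous: no human needed
            else:
                queue_for_human(doc_id, labels)         # disagreement: escalate
        return resolved

    # Toy usage: two "models" that disagree on mid-length documents.
    models = [lambda t: "long" if len(t) > 20 else "short",
              lambda t: "long" if len(t) > 10 else "short"]
    print(route_documents({"a": "tiny", "b": "medium-length"},
                          models, lambda d, l: print("needs review:", d, l)))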



Four short links: 14 June 2017

2017-06-14T10:10:00Z

Reducing Bias, Autonomous Shipping, Control Systems Malware, and Program Management

  1. 7 Practical Ways to Reduce Bias in Your Hiring Process -- nothing new, but it's nice to have it in a box to point your management at.
  2. Autonomous Ships -- the first commercial vessel to navigate entirely by itself could be a harbor tug or a ferry designed to carry cars the short distance across the mouth of a river or a fjord, and it or similar ships will be in commercial operation within the next few years. And we expect fully autonomous oceangoing cargo ships to be routinely plying the world’s seas in 10 or 15 years’ time.
  3. WIN32/INDUSTROYER (PDF) -- report on control systems malware. As described in this Wired article.
  4. When Your Startup Needs Program Management -- first time I'd encountered the Driver, Approver, Contributor, Informed (DACI) model.

Continue reading Four short links: 14 June 2017.




Database reliability engineering

2017-06-14T10:00:00Z


Five Questions for Laine Campbell about building dependable databases.

I recently sat down with Laine Campbell, principal consultant at OpsArtisan, to talk about the practice of database reliability engineering and ways that DBAs can build their expertise in this area. Here are some highlights from our chat.

How would you define "database reliability engineering"?

The practice of reliability engineering—with its focus on automation, removal of toil, and an engineering approach to systems and operational processes—applies just as strongly to the database tiers as to the application and web tiers. Today's database professionals must be engineers, not administrators. We build things. We create things. We are all in this together, and nothing is someone else's problem. As engineers, we apply repeatable processes, established knowledge, and expert judgment to design, build, and operate production data stores and the data structures within. As database reliability engineers, we must take the operational principles and the depth of database expertise that we possess one step further.

Continue reading Database reliability engineering.




Web linking for frontend security and speed

2017-06-13T19:00:00Z


Techniques for securely improving page performance.

Web linking is a technique aimed at improving the frontend user experience by delivering prioritized resources to the end user faster, through the use of the HTTP Link header or the HTML <link> tag. The technique hinges on the rel attribute, which accepts options including dns-prefetch, preconnect, prerender, prefetch, and preload. While all of these options improve page load performance, we will focus on prefetch and preload.

Prefetch and Preload

The prefetch technique, whether expressed in the <link> tag or the Link header (see the examples below), directs the browser to fetch low-priority resources that might be needed on the next page navigation. The technique should be used with caution, but downloading predetermined resources ahead of time can clearly improve the frontend user experience through faster page loads on navigation.
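
To make those concrete, here is what each hint looks like in markup; the resource URLs are placeholders of ours, not examples from the article:

    <!-- prefetch: low-priority fetch of a resource the next navigation may need -->
    <link rel="prefetch" href="/js/next-page.js">

    <!-- preload: high-priority fetch of a resource the current page needs -->
    <link rel="preload" href="/fonts/main.woff2" as="font" type="font/woff2" crossorigin>

A preload hint can also travel as an HTTP response header instead of markup:

    Link: </css/styles.css>; rel=preload; as=style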

Continue reading Web linking for frontend security and speed.




Four short links: 13 June 2017

2017-06-13T10:10:00Z

Drone Energy, Open Speech Data, Predicting Suicide, and Designing Amidst Algorithms

  1. Drone Energy Sources -- what to look for, what the choices are, who's doing interesting work.
  2. Can You Help Me Gather Open Speech Data? (Pete Warden) -- I’ve put together a website that asks you to speak about 100 words into the microphone, records the results, and then lets you submit the clips. I’m then hoping to release an open source data set out of these contributions, along with a TensorFlow example of a simple spoken word recognizer.
  3. Predicting Suicide Accurately -- the paper (use sci-hub for access) is interesting. This set of more than 5,000 cases was used to train the machine to identify those at risk of attempted suicide compared to those who committed self-harm but showed no evidence of suicidal intent. The researchers also built algorithms to predict attempted suicide among a group of 12,695 randomly selected patients with no documented history of suicide attempts. It proved even more accurate at making suicide risk predictions within this large general population of patients admitted to the hospital. Now the question becomes: how do we use this in a way that minimizes the harm from false positives and false negatives, while acting responsibly on the true positives and negatives?
  4. Design in the Era of the Algorithm (Josh Clark) -- The design and presentation of data is just as important as the underlying algorithm. Algorithmic interfaces are a huge part of our future, and getting their design right is critical—and very, very hard to do. My work has begun to turn to the responsible and humane presentation of data-driven interfaces. And I suspect that yours will, too, in very short order. While constructing these machine learning models is indeed heavy-duty data science, using them is not. Tons of these machine learning models are available to all of us here to build upon right now.

Continue reading Four short links: 13 June 2017.




Hallmarks of a good technical leader

2017-06-13T10:00:00Z


Five Questions for Camille Fournier about the challenges engineers face when transitioning to managers, and how to foster great technical leadership.

I recently sat down with Camille Fournier, the head of Platform Engineering at Two Sigma, to talk about what constitutes great technical leadership and how organizations can foster it. Here are some highlights from our chat.

How do you define technical leadership (as opposed to leadership in general)?

Technical leaders don’t just generically inspire people to do things, but are capable of communicating with technical stakeholders and engineers in language that they understand. Technical leadership is about understanding the technical context under which decisions are being made, and asking questions to help make sure the right decisions are being made given the technical concerns.

Continue reading Hallmarks of a good technical leader.




Four short links: 12 June 2017

2017-06-12T10:25:00Z

Modern Web Spellbook, GPU Gap, Measure What Matters, and Educational Robotics Toy

  1. Spellbook of Modern Web Dev -- This document originated from a bunch of most-commonly used links and learning resources I sent to every new web developer on our full-stack web development team. For each problem domain and each technology, I try my best to pick only one or a few links that are most important, typical, common, or popular and not outdated, based on clear trends, public data, and empirical observation.
  2. How AI Can Keep Accelerating After Moore’s Law -- answer: GPUs and innovation therein. Nvidia CEO Jensen Huang displayed a chart showing how his chips’ performance has continued to accelerate exponentially while growth in the performance of general purpose processors, or CPUs, has slowed. Doug Burger, a distinguished engineer at Microsoft’s NExT division that works on commercializing new technology, says a similar gap is opening between conventional and machine learning software. “You’re starting to see a [performance] plateau for general software—it has stopped improving at historical rates—but this AI stuff is still increasing rapidly,” he says. Also: Google's machine-learning how-to-optimize-machine-learning result would cost you $250K to reproduce on Amazon GPUs.
  3. Gamified Wikipedia Tutorial Didn't Change Participation Rates (Benjamin Mako Hill) -- To our surprise, we found that, in both cases, there were no significant effects on any of the outcomes of interest. Being invited to play the Wikipedia Adventure, therefore, had no effect on new users’ volume of participation either on Wikipedia in general, or on talk pages specifically, nor did it have any effect on the average quality of edits made by the users in our study. Despite the very positive feedback that the system received in the survey evaluation stage, it did not produce a significant change in newcomer contribution behavior. We concluded that the system by itself could not reverse the trend of newcomer attrition on Wikipedia. A reminder that you should, as Mako Hill did, measure the behaviour you care about, not how much people enjoyed your intervention.
  4. Sony Toio -- the result of five years of research into developing a toy that’s simple enough for kids to use, but also sophisticated enough to create a figurative sandbox where kids can explore the inner workings of robotics engineering.

Continue reading Four short links: 12 June 2017.[...]



Intelligent Bits: 9 June 2017

2017-06-09T15:20:00Z


Drawing with AI, Apple AI API, United Nations and AI for good, and smart oil and gas.

  1. Teaching machines to draw — David Ha and other researchers at Google demonstrate neural networks that can produce vector drawings of concepts they’ve learned, much like a child who has learned how to sketch ideas.
  2. Apple AI APIs — Apple joins the fray and now offers API tools to developers for building AI applications, claiming superior performance and privacy benefits.
  3. AI for good — Leaders from United Nations agencies, NGOs, government, industry and academia convene in Geneva this week at the AI for Good Global Summit.
  4. Space-grade AI for oil and gas — Big Oil gets into big data and AI as BP Ventures invests in startup Beyond Limits, which applies cognitive computing to industrial verticals like oil and gas.

Continue reading Intelligent Bits: 9 June 2017.




How do I run an Apache Spark script on an Amazon Elastic MapReduce (EMR) cluster?

2017-06-09T08:00:00Z


Learn how to use steps in the EMR console to schedule and run Spark scripts stored in Amazon S3, on both new and existing clusters.
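
The article walks through the EMR console; the same step can also be added from the AWS CLI. A sketch under stated assumptions (the cluster ID, bucket, and script name below are placeholders to replace with your own):

    # Add a Spark step that runs a PySpark script stored in S3
    aws emr add-steps --cluster-id j-XXXXXXXXXXXXX \
      --steps Type=Spark,Name=MySparkJob,ActionOnFailure=CONTINUE,Args=[--deploy-mode,cluster,s3://my-bucket/scripts/my_script.py]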

Continue reading How do I run an Apache Spark script on an Amazon Elastic MapReduce (EMR) cluster?.




How do I package a Spark Scala script with SBT for use on an Amazon Elastic MapReduce (EMR) cluster?

2017-06-09T08:00:00Z


Learn how to create, structure, and compile your Scala script into a JAR file, and use SBT to run it on a distributed Spark cluster.
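
As a rough sketch of the moving parts (the project name and versions are illustrative, not prescriptive), a minimal build.sbt marks Spark as "provided" because the EMR cluster supplies it at runtime:

    // build.sbt (minimal sketch)
    name := "my-spark-job"
    version := "0.1.0"
    scalaVersion := "2.11.8"

    // Spark itself is provided by the EMR cluster at runtime
    libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0" % "provided"

Running sbt package then produces target/scala-2.11/my-spark-job_2.11-0.1.0.jar, which is what you hand to spark-submit on the cluster.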

Continue reading How do I package a Spark Scala script with SBT for use on an Amazon Elastic MapReduce (EMR) cluster?.




Four short links: 9 June 2017

2017-06-09T08:00:00Z

Text Analysis, Specific Phones, AI Copyright, and Minecraft for R

  1. scattertext -- fun tool for finding distinguishing terms in small-to-medium-sized corpora. (via Lynn Cherny on Twitter)
  2. Shanzhai Archaeology (We Make Money Not Art) -- counterfeit consumer goods, sold at lower prices and boasting multifunctional performance, targeted at particular audiences. My favourite: The Power Bank Phone: Ghana is currently going through a major power grid crisis: blackouts in the city can last for 36 hours on end. As a result, a significant business activity has grown around the sale of portable USB chargers that can charge electronic devices or even power bulbs. The Power Bank Phone, designed for this particular market, combines a 10,000 mAh USB charger, an LED flashlight, and 3 SIM card slots to connect the entire family or to take advantage of promotions offered by different operators.
  3. Do Androids Dream of Electric Copyright? Comparative Analysis of Originality in Artificial Intelligence Generated Works -- paper on the vexing topic of copyright in works generated by AI. Modern copyright law has been drafted to consider originality as an embodiment of the author’s personality, and originality is one of the main requirements for the subsistence of copyright. So, what happens when you remove personality from the equation? Are machine-created works devoid of copyright? Do we need to change copyright law to accommodate autonomous artists?
  4. R Interface to Minecraft -- a project to interface the R language with Minecraft. The resulting R package, miner, is now available to install from GitHub. The goal of the package is to introduce budding programmers to the R language via their interest in Minecraft, and to that end there's also a book (R Programming with Minecraft) and an associated R package (craft) under development to provide lots of fun examples of manipulating the Minecraft world with R.

Continue reading Four short links: 9 June 2017.[...]



How do I configure Apache Spark on an Amazon Elastic MapReduce (EMR) cluster?

2017-06-09T08:00:00Z


Learn how to manage Apache Spark configuration overrides for an AWS Elastic MapReduce cluster to save time and money.
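
For context, EMR accepts such overrides as JSON "configuration classifications" supplied when the cluster is created; a minimal, illustrative sketch (the property values are placeholders, not recommendations):

    [
      {
        "Classification": "spark-defaults",
        "Properties": {
          "spark.executor.memory": "4g",
          "spark.executor.cores": "4"
        }
      }
    ]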

Continue reading How do I configure Apache Spark on an Amazon Elastic MapReduce (EMR) cluster?.




Cynthia Savard Saucier on design at Shopify

2017-06-08T11:25:00Z


The O’Reilly Design Podcast: The sombrero-shaped designer, leading design teams, and designing for retail.

This week, I sit down with Cynthia Savard Saucier, director of design at Shopify and author of Tragic Design. Saucier is also keynoting at Velocity in New York, October 1-4, 2017. We talk about moving from working in design to leading designers, the real and sometimes negative impact design decisions can have on users, and how design is organized at Shopify.

Continue reading Cynthia Savard Saucier on design at Shopify.
