Subscribe: O'Reilly Radar - Insight, analysis, and research about emerging technologies
http://radar.oreilly.com/feed
Language: English

All - O'Reilly Media



All of our Ideas and Learning material from all of our topics.



Updated: 2017-11-20T00:39:50Z

 



Four short links: 17 November 2017

2017-11-17T09:00:00Z

Interactive Marginalia, In-Person Interactions, Welcoming Groups, and Systems Challenges

  1. Interactive Marginalia (Liza Daly) -- wonderfully thoughtful piece about web annotations.
  2. In-Person Interactions -- Casual human interaction gives you lots of serendipitous opportunities to figure out that the problem you thought you were solving is not the most important problem, and that you should be thinking about something else. Computers aren't so good at that. So true! (via Daniel Bachhuber)
  3. Pacman Rule -- When standing as a group of people, always leave room for 1 person to join your group. (via Simon Willison)
  4. Berkeley View of Systems Challenges for AI -- In this paper, we propose several open research directions in systems, architectures, and security that can address these challenges and help unlock AI’s potential to improve lives and society.

Continue reading Four short links: 17 November 2017.




Four short links: 16 November 2017

2017-11-16T20:00:00Z

Regulate IoT, Visualize CRISPR, Distract Strategically, and Code Together

  1. It's Time to Regulate IoT To Improve Security -- Bruce Schneier puts it nicely: internet security is now becoming "everything" security.
  2. Real-Space and Real-Time Dynamics of CRISPR-Cas9 (Nature) -- great visuals, written up for laypeople in The Atlantic. (via Hacker News)
  3. How the Chinese Government Fabricates Social Media Posts for Strategic Distraction, not Engaged Argument -- research paper. Application to American media left as exercise to the reader.
  4. Coding Together in Real Time with Teletype for Atom -- what it says on the box.

Continue reading Four short links: 16 November 2017.




The tools that make TensorFlow productive

2017-11-16T14:25:00Z

Analytical frameworks come with an entire ecosystem.

Deployment is a big chunk of using any technology, and tools to make deployment easier have always been an area of innovation in computing. For instance, the difficulties and uncertainties of installing software and keeping it up-to-date were one factor driving companies to offer software as a service over the Web. Likewise, big data projects present their own set of issues: how do you prepare and ingest the data? How do you view the choices made by algorithms that are complex and dynamic? Can you use hardware acceleration (such as GPUs) to speed analytics, which may need to operate on streaming, real-time data? Those are just a few deployment questions associated with deep learning.

In the report Considering TensorFlow for the Enterprise, authors Sean Murphy and Allen Leis cover the landscape of tools for working with TensorFlow, one of the most popular frameworks currently in big data analysis. They explain the importance of seeing deep learning as an integral part of a business environment—even while acknowledging that many of the techniques are still experimental—and review some useful auxiliary utilities. These exist for all of the major stages of data processing: preparation, model building, and inference (submitting requests to the model), as well as debugging.

Given that the decisions made by deep learning algorithms are notoriously opaque (it's hard to determine exactly what combinations of features led to a particular classification), one intriguing part of the report addresses the possibility of using TensorBoard to visualize what's going on in the middle of a neural network. The UI offers you a visualization of the stages in the neural network, and you can see what each stage sends to the next. Thus, some of the mystery in deep learning gets stripped away, and you can explain to your clients some of the reasons that a particular result was reached.

Another common bottleneck for many companies stems from the sizes of modern data sets, which often need help getting ingested and moved through the system. One study found that about 20% of businesses handle data sets in the range of terabytes, with smaller ranges (gigabytes) being most common, and larger ones (petabytes) quite rare. For that 20% or more working with unwieldy data sets, Murphy and Leis’s report is particularly valuable because special tools can help tie TensorFlow to the systems that feed data into its analytics, such as Apache Spark. The authors also cover options for hardware acceleration: a lot of research has been done on specialized hardware that can accelerate deep learning even more than GPUs do.

The essential reason for using artificial intelligence in business is to speed up predictions. To reap the most benefit from AI, therefore, one should find the most appropriate hardware and software combination to run the AI analytics. Furthermore, you want to reduce the time it takes to develop the analytics, which will allow you to react to changes in fast-moving businesses and reduce the burden on your data scientists. For many reasons, understanding the tools associated with TensorFlow makes its use more practical.

This post is part of a collaboration between O'Reilly and TensorFlow. See our statement of editorial independence.

Continue reading The tools that make TensorFlow productive.[...]
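
The TensorBoard discussion is easier to picture with a concrete, minimal sketch. The TensorFlow 1.x-style snippet below is not from the report; the layer sizes, names, and the ./logs directory are illustrative. It shows how summaries get written so TensorBoard can display the graph and what each stage passes to the next:

```python
import numpy as np
import tensorflow as tf

# A tiny two-layer network whose intermediate stages we want to inspect.
x = tf.placeholder(tf.float32, [None, 784], name="input")
hidden = tf.layers.dense(x, 128, activation=tf.nn.relu, name="hidden")
logits = tf.layers.dense(hidden, 10, name="logits")

# Summaries are what TensorBoard renders: histograms show the values each
# stage sends to the next; scalars would track metrics over training steps.
tf.summary.histogram("hidden_activations", hidden)
tf.summary.histogram("logit_values", logits)
merged = tf.summary.merge_all()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    writer = tf.summary.FileWriter("./logs", sess.graph)  # also records the graph
    summary = sess.run(merged, feed_dict={x: np.random.rand(32, 784)})
    writer.add_summary(summary, global_step=0)
    writer.close()

# Then inspect with: tensorboard --logdir ./logs
```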



Implementing The Pipes and Filters Pattern using Actors in Akka for Java

2017-11-16T13:00:00Z


How messages help you decouple, test, and re-use your software’s code.

We would like to introduce a couple of interesting concepts from Akka by giving an overview of how to implement the pipes and filters enterprise integration pattern. This commonly used pattern helps us flexibly compose a sequence of alterations to a message. To implement the pattern we use Akka, a popular library that provides new approaches to writing modern reactive software in Java and Scala.
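
The article builds the pattern with Akka actors in Java; purely as a language-neutral sketch of the idea itself (not the article's Akka code, and with toy filter functions invented here), each filter below is a small transformation of a message and the pipe is simply their composition:

```python
from functools import reduce

# Each "filter" is a small, independent transformation of the message.
def strip_whitespace(text: str) -> str:
    return text.strip()

def render_heading(text: str) -> str:
    # Toy stand-in for a real markdown-processing step.
    if text.startswith("# "):
        return "<h1>" + text[2:] + "</h1>"
    return text

def append_footer(text: str) -> str:
    return text + "\n-- published by the service"

def pipe(*filters):
    """Compose filters into one pipeline; the message flows left to right."""
    return lambda message: reduce(lambda msg, f: f(msg), filters, message)

publish = pipe(strip_whitespace, render_heading, append_footer)
print(publish("  # Pipes and filters  "))
```

In an actor-based implementation like the one the article develops, each filter naturally maps to an actor and the pipe to the chain of actor references along which the message is forwarded, which is where the decoupling, testing, and re-use benefits come from.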

The Business problem

Recently we came across an author publishing application made available as a service. It was responsible for processing markdown text. It would execute a series of operations back to back:

Continue reading Implementing The Pipes and Filters Pattern using Actors in Akka for Java.




Nathaniel Schutta on succeeding as a software architect

2017-11-16T12:10:00Z


The O’Reilly Programming Podcast: The skills needed to make the move from developer to architect.

In this episode of the O’Reilly Programming Podcast, I talk with Nathaniel Schutta, a solutions architect at Pivotal and presenter of the video I’m a Software Architect, Now What? He will be giving a presentation titled Thinking Architecturally at the 2018 O’Reilly Software Architecture Conference, February 25-28, 2018, in New York City.

Continue reading Nathaniel Schutta on succeeding as a software architect.




Modern HTTP service virtualization with Hoverfly

2017-11-16T11:00:00Z


Service virtualization brings a lightweight, automatable means of simulating external dependencies.

In modern software systems, it’s very common for applications to depend on third party or internal services. For example, an ecommerce site might depend on a third party payment service to process card payments, or a social network to provide authentication. These sorts of applications can be challenging to test in isolation, as their dependencies can introduce problems like:

  • Non-determinism
  • Slow and costly builds
  • Unmockable client libraries
  • Rate-limiting
  • Expensive licensing costs
  • Incompleteness
  • Slow provisioning

To get around this, you can use service virtualization: replacing these dependencies with an external process that simulates them. Unlike mocking, which replaces your application code, service virtualization lives externally, typically operating at the network level. It is non-invasive, and from the perspective of its consumer it behaves essentially just like the real thing.
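
Hoverfly itself runs as a separate proxy process with capture and simulate modes; the snippet below is not Hoverfly's API. It is only a bare-bones stand-in, written with the Python standard library, to illustrate the core idea of a separate process answering in place of a real dependency (here, a hypothetical /v1/charges payment endpoint):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# A canned response standing in for the real payment provider.
CANNED_PAYMENT_RESPONSE = {"status": "approved", "transaction_id": "test-123"}

class FakePaymentService(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path == "/v1/charges":
            body = json.dumps(CANNED_PAYMENT_RESPONSE).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    # Point the application under test at http://localhost:8500 instead of
    # the real provider; the tests now run deterministically and offline.
    HTTPServer(("localhost", 8500), FakePaymentService).serve_forever()
```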

Continue reading Modern HTTP service virtualization with Hoverfly.




Four short links: 15 November 2017

2017-11-15T11:00:00Z

Paywalled Research, Reproducing AI Research, Spy Teardown, and Peer-to-Peer Misinformation

  1. 65 of the 100 Most-Cited Papers Are Paywalled -- The weighted average of all the paywalls is: $32.33 [...] [T]he open access articles in this list are, on average, cited more than the paywalled ones.
  2. AI Reproducibility -- Participants have been tasked with reproducing papers submitted to the 2018 International Conference on Learning Representations, one of AI’s biggest gatherings. The papers are anonymously published months in advance of the conference. The publishing system allows for comments to be made on those submitted papers, so students and others can add their findings below each paper. [...] Proprietary data and information used by large technology companies in their research, but withheld from papers, is holding the field back.
  3. Inside a Low-Budget Consumer Hardware Espionage Implant -- The S8 data line locator is a GSM listening and location device hidden inside the plug of a standard USB data/charging cable. Has a microphone but no GPS, remotely triggered via SMS messages, uses data to report cell tower location to a dodgy server...and is hidden in a USB cable.
  4. She Warned of ‘Peer-to-Peer Misinformation.’ Congress Listened (NY Times) -- Renee's work on anti-vaccine groups (and her college thesis on propaganda in the 2004 Russian elections) led naturally to her becoming an expert on Russian propaganda in the 2016 elections.

Continue reading Four short links: 15 November 2017.




Scaling messaging in Go network clients

2017-11-15T11:00:00Z


Learn how the NATS client implements fast publishing and message processing schemes viable for production use.

The previous article in this series created a client that communicated with a server in a simple fashion. This article shows how to add features that make the client more viable for production use. Problems we’ll solve include:

  1. Each message received from the server will block the read loop while executing the callback that handles the message, because the loop and callback run in the same goroutine. This also means that we cannot implement the Request() and Flush() methods that the NATS Go client offers.
  2. Every publish command triggers a flush to the server and blocks while doing so, impacting performance.

We’ll fix these problems in this article. The third and last article in this series will build on the client we create here to add Request/Response functionality for one-to-one communication. Other useful functionality that is not covered in this series, but that a production client should have, includes:

Continue reading Scaling messaging in Go network clients.




5 tips for driving design thinking in a large organization

2017-11-14T12:00:00Z


How user-centered design focused on user needs and delivery can bring about real change and still be respected in the boardroom.

Continue reading 5 tips for driving design thinking in a large organization.




C++17 upgrades you should be using in your code

2017-11-14T11:00:00Z


Structured bindings, new library types, and containers add efficiency and readability to your code.

C++17 is a major release, with over 100 new features or significant changes. In terms of big new features, there's nothing as significant as the rvalue references we saw in C++11, but there are a lot of improvements and additions, such as structured bindings and new container types. What’s more, a lot has been done to make C++ more consistent and remove unhelpful and unnecessary behavior, such as support for trigraphs and std::auto_ptr.

This article discusses two significant C++17 upgrades that developers need to adopt when writing their own C++ code. I’ll explore structured bindings, a useful new way to work with structured types, and then some of the new types and containers that have been added to the Standard Library.

Continue reading C++17 upgrades you should be using in your code.




Four short links: 14 November 2017

2017-11-14T11:00:00Z

AI Microscope, Android Geriatrics, Doxing Research, and Anti-Goals

  1. AI-Powered Microscope Counts Malaria Parasites in Blood Samples (IEEE Spectrum) -- The EasyScan GO microscope under development would combine bright-field microscope technology with a laptop computer running deep learning software that can automatically identify parasites that cause malaria. Human lab workers would mostly focus on preparing the slides of blood samples to view under the microscope and verifying the results. Currently 20 minutes per slide (the same as a human), but they want to cut it to 10 minutes per slide.
  2. A Billion Outdated Android Devices in Use -- never ask why security researchers drink more than the rest of society.
  3. Datasette (Simon Willison) -- instantly create and publish an API for your SQLite databases.
  4. Fifteen Minutes of Unwanted Fame: Detecting and Characterizing Doxing -- This work analyzes over 1.7 million text files posted to pastebin.com, 4chan.org, and 8ch.net, sites frequently used to share doxes online, over a combined period of approximately 13 weeks. Notable findings in this work include that approximately 0.3% of shared files are doxes, that online social networking accounts mentioned in these dox files are more likely to close than typical accounts, that justice and revenge are the most often cited motivations for doxing, and that dox files target males more frequently than females.
  5. The Power of Anti-Goals (Andrew Wilkinson) -- instead of exhausting aspirations, focus on avoiding the things that deplete your life. (via Daniel Bachhuber)

Continue reading Four short links: 14 November 2017.




Four short links: 13 November 2017

2017-11-13T12:30:00Z

Software 2.0, Watson Walkback, Robot Fish, and Smartphone Data

  1. Software 2.0 (Andrej Karpathy) -- A large number of programmers of tomorrow do not maintain complex software repositories, write intricate programs, or analyze their running times. They collect, clean, manipulate, label, analyze, and visualize data that feeds neural networks. Supported by Pete Warden: I know this will all sound like more deep learning hype, and if I wasn’t in the position of seeing the process happening every day, I’d find it hard to swallow too, but this is real. Bill Gates is supposed to have said "Most people overestimate what they can do in one year and underestimate what they can do in 10 years," and this is how I feel about the replacement of traditional software with deep learning. There will be a long ramp-up as knowledge diffuses through the developer community, but in 10 years, I predict most software jobs won’t involve programming. As Andrej memorably puts it, “[deep learning] is better than you”!
  2. IBM Watson Not Even Close -- The interviews suggest that IBM, in its rush to bolster flagging revenue, unleashed a product without fully assessing the challenges of deploying it in hospitals globally. While it has emphatically marketed Watson for cancer care, IBM hasn’t published any scientific papers demonstrating how the technology affects physicians and patients. As a result, its flaws are getting exposed on the front lines of care by doctors and researchers who say that the system, while promising in some respects, remains undeveloped. AI has been drastically overhyped, and there will be more disappointments to come.
  3. Robot Spy Fish -- “The fish accepted the robot into their schools without any problem,” says Bonnet. “And the robot was also able to mimic the fish’s behavior, prompting them to change direction or swim from one room to another.”
  4. Politics Gets Personal: Effects of Political Partisanship and Advertising on Family Ties -- Using smartphone-tracking data and precinct-level voting, we show that politically divided families shortened Thanksgiving dinners by 20-30 minutes following the divisive 2016 election. [...] we estimate 27 million person-hours of cross-partisan Thanksgiving discourse were lost in 2016 to ad-fueled partisan effects. Smartphone data is useful data. (via Marginal Revolution)

Continue reading Four short links: 13 November 2017.




“Not hotdog” vs. mission-critical AI applications for the enterprise

2017-11-13T12:00:00Z

Drawing parallels and distinctions around neural networks, data sets, and hardware.

Artificial intelligence has come a long way since the concept was introduced in the 1950s. Until recently, the technology had an aura of intrigue, and many believed its place was strictly inside research labs and science fiction novels. Today, however, the technology has become very approachable. The popular TV show Silicon Valley recently featured an app called “Not Hotdog,” based on cutting-edge machine learning frameworks, showcasing how easy it is to create a deep learning application. Gartner has named applied AI and machine-learning-powered intelligent applications as the top strategic technology trend for 2017, and reports that by 2020, 20% of companies will dedicate resources to AI. CIOs are under serious pressure to commit resources to AI and machine learning. It is becoming easier to build an AI app like Not Hotdog for fun and experimentation, but what does it take to build a mission-critical AI application that a CIO can trust to help run a business? Let’s take a look.

For the purpose of this discussion, we will limit our focus to applications similar to Not Hotdog (i.e., applications based on image recognition and classification), although the concepts can be applied to a wide variety of deep learning applications. We will also limit the discussion to systems and frameworks, because personnel requirements can vary significantly based on the application. For example, for an image classification application built for retinal image classification, Google required the assistance of 54 ophthalmologists, whereas an application built for recognizing dangerous driving requires significantly less expertise and fewer people.

Image classification: Widely applicable deep learning use case

At its core, Not Hotdog is an image classification application. It classifies images into two categories: “hotdogs” and “not hotdogs.”

Figure 1. Screenshot from the “Not Hotdog” app courtesy of Ankur Desai.

Image classification has many applications across industries. In health care, it can be used for medical imaging and diagnosing diseases. In retail, it can be used to spot malicious activities in stores. In agriculture, it can be used to determine the health of crops. In consumer electronics, it can provide face recognition and autofocus to camera-enabled devices. In the public sector, it can be used to identify dangerous driving with traffic cameras. The list goes on. The fundamental difference between these applications and Not Hotdog is the core purpose of the application. Not Hotdog is intentionally meant to be farcical. As a result, it is an experimental app. The applications listed above, however, are meant to be critical to core business processes. Let’s take a look at how “Not Hotdog” is built, and then we will discuss additional requirements for mission-critical deep learning applications.

Not Hotdog: How is it built?

This blog post takes us through the wonderful journey of Not Hotdog’s development process. Following is a summary of how it is built. Not Hotdog uses the following key software components:

  • React Native: An app development framework that makes it easy to build mobile apps.
  • TensorFlow: An open source software library for machine learning. It makes building deep learning neural networks easy with pre-built libraries.
  • Keras: An open source neural network library written in Python. It is capable of running on top of TensorFlow and other machine learning libraries. It presents higher-level abstractions that make it easy to configure neural networks.

The following deep neural networks were considered[...]
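
The excerpt stops before the model details, so purely as a rough sketch (not the app's actual code; the layer sizes and helper names are invented), here is the general shape of a binary image classifier in Keras of the kind described above:

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

def build_classifier(input_shape=(128, 128, 3)):
    """Tiny CNN emitting P(hotdog); layer sizes are illustrative only."""
    model = Sequential([
        Conv2D(32, (3, 3), activation="relu", input_shape=input_shape),
        MaxPooling2D(pool_size=(2, 2)),
        Conv2D(64, (3, 3), activation="relu"),
        MaxPooling2D(pool_size=(2, 2)),
        Flatten(),
        Dense(64, activation="relu"),
        Dense(1, activation="sigmoid"),  # hotdog vs. not hotdog
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_classifier()
model.summary()
# model.fit(train_images, train_labels, ...) would then run on labeled photos;
# train_images/train_labels are placeholders here, not a real data set.
```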



Four short links: 10 November 2017

2017-11-10T11:45:00Z

Syntactic Sugar, Surprise Camera, AI Models, and Git Recovery

  1. Ten Features From Modern Programming Languages -- interesting collection of different flavors of syntactic sugar.
  2. Access Both iPhone Cameras Any Time Your App is Running -- Once you grant an app access to your camera, it can: access both the front and the back camera; record you at any time the app is in the foreground; take pictures and videos without telling you; upload the pictures/videos it takes immediately; run real-time face recognition to detect facial features or expressions.
  3. Deep Learning Models with Demos -- portable and searchable compilation of pre-trained deep learning models. With demos and code. Pre-trained models are deep learning model weights that you can download and use without training. Note that computation is not done in the browser.
  4. Git flight rules -- Flight rules are the hard-earned body of knowledge recorded in manuals that list, step-by-step, what to do if X occurs, and why. Essentially, they are extremely detailed, scenario-specific standard operating procedures. [...]

Continue reading Four short links: 10 November 2017.




Building a natural language processing library for Apache Spark

2017-11-09T15:40:00Z


The O’Reilly Data Show Podcast: David Talby on a new NLP library for Spark, and why model development starts after a model gets deployed to production.

When I first discovered and started using Apache Spark, a majority of the use cases I used it for involved unstructured text. The absence of libraries meant rolling my own NLP utilities, and, in many cases, implementing a machine learning library (this was pre deep learning, and MLlib was much smaller). I’d always wondered why no one bothered to create an NLP library for Spark when many people were using Spark to process large amounts of text. The recent, early success of BigDL confirms that users like the option of having native libraries.
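
For context on what "rolling your own" looks like, here is the kind of generic text pipeline Spark users can already assemble from built-in MLlib feature transformers (PySpark). This is illustrative MLlib code, not the Spark NLP API; linguistic stages such as sentence detection, part-of-speech tagging, and named entity recognition are what a dedicated NLP library adds on top of this pipeline style:

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import RegexTokenizer, StopWordsRemover, CountVectorizer

spark = SparkSession.builder.appName("text-pipeline").getOrCreate()
docs = spark.createDataFrame(
    [(0, "Spark needed a native NLP library"),
     (1, "BigDL showed users like native libraries")],
    ["id", "text"])

# Generic text transformers chained as pipeline stages.
pipeline = Pipeline(stages=[
    RegexTokenizer(inputCol="text", outputCol="tokens", pattern="\\W"),
    StopWordsRemover(inputCol="tokens", outputCol="filtered"),
    CountVectorizer(inputCol="filtered", outputCol="features"),
])
model = pipeline.fit(docs)
model.transform(docs).select("filtered", "features").show(truncate=False)
```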

In this episode of the Data Show, I spoke with David Talby of Pacific.AI, a consulting company that specializes in data science, analytics, and big data. A couple of years ago I mentioned the need for an NLP library within Spark to Talby; he not only agreed, he rounded up collaborators to build such a library. They eventually carved out time to build the newly released Spark NLP library. Judging by the reception received by BigDL and the number of Spark users faced with large-scale text processing tasks, I suspect Spark NLP will be a standard tool among Spark users.

Talby and I also discussed his work helping companies build, deploy, and monitor machine learning models. Tools and best practices for model development and deployment are just beginning to emerge—I summarized some of them in a recent post, and, in this episode, I discussed these topics with a leading practitioner.

Continue reading Building a natural language processing library for Apache Spark.




Four short links: 9 November 2017

2017-11-09T11:00:00Z

Culture, Identifying Bots, Attention Economy, and Machine Bias

  1. Culture is the Behaviour You Reward and Punish -- When all the “successful” people behave in the same way, culture is made.
  2. Identifying Viral Bots and Cyborgs in Social Media -- it is readily possible to identify social bots and cyborgs on both Twitter and Facebook using information entropy and then to find groups of successful bots using network analysis and community detection.
  3. An Economy Based on Attention is Easily Gamed (The Economist) -- Americans touch their smartphones on average more than 2,600 times a day (the heaviest users easily double that). The population of America farts about 3m times a minute. It likes things on Facebook about 4m times a minute.
  4. Frankenstein's Legacy: Four conversations about Artificial Intelligence, Machine Learning, and the Modern World (CMU) -- A machine isn’t a human. It’s not going to necessarily incorporate bias even from biased training data in the same way that a human would. Machine learning isn’t necessarily going to adopt—for lack of a better word—a clearly racist bias. It’s likely to have some kind of much more nuanced bias that is far more difficult to predict. It may, say, come up with very specific instances of people it doesn’t want to hire that may not even be related to human bias.

Continue reading Four short links: 9 November 2017.




The phone book is on fire

2017-11-09T00:55:00Z


Lessons from the Dyn DNS DDoS.

Continue reading The phone book is on fire.




Guidelines for how to design for emotions

2017-11-08T12:05:00Z


Learn what makes for a rich emotional experience and why, even if we make our technology invisible, the connection will still be emotional.

Continue reading Guidelines for how to design for emotions.




Identifying viral bots and cyborgs in social media

2017-11-08T12:00:00Z

Analyzing tweets and posts around Trump, Russia, and the NFL using information entropy, network analysis, and community detection algorithms.

Particularly over the last several years, researchers across a spectrum of scientific disciplines have studied the dynamics of social media networks to understand how information propagates as the networks evolve. Social media platforms like Twitter and Facebook include not only actual human users but also bots, or automated programs, that can significantly alter how certain messages are spread. While some information-gathering bots are beneficial or at least benign, it was made clear by the 2016 U.S. Presidential election and the 2017 elections in France that bots and sock puppet accounts (that is, numerous social accounts controlled by a single person) were effective in influencing political messaging and propagating misinformation on Twitter and Facebook. It is thus crucial to identify and classify social bots to combat the spread of misinformation and especially the propaganda of enemy states and violent extremist groups.

This article is a brief summary of my recent bot detection research. It describes the techniques I applied and the results of identifying battling groups of viral bots and cyborgs that seek to sway opinions online. For this research, I have applied techniques from complexity theory, especially information entropy, as well as network graph analysis and community detection algorithms to identify clusters of viral bots and cyborgs (human users who use software to automate and amplify their social posts) that differ from typical human users on Twitter and Facebook. I briefly explain these approaches below, so deep prior knowledge of these areas is not necessary. In addition to commercial bots focused on promoting click traffic, I discovered competing armies of pro-Trump and anti-Trump political bots and cyborgs. During August 2017, I found that anti-Trump bots were more successful than pro-Trump bots in spreading their messages. In contrast, during the NFL protest debates in September 2017, anti-NFL (and pro-Trump) bots and cyborgs achieved greater successes and virality than pro-NFL bots.

Obtaining Twitter source data

The data sets for my Twitter bot detection research consisted of ~60M tweets that mentioned the terms “Trump,” “Russia,” “FBI,” or “Comey”; the tweets were collected via the free Twitter public API in separate periods between May 2017 and September 2017. I have made the source tweet IDs as well as many of our analysis results files available in a data project published at data.world. Researchers who wish to collaborate on this project at data.world should send a request email to datapartners@paragonscience.com.

Detecting bots using information entropy

Information entropy is defined as “the average amount of information produced by a probabilistic stochastic source of data.” As such, it is one effective way to quantify the amount of randomness within a data set. Because one can reasonably conjecture that actual humans are more complicated than automated programs, entropy can be a useful signal when one is attempting to identify bots, as has been done by a number of previous researchers. Of the recent research in social bot detection, particularly notable is the excellent work by groups of researchers from the University of California and Indiana University. Their “botornot” system uses a random forest machine learning model that incorporates 1,150 features derived from user account metadata, friend/follower data, network characterist[...]
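
As a minimal illustration of the entropy signal described here (not the author's actual feature set; the binning choice and the simulated accounts are invented), one can compute the Shannon entropy of an account's inter-post timing:

```python
import math
import random
from collections import Counter

def shannon_entropy(symbols):
    """Shannon entropy (in bits) of a sequence of discrete symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def gap_entropy(gaps_seconds, bin_seconds=60):
    """Entropy of inter-post gaps, bucketed into bins. Accounts that post on
    a fixed schedule collapse into very few bins, so their entropy is low."""
    return shannon_entropy([int(g // bin_seconds) for g in gaps_seconds])

random.seed(0)
bot_gaps = [300] * 500                                       # a post every 5 minutes
human_gaps = [random.randint(30, 7200) for _ in range(500)]  # irregular gaps
print("bot-like:  ", gap_entropy(bot_gaps))    # ~0 bits
print("human-like:", gap_entropy(human_gaps))  # several bits
```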



Consumer-driven innovation for continuous glucose monitoring in diabetes patients

2017-11-08T12:00:00Z

CGMs are unique in the way consumers have taken it upon themselves to create modifications to medical devices.

Imagine if your life suddenly depended on monitoring your body’s reaction every time you had a snack, skipped a meal, or ate a piece of candy. This is a reality for approximately 1.25 million people in the USA who have been diagnosed with Type 1 Diabetes (T1D). People with T1D experience unhealthy fluctuations in blood glucose levels due to the destruction of beta cells in the pancreas by the person’s own immune system. Beta cells produce insulin, which is a hormone that allows your body to break down, use, or store glucose, while maintaining a healthy blood sugar level throughout the day. Presently, there is no cure for T1D, so patients must be constantly vigilant about maintaining their blood glucose levels within a healthy range in order to avoid potentially deadly consequences.

Currently, continuous glucose monitors (CGMs) are the most effective way to manage T1D. However, consumers have already become frustrated with the limitations of commercially available CGMs, and are developing at-home modifications to overcome them. This, in turn, is influencing the direction of research and development in the biomedical devices industry, as multiple companies compete to create a CGM that appeals to the largest consumer population. Thus, consumer-driven innovation in CGM data access, CGM-insulin pump integration, and glucose sensor lifespan has led to rapid growth in the field of diabetes management devices.

Coping with the highs and lows

Patients with T1D need to monitor their blood glucose levels to ensure they don’t become hyperglycemic (high blood glucose levels) or hypoglycemic (low blood glucose levels), both of which can cause life-threatening complications. Throughout the late 1980s and 1990s, home blood glucose monitoring devices were the most accurate way to measure blood glucose levels. These devices use a lancet to prick the person’s finger to obtain real-time glucose levels from a drop of blood. Although still used today by some diabetics as a primary means of T1D management, finger prick devices have considerable drawbacks. These include the physical pain that comes from frequent finger pricks, the static nature of the glucose reading, and the indiscretion and inconvenience of taking multiple readings throughout the day and night. It is no wonder, then, that the market potential for a device that conveniently and accurately measures blood glucose levels continues to soar.

The continuous glucose monitor (CGM)

At the turn of the 21st century, the integration of technology and medicine introduced a novel way for patients to gain control of T1D. In 1999, MiniMed obtained approval from the U.S. Food and Drug Administration (FDA) for the first continuous glucose monitor (CGM). The device was implanted by a physician and recorded the patient’s glucose levels for three days. The patient then returned to the clinic to have the sensor removed and discuss any trends revealed by the CGM. In 2001, MiniMed was acquired by Medtronic, a medical device company that specializes in making diabetes management devices. In 2003, Medtronic received FDA approval to launch the first real-time, patient-use CGM device. This kick-started an ongoing competition to create more accurate, user-friendly CGMs among other diabetes management medical device companies, such as Dexcom Inc., Senseonics, and Abbott Laboratories.

Today, CGM devices consist of a thin, wire-like sensor that is inserted under the skin, [...]



Building messaging in Go network clients

2017-11-08T12:00:00Z


Learn useful Go communication techniques from the internals of the NATS client.

In today's world of data-intensive applications and frameworks tailored to tasks as varied as service discovery and stream processing, distributed systems are becoming ubiquitous. A reliable and high-performing networking client is essential to accessing and scaling such systems. Often, implementing a network client for a custom protocol can seem like a daunting task. If a protocol is too complex (as custom protocols have a tendency to become), maintaining and implementing the client can be a burden. Moreover, it is ideal to have good language support for doing asynchronous programming when implementing anything at a higher level than working with sockets. Handling all this customization and multi-language support can be greatly simplified by picking the right approach from the outset.

Fortunately, the Go programming language facilitates the task of developing networking clients by offering a robust standard library, excellent concurrency built-ins, and great tooling for doing benchmarks and static analysis, as well as having great performance and overall being a very flexible language for systems programming.

Continue reading Building messaging in Go network clients.




Susan Sons on building security from first principles

2017-11-08T11:55:00Z


The O’Reilly Security Podcast: Recruiting and building future open source maintainers, how speed and security aren’t mutually exclusive, and identifying and defining first principles for security.

In this episode of the Security Podcast, O’Reilly’s Mac Slocum talks with Susan Sons, senior systems analyst for the Center for Applied Cybersecurity Research (CACR) at Indiana University. They discuss how she initially got involved with fixing the open source Network Time Protocol (NTP) project, recruiting and training new people to help maintain open source projects like NTP, and how security needn’t be an impediment to organizations moving quickly.

Continue reading Susan Sons on building security from first principles.




Four short links: 8 November 2017

2017-11-08T10:55:00Z

Shadow Profiles, Theories of Learning, Feature Visualization, and Time to Reflect Reality

  1. How Facebook Figures Out Everyone You've Ever Met (Gizmodo) -- Behind the Facebook profile you’ve built for yourself is another one, a shadow profile, built from the inboxes and smartphones of other Facebook users. Contact information you’ve never given the network gets associated with your account, making it easier for Facebook to more completely map your social connections. (via Slashdot)
  2. Theories of Deep Learning (STATS 385) -- Stanford class. Lecture videos are posted after the lectures are given.
  3. Feature Visualization (Distill) -- How neural networks build up their understanding of images. Wonderfully visual.
  4. Mapping's Intelligent Agents -- Industry players are developing dynamic HD maps, accurate within inches, that would afford the car’s sensors some geographic foresight, allowing it to calculate its precise position relative to fixed landmarks. [...] Yet, achieving real-time “truth” throughout the network requires overcoming limitations in data infrastructure. The rate of data collection, processing, transmission, and actuation is limited by cellular bandwidth as well as on-board computing power. Mobileye is attempting to speed things up by compressing new map information into a “Road Segment Data” capsule that can be pushed between the master map in the Cloud and cars in the field. If nothing else, the system has given us a memorable new term, “Time to Reflect Reality,” which is the metric of lag time between the world as it is and the world as it is known to machines.

Continue reading Four short links: 8 November 2017.




Automated root cause analysis for Spark application failures

2017-11-07T12:00:00Z

Reduce troubleshooting time from days to seconds.

Spark’s simple programming constructs and powerful execution engine have brought a diverse set of users to its platform. Many new big data applications are being built with Spark in fields like health care, genomics, financial services, self-driving technology, government, and media. Things are not so rosy, however, when a Spark application fails. Similar to applications in other distributed systems that have a large number of independent and interacting components, a failed Spark application throws up a large set of raw logs. These logs typically contain thousands of messages, including errors and stacktraces. Hunting for the root cause of an application failure from these messy, raw, and distributed logs is hard for Spark experts—and a nightmare for the thousands of new users coming to the Spark platform. We aim to radically simplify root cause detection of any Spark application failure by automatically providing insights to Spark users like what is shown in Figure 1.

Figure 1. Insights from automatic root cause analysis improve Spark user productivity. Source: Adrian Popescu and Shivnath Babu.

Spark platform providers like the Amazon, Azure, Databricks, and Google clouds, as well as application performance management (APM) solution providers like Unravel, have access to a large and growing data set of logs from millions of Spark application failures. This data set is a gold mine for applying state-of-the-art artificial intelligence (AI) and machine learning (ML) techniques. In this blog, we look at how to automate the process of failure diagnosis by building predictive models that continuously learn from logs of past application failures for which the respective root causes have been identified. These models can then automatically predict the root cause when an application fails[1]. Such actionable root-cause identification improves the productivity of Spark users significantly.

Clues in the logs

A number of logs are available every time a Spark application fails. A distributed Spark application consists of a driver container and one or more executor containers. The logs generated by these containers have information about the application as well as how the application interacts with the rest of the Spark platform. These logs form the key data set that Spark users scan for clues to understand why an application failed. However, the logs are extremely verbose and messy. They contain multiple types of messages, such as informational messages from every component of Spark, error messages in many different formats, stacktraces from code running on the Java Virtual Machine (JVM), and more. The complexity of Spark usage and internals makes things worse. Types of failures and error messages differ across Spark SQL, Spark Streaming, iterative machine learning and graph applications, and interactive applications from Spark shell and notebooks (e.g., Jupyter, Zeppelin). Furthermore, failures in distributed systems routinely propagate from one component to another. Such propagation can cause a flood of error messages in the log and obscure the root cause.

Figure 2 shows our overall solution to deal with these problems and to automate root cause analysis (RCA) for Spark application failures. Overall, the solution consists of:

  • Continuously collecting logs from a variety of Spark application failures
  • Converting logs i[...]
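
As a toy sketch of the "learn from labeled failure logs" idea (not the system described above; the log snippets, root-cause labels, and model choice are invented for illustration), a text classifier over raw log messages might look like this:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each training example: raw text from a failed application's logs, plus the
# root cause a human previously identified for it (all invented here).
logs = [
    "java.lang.OutOfMemoryError: GC overhead limit exceeded at executor 7",
    "Container killed by YARN for exceeding memory limits. 5.5 GB of 5.5 GB used",
    "FileNotFoundException: File does not exist: hdfs://data/events/2017/11/07",
    "org.apache.spark.shuffle.FetchFailedException: Failed to connect to host",
]
root_causes = ["executor-oom", "executor-oom",
               "missing-input-path", "shuffle-fetch-failure"]

# Featurize the log text and train a multiclass classifier over root causes.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(logs, root_causes)

new_failure = "ExecutorLostFailure: Container killed by YARN for exceeding memory limits"
print(model.predict([new_failure])[0])  # expected: executor-oom
```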



Implementing continuous delivery

2017-11-07T11:00:00Z


The architectural design, automated quality assurance, and deployment skills needed for delivering continuous software.

There is an ever-increasing range of best practices emerging around microservices, DevOps, and the cloud, with some offering seemingly contradictory guidelines. There is one thing that developers can agree on: continuous delivery adds enormous value to the software delivery lifecycle through fast feedback and the automation of both quality assurance and deployment processes. However, the challenges for modern software developers are many, and attempting to introduce a methodology like continuous delivery—which touches all aspects of software design and delivery—means several new skills typically outside of a developer’s comfort zone must be mastered.

These are the key developer skills I believe are needed to harness the benefits of continuous delivery:

Continue reading Implementing continuous delivery.




Four short links: 7 November 2017

2017-11-07T10:45:00Z

Disturbing YouTube, Sketchy Presentation Tool, Yammer UI, and Dance Your Ph.D. Winners

  1. Something is Wrong on the Internet (James Bridle) -- This is a deeply dark time, in which the structures we have built to sustain ourselves are being used against us — all of us — in systematic and automated ways. It is hard to keep faith with the network when it produces horrors such as these. While it is tempting to dismiss the wilder examples as trolling, of which a significant number certainly are, that fails to account for the sheer volume of content weighted in a particularly grotesque direction. This is another reason why propping your kids in front of YouTube is unsafe and unwise.
  2. ChalkTalk -- a digital presentation and communication language in development at New York University's Future Reality Lab. Using a blackboard-like interface, it allows a presenter to create and interact with animated digital sketches in order to demonstrate ideas and concepts in the context of a live presentation or conversation.
  3. YamUI -- Microsoft open-sourced the reusable component framework that they built for Yammer. [B]uilt with React on top of Office UI Fabric components.
  4. Dance Your Ph.D. Finalists -- look at the finalists on this site, read about the winners on Smithsonian.

Continue reading Four short links: 7 November 2017.




Developing successful AI apps for the enterprise

2017-11-06T16:55:00Z

The IBM team encourages developers to ask tough questions, be patient, and be ready to fail gracefully.

In this episode of the O’Reilly Media Podcast, I sat down with Josh Zheng and Tom Markiewicz, developer advocates for IBM Watson. We discussed how natural language processing (NLP) APIs, and chatbots in particular, represent just one of the ways AI is augmenting humans and boosting productivity in enterprises today. In order to apply AI to the enterprise, Zheng and Markiewicz explain, developers first need to understand the importance of sourcing and cleaning the organization’s data, much of which is coming in unstructured formats like email, customer support chats, and PDF documents. This can be “unglamorous” work, but it’s also critical to building a successful NLP app, or chatbot. From there, Zheng and Markiewicz offer some practical tips for developers looking to build chatbots: to have context awareness, to fail gracefully, and to have patience—building a successful chatbot can take time. Below are some highlights from the discussion:

The hype behind chatbots

Josh Zheng: I think one of the biggest propellers for [chatbots] now is the increase in availability of the NLP capabilities. So, a chatbot uses a couple NLP techniques to make the whole thing work, but these things are actually not very new. They've been around for a while. I think what's different is that they've always been kind of locked up in research labs. There have been open source tools like Python's NLTK that made them more accessible, but it's not until recently, where companies like IBM and Google have put APIs on the cloud and made them very user-friendly and easily accessible, that large enterprises—which are usually more behind on the adoption curve—are able to access them and use them.

Use cases for chatbots in tech and travel

Josh Zheng: Autodesk built a customer support virtual agent on top of IBM Watson. This need came when they first moved from a client-per-software model into more of a SaaS model. They really widened their customer reach, but with that came a lot more customers and a greater need for customer support. ... They were able to build a chatbot [Autodesk Virtual Agent] that is able to answer a lot of the questions. And it turns out, a lot of the questions people have are very similar. ... A lot of these are very simple questions that a machine can take over and let the humans focus on the complex questions or the complex requests. ... They were able to reduce the average time-to-resolution by a huge margin. After implementing the chatbot, we see that on average it takes 1.5 days to resolve questions involving humans and only 5.6 minutes to resolve chatbot-only questions.

Developer-first mentality: Prototyping your way to successful AI apps

Tom Markiewicz: You can try all of the APIs for free and just build little prototypes to see if that fits into what you're trying to do before planning a giant budget and going through the process. That's the beauty of the shift over the last couple of years, with more of a developer-first kind of mentality—the understanding is no longer, ‘Okay, we're going to start a project and we're going to set a big budget, and then we're going to push it from the top down.’ Now, developers are really enabled with the advent of the API economy to go o[...]



Four short links: 6 November 2017

2017-11-06T11:00:00Z

IoT Standard, Probabilistic Programming, Go Scripting, and Front-End Checklist

  1. A Firmware Update Architecture for Internet of Things Devices -- draft submitted to IETF. It has a long way to go before it's a standard, but gosh it'd be nice to have this stuff without everyone reinventing it from scratch. (via Bleeping Computer)
  2. Pyro -- a universal probabilistic programming language (PPL) written in Python and supported by PyTorch on the back end. Pyro enables flexible and expressive deep probabilistic modeling, unifying the best of modern deep learning and Bayesian modeling. (A minimal example follows this list.)
  3. Neugram -- scripting language integrated with Go. Overview of the language.
  4. Front-End Checklist -- an exhaustive list of all elements you need to have / to test before launching your site / HTML page to production. (website)
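
The Pyro sketch referenced in item 2: a minimal model assuming pyro-ppl and PyTorch are installed, written with Pyro's class-based distribution API in the style of the project's introductory tutorials. It is not code from the release announcement:

```python
import torch
import pyro
import pyro.distributions as dist

def weather():
    # A tiny generative model: latent cloudiness drives the observed temperature.
    cloudy = pyro.sample("cloudy", dist.Bernoulli(torch.tensor(0.3)))
    mean_temp = 13.0 if cloudy.item() == 1.0 else 24.0
    temp = pyro.sample("temp", dist.Normal(torch.tensor(mean_temp),
                                           torch.tensor(3.0)))
    return cloudy.item(), temp.item()

# Each call draws one sample from the joint distribution the model defines;
# Pyro's inference machinery can then condition on observed values.
for _ in range(3):
    print(weather())
```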

Continue reading Four short links: 6 November 2017.




Four short links: 3 November 2017

2017-11-03T10:10:00Z

End of Startups, Company Strategy, Complex Futures, and Bitcoin Energy

  1. Ask Not For Whom The Deadpool Tolls -- We live in a new world now, and it favors the big, not the small. The pendulum has already begun to swing back. Big businesses and executives, rather than startups and entrepreneurs, will own the next decade; today’s graduates are much more likely to work for Mark Zuckerberg than follow in his footsteps.
  2. Notes on Developing a Strategy and Designing a Company -- These notes provide a sequence of steps for creating or evaluating a strategy and associated company design, drawing clear lines to quantitative and evidence-based evaluation of enterprise performance and to financial valuation. The notes are intended for practical use by managers or instructors of MBAs and executive MBAs.
  3. Designing Our Complex Future with Machines (Joi Ito) -- We should learn from our history of applying over-reductionist science to society and try to, as Wiener says, “cease to kiss the whip that lashes us.” While it is one of the key drivers of science—to elegantly explain the complex and reduce confusion to understanding—we must also remember what Albert Einstein said: “Everything should be made as simple as possible, but no simpler.” We need to embrace the unknowability—the irreducibility—of the real world that artists, biologists, and those who work in the messy world of liberal arts and humanities are familiar with.
  4. Bitcoin Energy Consumption -- 7.51 U.S. households powered for a day by one transaction; $1B of energy used in a year to mine; Bitcoin has the same energy consumption as all of Nigeria. "Bitcoin" is how homo economicus pronounces "externality."

Continue reading Four short links: 3 November 2017.




Establishing the “why” of your product

2017-11-02T18:00:00Z

How vision, mission, and values help you craft a winning product roadmap.

A product vision should be about having an impact on the lives of the people your product serves, as well as on your organization. It’s easy to get overwhelmed by the various concepts and terminology surrounding product development, and even more so when you start to consider the terminology involved with strategy. There are mission statements, company visions, values, goals, strategy, problem statements, purpose statements, and success criteria. Further, there are acronyms like KPI and OKR, which also seem potentially useful in guiding your efforts. How do you know which ideas apply to your situation, and where to start? Whether your organization is mission-, vision-, or values-driven (or a combination thereof), these are all considered guiding principles to draw from and offer your team direction. For the purposes of this book, we’ll establish definitions for mission, vision, and values, so we have a common language. Bear with us if you have different definitions of them yourself.

Mission defines your intent

A mission is not what you value, nor is it a vision for the future; it’s the intent you hold right now and the purpose driving you to realize your vision. A well-written mission statement will clarify your business’s intentions. Most often we find mission statements contain a mix of realism and optimism, which are sometimes at odds with each other. There are four key elements to a well-crafted mission statement:

  • Value: What value does your mission bring to the world?
  • Inspiration: How does your mission inspire your team to make the vision a reality?
  • Plausibility: Is your mission realistic and achievable? If not, it’s disheartening, and people won’t be willing to work at it. If it seems achievable, however, people will work their tails off to make it happen.
  • Specificity: Is your mission specific to your business, industry, and/or sector? Make sure it’s relevant and resonates with the organization.

Here are two example missions. Can you guess the company for either?

Company A: To refresh the world... To inspire moments of optimism and happiness... To create value and make a difference.

Company B: To inspire and nurture the human spirit—one person, one cup, and one neighborhood at a time.

Company A is Coca-Cola, and Company B is Starbucks. While these missions may also be considered marketing slogans due to the size and popularity of each company, it’s important to note their aspirational context. Another aspect of mission that’s often overlooked is that it has to reflect what you do for someone else. That someone else is typically not your shareholders, but your customers. Vision statements are very often conflated with mission. We’ve seen many company vision statements that are actually mission statements. The challenge with a vision statement is to avoid the self-centered “be the best ___.”

Vision is the outcome you seek

A company vision should be about a longer-term outcome that has an impact on the lives of the people your product serves, as well as on your organization. Vision is why your organization exists, and it can be decomposed into the benefits you hope to create through your efforts—for both[...]



Matt Stine on cloud-native architecture

2017-11-02T17:30:00Z


The O’Reilly Programming Podcast: Applying architectural patterns and pattern languages to build systems for the cloud.

In this episode of the O’Reilly Programming Podcast, I talk with Matt Stine, global CTO of architecture at Pivotal. He is the presenter of the O’Reilly live online training course Cloud-Native Architecture Patterns, and he has spoken about cloud-native architecture at the recent O’Reilly Software Architecture Conference and O’Reilly Security Conference.

Continue reading Matt Stine on cloud-native architecture.




Deep convolutional generative adversarial networks with TensorFlow

2017-11-02T11:00:00Z

How to build and train a DCGAN to generate images of faces, using a Jupyter Notebook and TensorFlow.

The concept of generative adversarial networks (GANs) was introduced less than four years ago by Ian Goodfellow. Goodfellow uses the metaphor of an art critic and an artist to describe the two models—discriminators and generators—that make up GANs. An art critic (the discriminator) looks at an image and tries to determine if it’s real or a forgery. An artist (the generator) who wants to fool the art critic tries to make a forged image that looks as realistic as possible. These two models “battle” each other; the discriminator uses the output of the generator as training data, and the generator gets feedback from the discriminator. Each model becomes stronger in the process. In this way, GANs are able to generate new complex data, based on some amount of known input data—in this case, images. It may sound scary to implement GANs, but it doesn’t have to be. In this tutorial, we will use TensorFlow to build a GAN that is able to generate images of human faces.

Architecture of our DCGAN

In this tutorial, we are not trying to mimic simple numerical data—we are trying to mimic an image, which should even be able to fool a human. The generator takes a randomly generated noise vector as input data and then uses a technique called deconvolution to transform the data into an image. The discriminator is a classical convolutional neural network, which classifies real and fake images.

Figure 1. Simplified visualization of a GAN. Image source: “Generative Adversarial Networks for Beginners,” O’Reilly.

We are going to use the original DCGAN architecture from the paper Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, which consists of four convolutional layers for the discriminator and four deconvolutional layers for the generator.

Setup

Please access the code and Jupyter Notebook for this tutorial on GitHub. All instructions are in the README file in the GitHub repository. A helper function will automatically download the CelebA data set to get you up and running quickly. Be sure to have matplotlib installed to actually see the images and requests to download the data set. If you don’t want to install them yourself, there is a Docker image included in the repository.

The CelebA data set

The CelebFaces Attributes data set contains more than 200,000 celebrity images, each with 40 attribute annotations. Since we just want to generate images of random faces, we are going to ignore the annotations. The data set includes more than 10,000 different identities, which is perfect for our cause.

Figure 2. Some examples of the CelebA data set. Image courtesy of Dominic Monn.

At this point, we are also going to define a function for batch generation. This function will load our images and give us an array of images according to a batch size we are going to set later. To get better results, we will crop the images so that only the faces are showing. We will also normalize the images so that their pixel values are in a range from -0.5 to +0.5. Last, we are going to downscale the images to 28x28 after tha[...]
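
The tutorial's full code lives in the linked GitHub repository; the snippet below is only a compressed sketch of the generator half in TensorFlow 1.x, with two transposed-convolution layers instead of the tutorial's four and illustrative layer sizes, to show how deconvolution turns a noise vector into a 28x28 image:

```python
import tensorflow as tf

def generator(z, training=True, reuse=False):
    """Noise vector -> 28x28x3 image via transposed convolutions
    ("deconvolution"). Compressed to two deconv layers with made-up sizes;
    the tutorial's actual generator uses four."""
    with tf.variable_scope("generator", reuse=reuse):
        x = tf.layers.dense(z, 7 * 7 * 256)
        x = tf.reshape(x, (-1, 7, 7, 256))
        x = tf.nn.relu(tf.layers.batch_normalization(x, training=training))
        x = tf.layers.conv2d_transpose(x, 128, 5, strides=2, padding="same")  # 14x14
        x = tf.nn.relu(tf.layers.batch_normalization(x, training=training))
        x = tf.layers.conv2d_transpose(x, 3, 5, strides=2, padding="same")    # 28x28
        return 0.5 * tf.tanh(x)  # outputs in [-0.5, 0.5], matching the image scaling

z = tf.placeholder(tf.float32, (None, 100), name="noise")
fake_images = generator(z)  # fed to the discriminator alongside real images
```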



Becoming an accidental architect

2017-11-02T10:00:00Z


How software architects can balance technical proficiencies with an appropriate mastery of communication.

One of the demographics Brian and I noticed in the several O'Reilly Software Architecture Conferences we've hosted is the Accidental Architect: someone who makes architecture-level decisions on projects without a formal Architect title. Over time, we're building more material into the conference program to accommodate this common role.

But how does one transition from developer to Accidental Architect? It doesn't happen overnight.

Continue reading Becoming an accidental architect.




Four short links: 2 November 2017

2017-11-02T09:55:00Z

Capsule Neural Networks, Adversarial Objects, Deep Learning Language, and Crowdsourced Pop Star

  1. Dynamic Routing Between Capsules -- new paper from one of the deep learning luminaries, Geoff Hinton. Hacker Noon explains: In this paper the authors project that human brains have modules called “capsules.” These capsules are particularly good at handling different types of visual stimulus and encoding things like pose (position, size, orientation), deformation, velocity, albedo, hue, texture, etc. The brain must have a mechanism for “routing” low-level visual information to what it believes is the best capsule for handling it.
  2. Adversarial Objects -- Here is a 3D-printed turtle that is classified at every viewpoint as a “rifle” by Google’s InceptionV3 image classifier, whereas the unperturbed turtle is consistently classified as “turtle.”
  3. DeepNLP 2017 -- Oxford University applied course focussing on recent advances in analyzing and generating speech and text using recurrent neural networks.
  4. Virtual Singer Becomes Japanese Mega-Star (Bloomberg) -- CG-rendered pop star, singing crowdsourced songs. Crucial to Miku’s success is the ability for devotees to purchase the Yamaha-powered Vocaloid software and write their own songs for the star to sing right back at them. Fans then can upload songs to the web and vie for the honor of having her perform them at “live” gigs, in which the computer-animated Miku takes center stage, surrounded by human guitarists, drummers and pianists. This is fantastic. (via Slashdot)

Continue reading Four short links: 2 November 2017.

(image)



2017 O'Reilly Defender Awards

2017-11-01T20:00:00Z

(image)

The O’Reilly Defender Awards celebrate those who have demonstrated exceptional leadership, creativity, and collaboration in the defensive security field.

Continue reading 2017 O'Reilly Defender Awards.

(image)



Building a culture of security at the New York Times

2017-11-01T20:00:00Z

(image)

Runa Sandvik shares practical lessons on how to build and foster a culture of security across an organization.

Continue reading Building a culture of security at the New York Times.

(image)



An infinite set of security tools

2017-11-01T20:00:00Z

(image)

Window Snyder says security basics are hard to implement consistently, but they're worth the effort.

Continue reading An infinite set of security tools.

(image)



Developing a successful data governance strategy

2017-11-01T11:30:00Z

Multi-model database architectures provide a flexible data governance platform.

Data governance has become increasingly critical as more organizations rely on data to make better decisions, optimize operations, create new products and services, and improve profitability. Upcoming data security regulations, such as the new EU GDPR, will require organizations to take a forward-looking approach in order to comply with these requirements. Additionally, regulated industries, such as health care and finance, spend a tremendous amount of money on compliance with regulations that are constantly changing.

Developing a successful data governance strategy requires careful planning, the right people, and the appropriate tools and technologies. It is necessary to implement the required policies and procedures across all of an organization's data in order to guarantee that everyone acts in accordance with the regulatory framework.

Implementing a modern data governance framework requires the use of new technologies. Traditional technologies, known as relational database management systems (RDBMS), are based on the relational model, in which data is presented and stored in tabular form. RDBMS are not flexible enough to easily update the relational schema when data needs to change frequently; they are a poor fit for data governance because the model must be defined in advance. NoSQL technologies, such as document-oriented databases, provide a way to store and retrieve data that can be modeled in a non-tabular form, and they do not require a model up front.

A flexible data governance framework prevents situations in which complex engineering systems hold several disconnected pieces of data alongside expensive hardware. It is able to ingest data without a long extract-transform-load (ETL) process, and it should support schema-free databases storing data from multiple disparate data sources. An important characteristic of flexible data governance frameworks is the ability to support semantic relationships using graphs, RDF triples, or ontologies, as shown in the following figure.

Figure 1. Example of a multi-model database system that stores all the entities as documents and the relationships as triples. Image source: MarkLogic.

The multi-model database is a general NoSQL approach that can store, index, and query data in one or more of the previous models. Due to this flexibility, the multi-model database is the best approach for addressing data governance. (For more information about multi-model databases, see the O'Reilly ebook Building on Multi-Model Databases.)

With a multi-model database, all relevant data is stored as a document, and all information about source, date created, privileges, etc., is stored as metadata in an envelope around the document. The original data is preserved—and the metadata is included. Semantics enrich this capability by enabling inferences and provide a great way to model "policies" because they are terr[...]
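To make the envelope idea concrete, here is a small, hypothetical illustration in Python; the field names are invented for this example and do not represent a specific MarkLogic schema. The original record stays intact inside an envelope of governance metadata, and a relationship about the same entity is expressed as a triple.

    # Hypothetical enveloped record: the original data is preserved as-is,
    # while governance metadata (source, creation date, privileges) wraps it.
    envelope = {
        "metadata": {
            "source": "crm-export-2017-10",
            "created": "2017-10-31T12:00:00Z",
            "privileges": ["role:compliance-read"],
        },
        "instance": {  # the original document, untouched
            "customer_id": "C-1042",
            "country": "DE",
        },
    }

    # A semantic relationship about the same entity, expressed as a
    # subject-predicate-object triple so it can be queried alongside the document.
    triple = ("customer/C-1042", "subjectTo", "policy/EU-GDPR")

Because the governance metadata travels with the document rather than living in a separate system, lineage and access rules remain attached to the data wherever it is queried.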



FDA regulation defines business strategy in direct-to-consumer genetic testing

2017-11-01T11:00:00Z

The FDA is entering a new era of regulation as whole genome sequencing becomes more accessible to consumers.

Why do consumers seek direct-to-consumer (DTC) genetic testing? Consumers purchase services that sequence and analyze portions of their DNA to understand their risk for familial cancer, plan a safer pregnancy, optimize diet and fitness routines, and satisfy their curiosity about the secrets of their genome and ancestry. The diversifying reasons for consumer interest in DTC genetic testing are estimated to increase its global market value to $350 million by 2022. With such a valuable market at stake, regulation of DTC genetic testing by the U.S. Food and Drug Administration (FDA) has been under intense surveillance by the biotech industry, health care providers, and consumers alike.

The FDA has been regulating medical devices since 1976, when Congress passed the Medical Device Amendments to the Federal Food, Drug, and Cosmetic Act. A medical device is defined as anything that can be used to diagnose, cure, treat, mitigate, or prevent disease, including an instrument, reagent, or "similar or related article." In vitro genetic tests are therefore considered medical devices.

The FDA regulates both genetic tests that are ordered and performed at home (DTC) and those that are ordered and performed in a health care setting or laboratory (a laboratory-developed test, or LDT). These two types of tests require different levels of FDA regulation. LDTs are ordered by a physician, developed by and performed in a single laboratory, are not sold to other laboratories, and are not marketed to consumers. In theory, this reduces the risk of misunderstanding the results and the possibility of erroneous health-related decision-making by the consumer. On the other hand, DTC tests must pass a higher regulatory bar and demonstrate that they clearly and safely relay information to consumers in the absence of a medical professional. DTC tests do not provide an "informed intermediary," such as a physician or trained expert, to explain results, reduce stress, and discuss follow-up options, while physician-delivered reports from LDTs do.

Before the FDA began regulating DTC tests, consumers were purchasing these tests to learn about their risk for Parkinson's disease, how they might respond to certain types of drugs, whether they were likely to develop Alzheimer's disease, their ancestry, and more. Many of these results were diagnostic in nature, which prompted the FDA to intervene.

FDA crackdown on DTC testing

The FDA watchfully waited as DTC genetic testing companies developed products. The FDA assessed the potential risks and impacts of the products on the consumer, and did not regulate the conduct of DTC genetic testing companies until the companies brought products to market that could be classified as medical devices. In May 2010, the FDA notified Pathway Genomics that their product was a medical device, and therefore needed FDA approval. In June 2010, the FDA followed by sending warnings to four add[...]



Four short links: 1 November 2017

2017-11-01T10:55:00Z

Crypto Docs, Ultrasound, Anti-Innovation Investors, and IoT Security

  1. Airborn OS -- attempt to do an open source Google Docs with crypto.
  2. ButterflyIQ -- ultrasound on a chip. IEEE covers it: announced FDA clearance for 13 clinical applications, including cardiac scans, fetal and obstetric exams, and musculoskeletal checks. Rather than using a dedicated piece of hardware for the controls and image display, the iQ works with the user’s iPhone. The company says it will start shipping units in 2018 at an initial price of about $2,000. See also adding orientation to ultrasound to turn 2D into 3D.
  3. Innovation vs. Activist Investors (Steve Blank) -- "activist investor" is all about financial games to transfer cash from banks to the investors, by loading the company with debt. The bad news is that, once they take control of a company, activist investors’ goal is not long-term investment. They often kill any long-term strategic initiatives. Often, the short-term cuts directly affect employee salaries, jobs, and long-term investment in R&D. The first things to go are R&D centers and innovation initiatives. They don't want genuine growth; they want fake growth that leaves the company weaker.
  4. Security, Privacy, and the Internet of Things (Matt Webb) -- if I meet a startup that has spent ages on its security, pre getting some real customer traction, I am going to be nervous that they have over-engineered the product and won't be able to iterate. The product will be too brittle or too rigid to wiggle and iterate and achieve fit. So, it's a balance.

Continue reading Four short links: 1 November 2017.

(image)



The Dao of defense: Choosing battles based on the seven chakras of security

2017-10-31T20:00:00Z

(image)

Katie Moussouris explains how to turn the forces that resist defense activities into the biggest supporters.

Continue reading The Dao of defense: Choosing battles based on the seven chakras of security.

(image)



Enterprise security: A new hope

2017-10-31T20:00:00Z

(image)

Haroon Meer says a new type of security engineering is taking root, which suggests hope for effective corporate security at enterprise scale.

Continue reading Enterprise security: A new hope.

(image)



Empowering through security

2017-10-31T20:00:00Z

(image)

Fredrick Lee shines a light on the ways security can be allowed into the world to do more.

Continue reading Empowering through security.

(image)



Why cloud-native enterprise security matters

2017-10-31T20:00:00Z

(image)

Matt Stine looks at three principles of cloud-native security and explains an approach that addresses the increasing volume and velocity of threats.

Continue reading Why cloud-native enterprise security matters.

(image)



Great software is secure software

2017-10-31T20:00:00Z

(image)

Chris Wysopal explains how defenders can help developers create secure software through coaching, shared code, and services.

Continue reading Great software is secure software.

(image)



Highlights from the O'Reilly Security Conference in New York 2017

2017-10-31T20:00:00Z

(image)

Watch highlights covering security, defense, culture, and more. From the O'Reilly Security Conference in New York 2017.

Defenders from across the security world are coming together for the O'Reilly Security Conference in New York. Below you'll find links to highlights from the event.

Continue reading Highlights from the O'Reilly Security Conference in New York 2017.

(image)



Amit Vij on GPU-accelerated analytics databases

2017-10-31T14:10:00Z

(image)

The convergence of big data, artificial intelligence, and business intelligence

In this episode of the O’Reilly Podcast, I speak with Amit Vij, CEO and co-founder of Kinetica, a company that has developed an analytics database that uses graphics processing units (GPUs). We talk about how organizations are using GPU-accelerated databases to converge artificial intelligence (AI) and business intelligence (BI) on a single platform.

src="https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/349443738&auto_play=false&hide_related=false&show_artwork=true" height="166" width="100%" frameborder="no" scrolling="no">

Discussion points:

  • The benefits of converging AI and BI in a single system: “You are orders of magnitude faster, and you have the ability to operate on real-time data, as opposed to operating on yesterday’s data,” Vij says.
  • The processing speed of GPUs: “The GPU really leverages parallel processing so you can maximize your throughput and take advantage of the advancements in hardware that have come about,” he says.
  • How GPU databases break down the walls between the data science and business domains: “Nowadays machine learning scientists and mathematicians can, in just three lines through SQL, execute their algorithms directly on data sets that are billions of objects,” Vij says.
  • How GPU databases integrate with machine learning tools such as TensorFlow, and how GPU applications can use the cloud

Continue reading Amit Vij on GPU-accelerated analytics databases.

(image)



Four short links: 31 October 2017

2017-10-31T11:05:00Z

AI for Databases, One-Pixel Attacks, Adtech Uncanny Valley, and Mindreading Video

  1. Inference and Regeneration of Programs that Manipulate Relational Databases -- We present a new technique that infers models of programs that manipulate relational databases. This technique generates test databases and input commands, runs the program, then observes the resulting outputs and updated databases to infer the model. Because the technique works only with the externally observable inputs, outputs, and databases, it can infer the behavior of programs written in arbitrary languages using arbitrary coding styles and patterns.
  2. One-Pixel Attack for Fooling Deep Neural Networks -- The results show that 73.8% of the test images can be crafted to adversarial images with modification just on one pixel with 98.7% confidence on average.
  3. Facebook Is Not Listening To You -- but we are deep in the adtech uncanny valley.
  4. Recovering Video from fMRI -- the video and stills are impressive. (Still a blurry black-and-white picture and a set of guessed possible labels.)

Continue reading Four short links: 31 October 2017.

(image)



Four short links: 30 October 2017

2017-10-30T10:00:00Z

README Maturity Model, Open Source Project Maturity Model, Walmart Robots, and Sparse Array Database

  1. README Maturity Model -- from bare minimum to purpose.
  2. Apache's Open Source Project Maturity Model -- It does not describe all the details of how our projects operate, but aims to capture the invariants of Apache projects and point to additional information where needed.
  3. Walmart is Getting Robots -- The retailer has been testing the robots in a small number of stores in Arkansas and California. It is now expanding the program and will have robots in 50 stores by the end of January.
  4. TileDB -- manages massive dense and sparse multi-dimensional array data that frequently arise in important scientific applications.

Continue reading Four short links: 30 October 2017.

(image)



Four short links: 27 October 2017

2017-10-27T11:50:00Z

Gentle PR, Readable Arxiv, Sentiment Bias, and AI Coding from Sketches

  1. Tick Tock List (Matt Webb) -- simple and good advice for building working relationships with journalists.
  2. Arxiv Vanity -- renders papers from Arxiv as responsive web pages so you don't have to squint at a PDF.
  3. Sentiment Analysis Bias -- By classifying the sentiment of words using GloVe, the researchers "found every linguistic bias documented in psychology that we have looked for." Unsurprising, since the biases are present in the people who generate the text from which these systems are trained.
  4. AI Turns Sketched Interfaces into Prototype Code -- We built an initial prototype using about a dozen hand-drawn components as training data, open source machine learning algorithms, and a small amount of intermediary code to render components from our design system into the browser. We were pleasantly surprised with the result.

Continue reading Four short links: 27 October 2017.

(image)



5 steps to identify and validate a value proposition

2017-10-27T11:00:00Z

(image)

Value propositions are a dime a dozen. Learn how to choose the ones that work.

Continue reading 5 steps to identify and validate a value proposition.

(image)



How to pick the right authoring tools for VR and AR

2017-10-27T10:00:00Z

(image)

Identify the options available to develop an effective immersive experience.

The world of virtual reality (VR), augmented reality (AR), and mixed reality (MR) is growing at a seemingly exponential pace. Just a few key examples: Microsoft partnered with Asus and HP to release new MR headsets, Google glasses have made a comeback, Facebook Spaces launched, and a patent for AR glasses, filed by Apple in 2015, was just discovered during a patent search.

At the Apple WorldWide Developer Conference (WWDC) this past June, Apple announced ARKit, which will make augmented reality available to all 700 million worldwide users of iPhone and iPad. The momentum and economic impact of these experiences continues to accelerate, so it’s the perfect time to begin developing for them, and that means picking an authoring tool that’s right for the reality you want to create.

Continue reading How to pick the right authoring tools for VR and AR.

(image)



Machine intelligence for content distribution, logistics, smarter cities, and more

2017-10-26T14:20:00Z

(image)

The O’Reilly Data Show Podcast: Rhea Liu on technology trends in China.

In this episode of the Data Show, I spoke with Rhea Liu, analyst at China Tech Insights, a new research firm that is part of Tencent’s Online Media Group. If there’s one place where AI and machine learning are discussed even more than the San Francisco Bay Area, that would be China. Each time I go to China, there are new applications that weren’t widely available just the year before. This year, it was impossible to miss bike sharing, mobile payments seemed to be accepted everywhere, and people kept pointing out nascent applications of computer vision (facial recognition) to identity management and retail (unmanned stores).

Continue reading Machine intelligence for content distribution, logistics, smarter cities, and more.

(image)



Four short links: 26 October 2017

2017-10-26T11:55:00Z

License Plates, Speech Recognition, Social Proof, and Engineering Growth

  1. FMTYEWTK About Home-Made License Plate Readers -- this chap, horrified by an $86M government project, built a prototype in 57 lines of code. He talks here about the shortcomings of the prototype, and along the way you learn a lot about ALPR, Automated License Plate Recognition. So if $1 million gets you to 80% accuracy, and maybe $10 million gets you to 90% accuracy—when do you stop spending?
  2. Speech Recognition Is Not Solved -- The recent improvements on conversational speech are astounding. But, the claims about human-level performance are too broad. Below are a few of the areas that still need improvement.
  3. Social Proof -- five principles of social proof: 1. Avoid negative social proof; 2. Combine social proof with authority; 3. Combine social proof with scarcity; 4. Social proof works best with similar people; 5. Boost social proof with user-generated content.
  4. Engineering Growth Frameworks -- Documentation for Medium’s professional growth framework. Super useful for engineering organizations that don't yet have their own.

Continue reading Four short links: 26 October 2017.

(image)



How FINRA benefits from a cloud data lake

2017-10-26T11:00:00Z

Solving challenges of data analytics to make data accessible to all.

One of the classic challenges of analytics is making data accessible to all. The data needed for analytics and data science is often locked away in different data silos, where it is difficult to discover and access. Analysts and data scientists who want to derive new insights from data within the enterprise must work with a large number of data stewards and data engineers to build their own map of data assets and source data from various data silos. As a result of our move at FINRA to a managed data lake architecture in the cloud, our organization arrived at a solution to this problem, while also significantly improving the flexibility of the data processing pipeline that prepares data for analytics. In this post, I'll describe our approach.

A challenge of big data

FINRA is the Financial Industry Regulatory Authority, a not-for-profit organization authorized by Congress to protect America's investors by making sure the broker-dealer industry operates fairly and honestly. FINRA's Market Regulation group monitors 99% of all equity trades and 70% of all option trades in the U.S. This is done by processing billions of records a day of trade data from brokerage firms and exchanges. The data is validated, transformed, and prepared for analytic use. Once the data is ready for analytics, hundreds of automated detection models are run against it to look for indicators of potential market manipulation, insider trading, and abuse—generating exception alerts when a pattern is matched. From there, regulatory analysts interactively delve deeper into the data to determine whether a regulatory issue exists. To stay abreast of emerging regulatory problems and develop new detection algorithms, a team of data scientists continually explores the data and develops new detection models.

To process and operate at these volumes, FINRA made early investments over a decade ago in cutting-edge, emerging data-warehouse appliances. This required a significant initial investment along with subsequent re-investments to expand capacity. Despite these investments, we still faced continual capacity challenges. Additionally, these appliances were complex to manage and operate in a dynamic business environment. Market volumes can fluctuate significantly day to day—sometimes by a factor of three or more. Regardless of fluctuations, FINRA must run its validation, ETL, and detection algorithms on the data within time frames specified in our service level agreements (SLAs). Investing in the capacity to meet a[...]



How to search in the MarkLogic database

2017-10-26T10:30:00Z

There are advantages to having a search engine built into a database.

MarkLogic is best known as a multi-model database, meaning that it stores two different types of data (documents and RDF triples) while providing index-supported document, SPARQL, and SQL queries. In addition, a search engine is an integral part of the database itself. Thus, search and query are really the same concept in MarkLogic.

Having search built into the database may seem like an unusual architecture at first glance. However, there are real advantages to this arrangement. The architecture is dramatically simplified: the application tier can go to one service for any type of data request, whether it's a common database query or the type of search normally powered by a separate search engine. This also means there's no need to configure a separate server, install and maintain additional software, and retain operations people to manage the search engine. And transactional updates to the database are immediately available in the search indexes.

In the MarkLogic architecture, the document model is used to store data. MarkLogic natively stores XML, JSON, text, and binary documents. Many types of data either start off in some document form or are very simply converted to one. That means many forms of data can be loaded as they currently are. With MarkLogic's Universal Index, any text content, along with the structure of XML and JSON documents, is automatically indexed and made available for search. Immediately after loading content, developers can begin running searches to explore and better understand the data they have. Within this model, application development begins right away, with data modeling shifting to a refinement activity, done to meet the needs of application requirements as they are worked on.

Searching documents in MarkLogic is a two-step process. The first step is index resolution, in which the query is compared to the indexes to identify matching candidate documents. The next step is filtering, which eliminates false positives where the indexes don't have the information necessary to answer the query. This step reflects the configurability of the search engine: MarkLogic offers more than 30 types of indexes, allowing for type-specific range queries, phrase searches, SPARQL queries against RDF triples, even SQL queries on tabular information extracted from documents. By knowing how best to configure and apply the available indexes, the filtering stage can be turned off for many applications. This allows queries to run faster by avoidin[...]
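The two-step flow can be illustrated with a toy, pure-Python sketch. This is not MarkLogic's actual API: the dictionary-based index here is just a stand-in for the Universal Index, and the documents are invented. Index resolution proposes candidates; a filtering pass then checks the documents themselves to drop false positives.

    # Toy illustration of index resolution followed by filtering.
    documents = {
        1: "MarkLogic stores XML and JSON documents",
        2: "documents converted to JSON are indexed",
        3: "search and query are the same concept",
    }

    # Stand-in for the Universal Index: a word -> set-of-document-ids map,
    # built by hand here (MarkLogic builds its indexes automatically at load time).
    index = {}
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)

    def phrase_search(phrase):
        words = phrase.lower().split()
        # Step 1: index resolution -- intersect posting lists to get candidate documents.
        candidates = set.intersection(*(index.get(w, set()) for w in words))
        # Step 2: filtering -- verify the exact phrase to eliminate false positives.
        return sorted(d for d in candidates if phrase.lower() in documents[d].lower())

    print(phrase_search("JSON documents"))
    # -> [1]: document 2 contains both words, so it survives index resolution,
    #    but the filter drops it because the exact phrase never appears in it.

The more the indexes can answer on their own (word positions for phrases, typed range indexes for comparisons), the less work the filtering step has to do, which is why well-configured applications can often skip filtering entirely.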



Why the IoT revolution needs telcos’ core skills

2017-10-26T10:00:00Z

(image)

Fast data and virtualization are shifting the way telcos approach the IoT.

The Internet of Things (IoT) isn’t just a transformative trend for the future; it’s already here. Vodafone’s IoT barometer report for 2017/2018 surveyed 1,278 organizations across all major regions and sectors, and found that 29% had already launched IoT (up from 13% in 2013). What’s more, 51% of adopters said that IoT is already increasing revenue or generating new revenue streams.

What’s not yet clear is who will capture most of that IoT revenue. An analysis from Frost and Sullivan identifies eight different layers in the IoT ecosystem, from hardware and connectivity to analytics and security, each of which represents a different potential revenue stream. As the author noted, “Everyone makes money but not everyone profits.”

Continue reading Why the IoT revolution needs telcos’ core skills.

(image)



The 3 foundations of Lean UX

2017-10-25T16:10:00Z

Explore the principles you'll need to make Lean UX successful.

Lean UX stands on a number of important foundations: it's a combination of a few different schools of thought. Understanding where it comes from will help you to apply the method and find resources when you get stuck.

The first foundation of Lean UX is user experience design. Lean UX is, at its heart, a way of practicing user experience design. Drawing on roots in the fields of human factors and ergonomics, as well as the human-centered design ideas that emerged in the 1950s with the work of industrial designers like Henry Dreyfuss, today we call these methods and mindsets user experience design (or just UX), a term credited to Don Norman. UX embraces a number of design fields, including interaction design, information architecture, graphic design, and many others. But the heart of UX practice is that it begins by identifying human needs—the needs of the users of the system.

In the past decade, we've seen the rise in popularity of Design Thinking. Design Thinking emerged in academia in the 1970s and 1980s, and was popularized by the design firm IDEO in the early 2000s. It is a way of applying human-centered design methods to a wide range of problems. Tim Brown, CEO and president of IDEO, described Design Thinking as, "innovation powered by...direct observation of what people want and need in their lives, and what they like or dislike about the way particular products are made, packaged, marketed, sold, and supported." Brown continued, "[it's] a discipline that uses the designer's sensibility and methods to match people's needs with what is technologically feasible and what a viable business strategy can convert into customer value and market opportunity."

Design Thinking is important for Lean UX because it takes the explicit position that every aspect of a business (or any other system) can be approached with design methods. It gives designers permission to work beyond their typical boundaries. It also encourages nondesigners to use design methods to solve the problems they face in their roles. So, UX and its cousin Design Thinking form the critical first foundation that encourages teams to consider human needs, collaborate across roles, and approach product design from a holistic perspective.

The next foundation of Lean UX is Agile software development. Software developers have been using Agile methods for years to reduce their cycle times, build a cadence of continuous learning, and deliver customer value[...]



Charles Givre on the impetus for training all security teams in basic data science

2017-10-25T13:30:00Z

(image)

The O’Reilly Security Podcast: The growing role of data science in security, data literacy outside the technical realm, and practical applications of machine learning.

In this episode of the Security Podcast, I talk with Charles Givre, senior lead data scientist at Orbital Insight. We discuss how data science skills are increasingly important for security professionals, the critical role of data scientists in making the results of their work accessible to even nontechnical stakeholders, and using machine learning as a dynamic filter for vast amounts of data.

Continue reading Charles Givre on the impetus for training all security teams in basic data science.

(image)



Four short links: 25 October 2017

2017-10-25T11:45:00Z

Simpson's Paradox, Attention Economics, Dynamic Programming, and Retro Unit Testing

  1. Simpson's Paradox in Behavioral Data -- current behavioral data is highly heterogeneous: it is collected from subgroups that vary widely in size and behavior. Heterogeneity is evident in practically all social data sets and can be easily recognized by its hallmark, the long-tailed distribution. The prevalence of some trait in these systems, whether the number of followers in an online social network, or the number of words used in an email, can vary by many orders of magnitude, making it difficult to compare users with small values of the trait to those with large values. As shown in this paper, heterogeneity can dramatically distort conclusions of analysis.
  2. The Economics of Attention Markets -- Based on conservative estimates, in 2016 a typical American adult spent about 4.9 hours of a day focused mainly on consuming content from these media properties. That amounted to about 437 billion hours for all adults. Advertisers paid roughly $199 billion that year to media businesses to deliver messages to those consumers during those hours. That is the market for attention. Consumers supply time—their attention—to the market in return for content that entertains or informs them. Advertisers demand attention so they can deliver messages that will increase their sales and profits. Attention platforms—ad-supported media businesses—broker the connections between consumers and advertisers. This paper provides a primer on the economics of this market.
  3. Dynamic Programming from First Principles -- a readable introduction to a subject we covered in my third-year CS analysis of algorithms class.
  4. ZX-Spec -- a unit testing framework for Sinclair ZX Spectrum assembly. I boggle.

Continue reading Four short links: 25 October 2017.

(image)