Subscribe: O'Reilly Radar - Insight, analysis, and research about emerging technologies
http://radar.oreilly.com/atom.xml

All - O'Reilly Media



All of our Ideas and Learning material from all of our topics.



Updated: 2017-09-26T09:34:23Z

 



Four short links: 25 September 2017

2017-09-25T11:15:00Z

Group Theory Coloring Book, Architecture Diagrams, Cloud Landscape, and Internet of Radios

  1. Illustrated Group Theory -- a coloring book.
  2. Documenting Your Architecture -- ingenious use of Wireshark (née Ethereal) and PlantUML, with a REPL, to map the interactions between components of a web system. What a clever hack.
  3. Cloud Native Landscape Project -- what's what in the world of cloud ops: Public Cloud, Provisioning, Runtime, Orchestration & Management, App Definition & Development, Platforms, Observability & Analysis. Mighty useful!
  4. WebSDR -- an Internet of radios connected to the Internet, which you can tune to your heart's content. (via Hacker News)

Continue reading Four short links: 25 September 2017.




Apache MXNet—the fruit of cross-institutional collaboration

2017-09-25T10:00:00Z

MXNet’s origins show through in its power and flexibility.

The Apache MXNet framework (incubating at the Apache Software Foundation) was developed to enable multiple approaches to the problem of deep learning. One route for reducing the time it takes to train deep learning models involves defining the model and separating it from the algorithm. While this approach speeds training, it can add constraints and complexity, because it is hard to update as understanding of the problem improves. Other neural network libraries address this by adding more flexibility, but at the cost of training speed. Perhaps appropriately, the idea of taking more than one route to tackling a problem is paralleled in other technical aspects of the framework, and in the Apache MXNet community itself.

MXNet gives developers the best of both worlds: it provides a concise, easy-to-understand, dynamic programming interface for defining both the model and the algorithm, without sacrificing training speed. The MXNet community has deep roots in cross-institutional collaboration: the original MXNet publication had authors from 10 institutions. The project has made the most of this variety of backgrounds, with each team developing the project to meet its own needs and institutional traditions and commitments. It should not be surprising that MXNet was “born” with APIs for C++, Python, R, Scala, MATLAB, JavaScript, Go, and Julia.

While the academic origins of MXNet are recent, the breadth of its contributor base means MXNet supports building products not only in a variety of languages, but also on a variety of computational hardware. For training models, this means GPUs, naturally; for inference, it can mean running models on anything from mobile devices to other lightweight general-purpose computers (e.g., Raspberry Pi) to purpose-specific FPGAs or other IoT device architectures. MXNet’s design and community are both well positioned to stay on top of all these developments.

The initial variety of contributors and approaches to neural networks has continued to evolve. Like other deep learning frameworks, the main contributors and users of Apache MXNet are either providing analytics as a service (and keeping MXNet “under the hood”) or building customized pipelines for vertical-specific AI applications. Among the latter, we see TuSimple building an autonomous driving platform, TwoSense with a behavioral biometric and identification tool, and some teams within Amazon (fulfillment center management and robotics, for example, or Sockeye, the sequence-to-sequence machine translation framework). Apache MXNet is also used as part of larger analytics suites, like Wolfram, which includes a high-level front end for MXNet in its latest release (Wolfram Research is also a significant contributor of code to the project). Microsoft is taking the lead on integrating MXNet into the R language (among other things). Several of Amazon’s products use MXNet, including Amazon Rekognition for image analysis, Amazon Echo products (including the Echo Look fashion assistant), Amazon Lex, and the Amazon.com recommendation engine. And with the release of Core ML, Apple is contributing to Apache MXNet to bring deep learning models to Apple devices.

Deep learning is among the most disruptive technologies of 2017—no longer the exclusive domain of academic researchers, it is now expected to be on the roadmap of any data-driven organization.
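
As an illustration of the mixed interface mentioned above, here is a minimal sketch of MXNet's imperative NDArray API next to its symbolic API (a toy example, not code from the MXNet paper; it assumes the mxnet package is installed):

    import mxnet as mx

    # Imperative NDArray interface: operations execute as they are written.
    a = mx.nd.ones((2, 3))
    b = (a * 2 + 1).asnumpy()  # a 2x3 array of 3.0s

    # Symbolic interface: declare the graph first, then bind and run it.
    x = mx.sym.Variable('x')
    y = mx.sym.FullyConnected(x, num_hidden=4)
    executor = y.simple_bind(ctx=mx.cpu(), x=(5, 10))
    out = executor.forward(x=mx.nd.ones((5, 10)))  # weights are uninitialized here
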
The power and flexibility of MXNet make it possible to build prototypes and implement them in a variety of production environments. Note: Apache MXNet is an effort undergoing incubation at the Apache Software Foundation (ASF). For more information, visit the project website. Continue reading Apache MXNet—the fruit of cross-institutional collaboration.[...]



Data liquidity in the age of inference

2017-09-22T10:00:00Z


Probabilistic computation holds too much promise for it to be stifled by playing zero-sum games with data.

It's a special time in the evolutionary history of computing. Oft-used terms like big data, machine learning, and artificial intelligence have become popular descriptors of a broader underlying shift in information processing. While traditional rules-based computing isn’t going anywhere, a new computing paradigm is emerging around probabilistic inference, where digital reasoning is learned from sample data rather than hardcoded with boolean logic. This shift is so significant that a new computing stack is forming around it with emphasis on data engineering, algorithm development, and even novel hardware designs optimized for parallel computing workloads, both within data centers and at endpoints.

A funny thing about probabilistic inference is that when models work well, they’re probably right most of the time but always wrong at least some of the time. From a mathematics perspective, this is because such models take a numerical approach to problem analysis, as opposed to an analytical one. That is, they learn patterns from data (with various levels of human involvement) that have certain levels of statistical significance, but they remain somewhat ignorant of any physics-level intuition related to those patterns, whether represented by math theorems, conjectures, or otherwise. However, that’s also precisely why probabilistic inference is so incredibly powerful. Many real-world systems are so multivariate, complex, and even stochastic that analytical math models do not exist and remain tremendously difficult to develop. Meanwhile, their physics-ignorant, FLOPS-happy, and often brutish machine learning counterparts can develop deductive capabilities that don’t neatly follow any known rules, yet still almost always arrive at the correct answers.
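
As a toy illustration of "probably right most of the time, wrong some of the time," consider a learned classifier that outputs probabilities rather than hard rules. A minimal sketch, assuming scikit-learn is installed:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # A noisy two-class problem with no clean analytical rule (5% flipped labels).
    X, y = make_classification(n_samples=2000, n_features=20, flip_y=0.05,
                               random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = LogisticRegression().fit(X_train, y_train)
    print(model.score(X_test, y_test))      # high accuracy, but below 1.0
    print(model.predict_proba(X_test[:1]))  # a probability, not a verdict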

Continue reading Data liquidity in the age of inference.




Four short links: 22 September 2017

2017-09-22T08:00:00Z

Molecular Robots, Distributed Deep Nets, SQL Notebook, and Super-Accurate GPS

  1. Scientists Create World’s First ‘Molecular Robot’ Capable Of Building Molecules -- Each individual robot is capable of manipulating a single molecule and is made up of just 150 carbon, hydrogen, oxygen and nitrogen atoms. To put that size into context, a billion billion of these robots piled on top of each other would still only be the same size as a single grain of salt. The robots operate by carrying out chemical reactions in special solutions which can then be controlled and programmed by scientists to perform the basic tasks. (via Slashdot)
  2. Distributed Deep Neural Networks -- in Adrian Colyer’s words: DDNNs partition networks between mobile/embedded devices, cloud (and edge), although the partitioning is static. What’s new and very interesting here, though, is the ability to aggregate inputs from multiple devices (e.g., with local sensors) in a single model, and the ability to short-circuit classification at lower levels in the model (closer to the end devices) if confidence in the classification has already passed a certain threshold. It looks like both teams worked independently and in parallel on their solutions. Overall, DDNNs are shown to give lower-latency decisions with higher accuracy than either cloud or devices working in isolation, as well as fault tolerance in the sense that classification accuracy remains high even if individual devices fail. (via Morning Paper) A toy sketch of the short-circuit idea appears after this list.
  3. Franchise -- an open-source notebook for SQL.
  4. Super-Accurate GPS Chips Coming to Smartphones in 2018 (IEEE Spectrum) -- 30cm accuracy (today: 5m), will help with the reflections you get in cities, and with 50% energy savings.
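
As a rough Python sketch of the short-circuit logic from item 2 (hypothetical edge and cloud models; numpy only):

    import numpy as np

    def softmax(logits):
        e = np.exp(logits - logits.max())
        return e / e.sum()

    def classify(x, edge_model, cloud_model, threshold=0.9):
        # Answer at the edge when confident; otherwise defer to the cloud.
        probs = softmax(edge_model(x))
        if probs.max() >= threshold:
            return int(probs.argmax())      # short-circuit: no network hop
        return int(softmax(cloud_model(x)).argmax())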

Continue reading Four short links: 22 September 2017.




How to start application tracing

2017-09-21T10:00:00Z


A hands-on demonstration of implementing tracing in modern applications, introducing tracing through the CNCF’s OpenTracing project.

Continue reading How to start application tracing.




Jim Blandy and Jason Orendorff on Rust

2017-09-21T10:00:00Z


The O’Reilly Programming Podcast: A look at a new systems programming language.

In this episode of the O’Reilly Programming Podcast, I talk with Jim Blandy and Jason Orendorff, both of Mozilla, where Blandy works on Firefox’s web developer tools and Orendorff is the module owner of Firefox’s JavaScript engine. They are the authors of the new O’Reilly book Programming Rust.

Continue reading Jim Blandy and Jason Orendorff on Rust.




Four short links: 21 September 2017

2017-09-21T08:00:00Z

Synthetic Muscles, Smarter SSH, Kickstarter Post-Mortem, and Computational Drawing

  1. Additive Synthetic Muscles -- electrically-actuated high stress, high strain, low density, 3D-printable muscles.
  2. teleport -- modern SSH that groks bastion hosts, certificates, and more.
  3. Anatomy of a Kickstarter -- It is possible to outsource much of the Kickstarter process, including copywriting, fulfilment, customer support and marketing. I treated the whole process as a learning experience and set aside 50% of my time for three months to appreciate its nuances from start to finish, with a hard-stop due to other commitments. Post-Kickstarter I committed another three months over the following year to deliver experiences such as the expedition to Afghanistan and stretch goals. BackerKit was the obvious candidate to outsource operations to, but was rejected for violating the no-asshole rule: they were tone-deaf, evasive on responding to cost estimates, and nagging in a way that only organisations that live and die by CRM systems can be.
  4. rune -- a JavaScript library for programming graphic design systems with SVG, in both the browser and Node.js.

Continue reading Four short links: 21 September 2017.




Accelerating AI

2017-09-20T20:00:00Z


Steve Jurvetson examines the state of artificial intelligence.

Continue reading Accelerating AI.




Our Skynet moment

2017-09-20T20:00:00Z


Tim O'Reilly says the algorithms that shape our economy must be rewritten if we want to create a more human-centered future.

Continue reading Our Skynet moment.




AI mimicking nature: Flying and talking

2017-09-20T20:00:00Z


Lili Cheng shares two examples of AI that were inspired by nature.

Continue reading AI mimicking nature: Flying and talking.




How to escape saddlepoints efficiently

2017-09-20T20:00:00Z


Michael Jordan discusses recent results in gradient-based optimization for large-scale data analysis.

Continue reading How to escape saddlepoints efficiently.




Build smart applications with your new super power: Cloud AI

2017-09-20T20:00:00Z


Philippe Poutonnet discusses how you can harness the power of machine learning, whether you have a machine learning team of your own or you just want to use machine learning as a service.

Continue reading Build smart applications with your new super power: Cloud AI.




Why democratizing AI matters: Computing, data, algorithms, and talent

2017-09-20T20:00:00Z


Jia Li explains why a democratized approach to AI ensures that the components behind these technologies reach the widest possible audience.

Continue reading Why democratizing AI matters: Computing, data, algorithms, and talent.




Fireside chat with Naveen Rao and Steve Jurvetson

2017-09-20T20:00:00Z


A discussion on the impact and opportunities of artificial intelligence.

Continue reading Fireside chat with Naveen Rao and Steve Jurvetson.




Handling checked exceptions in Java streams

2017-09-20T18:00:00Z


Know your options for managing checked exceptions in Java 8’s functional approach.

Several decisions were made during the creation of the Java language that still impact how we write code today. One of them was the addition of checked exceptions to the language, which the compiler requires you to prepare for with either a try/catch block or a throws clause at compile time.

If you've moved to Java 8, you know that functional programming concepts like lambda expressions, method references, and streams feel like they completely remade the language.

Continue reading Handling checked exceptions in Java streams.




Environmental sensing with recycled materials

2017-09-20T11:00:00Z

Electronic waste is an economic and environmental problem, but citizen scientists can take action by harvesting sensors from discarded electronics.

Environmental sensing—the process of gathering information from ecological systems—is an essential part of ecology and sustainable agriculture. However, sensors can be expensive and difficult for citizen scientists to obtain, even though their parts are all around us, in the form of technological waste. When a gadget breaks, it is often easier and cheaper to throw it away and purchase a new one than to attempt to repair it. Citizen scientists can take advantage of this unfortunate by-product of "throw-away culture" by harvesting the sensor technology that is often found in e-waste. In this article, we discuss an approach to the development of such sensors.

When assessing and addressing environmental issues, especially at a local level, it is more advantageous to involve community members—those who are directly affected by such issues—than scientists and academics. Such an approach has been found to be both faster and more efficient, as dedication among local volunteers is much higher than among those with little attachment or stake in the success of the project (Danielsen et al. 2010, 1166–1168). However, there are often very little funding and few resources available to citizen scientists, which necessitates support from non-local institutes. A number of projects purport to address exactly this issue. However, as we traverse this technological landscape of citizen sensing, although the economic climate is shifting toward affordability, mass distribution of these devices (in detector arrays, for example) is still well beyond the scope of many budgets.

According to Sui and Elwood in "Crowdsourcing Geographic Knowledge," there exist four levels of participation in citizen science activities. The majority of the projects outlined here fall within the first two levels of engagement; however, this should not be read as meaning citizen scientists cannot participate in environmental sensing projects at higher levels. While there are many ecological sensing projects worthy of examination, to list them all would be well outside the scope of this paper; therefore, only a select few are outlined.

Figure 1-1. Sui and Elwood propose four levels of involvement for citizen scientists, ranging from "passive sensors" to "active collaborators" (Sui and Elwood, 2013)

Smart Citizen

One of the most polished options for citizen-scientist environmental sensing is the Smart Citizen project. The Smart Citizen Kit is billed as "an Open-Source Environmental Monitoring Platform consisting of arduino-compatible hardware, data visualization web API, and mobile app" (Smart Citizen 2014). It is the result of a crowdfunding effort on Kickstarter by Fab Lab Barcelona at the Institute for Advanced Architecture of Catalonia. The sensor board can measure air composition (CO and NO2), temperature, light intensity, sound levels, and humidity. It is capable of communicating data wirelessly to iOS devices via the Smart Citizen App. The kit itself consists of three boards: the ambient board, which houses the sensors; a data-processing board based on an ATMega32u4; and a baseboard with USB socket, SD card reader, EEPROM, battery holder, and clock (The Smart Citizen Kit: Crowdsourced Environmental Monitoring 2014). The Smart Citizen Kit places a particular emphasis on large-scale collaboration.
Users can register their sensor board on the Smart Citizen website and communicate their local conditions over the web. [...]



Good research starts with good questions

2017-09-20T11:00:00Z

How to construct inquiries that will result in good, useful data.

Researchers always struggle when it comes to writing down the questions they need to ask their participants. Sure, this gets easier over time and with experience, but the act of writing an interview guide or test plan never gets “easy.” At the end of the day, we are all human, and we are susceptible to our own weaknesses and limitations. The deck is stacked against us when you start to consider the social, personal, professional, and sometimes logistical factors that can inhibit our ability to have a conversation with someone else. Predicting all these factors before research even starts is no small feat, which in turn makes writing down lines of inquiry that will result in good, useful data seem daunting. But you have to start somewhere and iterate as you learn which questions work and which fall flat. To help you with this, we first need to discuss what role questions fulfill when you’re conducting any type of research.

The role of questions in research

It’s hard to conduct research when you don’t know what question needs to be answered. Every research effort starts with you needing to know why something happens, what people do in certain circumstances, and how they perform key tasks. To answer these questions, we must find people to talk to and phrase our questions effectively to get to the heart of the matter. Otherwise, we would be making wild guesses and shooting in the dark. While that’s often tempting, this degree of freedom leads to failure and your product never seeing the light of day.

How good questions go wrong

We can’t tell you how many times we’ve written down a question and thought, “This is it! This will get us some awesome information from people,” only to have it fall flat during a session. This happens to all researchers, and it will happen to you. And that’s OK! Bad questions can be mitigated during the planning phase if you know what makes a question go bad. The following factors can lead to misinformed or poor research results.

Leading questions

It’s easy to get caught up in the excitement of research. This can trick you into asking questions that give participants a clue, or directly point them, to the type of answer you’re looking for. These are called leading questions, and they can hinder your research session and the data collected. An example of a leading question would be asking, “How do you use Outlook to communicate your work status?” A better alternative would be “How do you communicate your work status?” The second question allows more responses than leading the participant to describe a specific use of email. Research participants want to be helpful and to provide value to your team. Since they are primed to help, if you ask a question that implies the type of answer you want, they are more likely to give you that answer, even if it doesn’t really apply to them.

Shallow questions

One golden rule of research is to never ask yes/no questions. When creating questions for an upcoming research effort, you’ll find that avoiding these questions is hard. Yes/no questions are harmful because they give participants an easy out. The question “Do you use Yammer for team discussions?” can quickly be answered and dismissed. Participants don’t have to think deeply to respond, and they give you a confirmation that may or may not be useful. A better question is “How do you communicate with your team throughout the day?”

Personal bias

We all have our own beliefs about how products work, or how they sho[...]



Recognizing and evaluating scientific claims in security

2017-09-20T10:00:00Z


Five questions for Josiah Dykstra on techniques to expose and invalidate misleading claims.

I recently sat down with Josiah Dykstra, Senior Security Researcher at the Department of Defense, to discuss the topics of both accidental and intended misleading communications in security, common pitfalls made in evaluating scientific claims, and the questions you should ask when evaluating scientific claims and third-party vendor solutions.

What are some basic tips for recognizing and understanding scientific claims in security marketing, journalism, or other security-related materials?

People and companies use a spectrum of truly scientific, possibly scientific, and unscientific statements to talk about products and services. Some are trying to persuade you to buy something; others are simply trying to communicate information. Though scientists themselves can produce misleading and manipulative results, I am generally more concerned about the potential damage caused by seemingly scientific-sounding claims from other sources.

Continue reading Recognizing and evaluating scientific claims in security.




You need an Analytics Center of Excellence

2017-09-20T10:00:00Z

Learn how to add big data to your organization's business processes.

More than 10 years after big data emerged as a new technology paradigm, it has finally reached a mature state, and its business value throughout most industry sectors is established by a significant number of use cases. A couple of years ago, the discussion was still about how big data changed our way of capturing, processing, analyzing, and exploiting data in new and meaningful ways for business decision-makers. Now many companies undertake analytical projects at a departmental level, redefining the relationship between business and IT through the adoption of Agile and DevOps methodologies. Real-time processing, machine learning algorithms, and even artificial intelligence are the new normal in business talk. However, companies are still struggling to adopt big data at a corporate level. In many corporations, there is a gap between launching departmental projects and industrializing and scaling up those use cases across the corporation. Embedding big data in scalable business processes is crucial to becoming a data-driven organization, and building an Analytics Center of Excellence (ACoE) can be the basis for this transformation.

Remaining challenges

Three important issues must be addressed in order to scale up big data across a corporation and make a real impact on business outcomes:

1. Lack of skills across the organization. There is an identified global shortage of analytical talent: data experts ranging from data engineers and big data architects to data scientists. It is not easy for a company to find these profiles, attract them, and retain them, and it gets more difficult as technologies continuously evolve at a challenging, rapid pace. When a company employs these experts, they are not always equally distributed throughout the organization but are sometimes concentrated in a particular department or business function (for example, in the risk or marketing departments), making it difficult to leverage these skills for the good of the entire organization. If a company has multiple locations, it is even harder to keep the right balance of skills in all the subsidiaries. The shortage of skills affects the technical and analytical departments as well as the business areas: companies need subject matter experts who understand business needs well enough to communicate with the data experts, as well as managers able to make decisions based on data and supported by facts more than by personal, biased experience. Furthermore, the new skills require new ways of working and therefore an organizational culture change.

2. Lack of standards, methodology, and governance. Even mature organizations with analytics teams in place in different departments, business units, or countries find that every team tends to work with its own tools, libraries, software versions, and data sets. This variety can make it difficult to industrialize and implement global solutions and ensure code reusability. Companies need to define standards for coding, tools, version control, and quality control, and have all teams working with the same tools and sharing their methodologies. Additionally, analytics teams must have big data governance policies and processes in place, controlling and limiting access to data in the data lake and ensuring security and data privacy controls. In Europe, for instance, the new General Data Protection Regulation (GDPR) requires a very demanding process for data traceability at a field level.
Data is a key asset, and companies will be requi[...]



Four short links: 20 September 2017

2017-09-20T08:00:00Z

AI Needs Ethics, Automotive-Grade Linux, Drawing Clocks, and Facial Recognition

  1. AI Research Needs an Ethical Watchdog (Wired) -- Right now, if government-funded scientists want to research humans for a study, the law requires them to get the approval of an ethics committee known as an institutional review board, or IRB. Stanford’s review board approved Kosinski and Wang’s study. But these boards use rules developed 40 years ago for protecting people during real-life interactions, such as drawing blood or conducting interviews. “The regulations were designed for a very specific type of research harm and a specific set of research methods that simply don’t hold for data science,” says Metcalf.
  2. Automotive-Grade Linux Debuts On The 2018 Toyota Camry -- you heard it here first: 2018 is the year of the Linux hatchback.
  3. Clocks for Software Engineers -- The first and perhaps most difficult part of learning hardware design is to learn that all hardware design is parallel design. Things don’t take place serially, as in one instruction after another ... like they do in a computer. Instead, everything happens at once.
  4. Facial Recognition is Here to Stay -- I have to admit that when I saw facial recognition improving, and realised it'd be useful in a few years, I never imagined the use case would be "so the cashier at Chick-fil-A would know your name."

Continue reading Four short links: 20 September 2017.




Fast forwarding AI in the datacenter

2017-09-19T20:00:00Z


Lisa Spelman shares how businesses are benefiting from Intel's flexible solutions for AI and how Intel is fostering the continued growth of the AI ecosystem.

Continue reading Fast forwarding AI in the datacenter.




Engineering the future of AI for businesses

2017-09-19T20:00:00Z


Ruchir Puri addresses the opportunities and challenges of AI for business and focuses on what's needed to scale AI across the breadth of enterprises.

Continue reading Engineering the future of AI for businesses.




The state of AI adoption

2017-09-19T20:00:00Z


AI Conference chairs Ben Lorica and Roger Chen reveal the current AI trends they've observed in industry.

Continue reading The state of AI adoption.




AI is the new electricity

2017-09-19T20:00:00Z


Andrew Ng shares his thoughts on where the biggest opportunities in AI may lie.

Continue reading AI is the new electricity.




Highlights from the Artificial Intelligence Conference in San Francisco 2017

2017-09-19T20:00:00Z


Watch highlights covering artificial intelligence, machine learning, applied deep learning, and more. From the Artificial Intelligence Conference in San Francisco 2017.

Experts from across the AI world are coming together for the Artificial Intelligence Conference in San Francisco. Below you'll find links to highlights from the event.

The inevitable merger of IQ and EQ in technology

Rana el Kaliouby lays out a vision for an emotion-enabled world of technology.

Continue reading Highlights from the Artificial Intelligence Conference in San Francisco 2017.




Deep learning to fight cancer: Fireside chat with Peter Norvig and Abu Qader

2017-09-19T20:00:00Z


Peter Norvig speaks with Abu Qader, the 18-year-old CTO of GliaLab who taught himself machine learning and launched an AI company for breast cancer diagnostics.

Continue reading Deep learning to fight cancer: Fireside chat with Peter Norvig and Abu Qader.




The inevitable merger of IQ and EQ in technology

2017-09-19T20:00:00Z


Rana el Kaliouby lays out a vision for an emotion-enabled world of technology.

Continue reading The inevitable merger of IQ and EQ in technology.




What's in a transport layer?

2017-09-19T10:00:00Z


Understanding gRPC in the dawn of microservices.

Microservices are small programs, each with a specific and narrow scope, that are glued together to produce what appears from the outside to be one coherent web application. This architectural style stands in contrast to the traditional "monolith," where every component and subroutine of the application is bundled into one codebase and not separated by a network boundary. In recent years, microservices have enjoyed increased popularity, concurrent with (but not necessarily requiring) enabling technologies such as Amazon Web Services and Docker. In this article, we will take a look at the "what" and "why" of microservices, and at gRPC, an open source framework released by Google that organizations are increasingly reaching for in their migration toward microservices.
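
To make the "what" concrete before we get to the "why," here is a minimal gRPC server in Python. This sketch assumes the stock Greeter service from the grpcio helloworld example; the helloworld_pb2 modules are generated from that example's .proto file and are not part of this article:

    import time
    from concurrent import futures

    import grpc
    # Generated from helloworld.proto in the grpcio examples (assumed available).
    import helloworld_pb2
    import helloworld_pb2_grpc

    class Greeter(helloworld_pb2_grpc.GreeterServicer):
        def SayHello(self, request, context):
            # Each RPC is an ordinary method; gRPC handles the wire format.
            return helloworld_pb2.HelloReply(message='Hello, %s!' % request.name)

    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    helloworld_pb2_grpc.add_GreeterServicer_to_server(Greeter(), server)
    server.add_insecure_port('[::]:50051')
    server.start()
    while True:
        time.sleep(86400)  # serving happens on daemon threads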

Why Use Microservices?

To understand the general history and structure of microservices emerging as an architectural pattern, this Martin Fowler article is a good and fairly comprehensive read. It's worth noting Fowler's caveat near the end that:

Continue reading What's in a transport layer?




Query the planet: Geospatial big data analytics at Uber

2017-09-19T10:00:00Z

A deep dive into Uber's engineering effort to optimize geospatial queries in Presto.

From determining the most convenient rider pickup points to predicting the fastest routes, Uber aims to use data-driven analytics to create seamless trip experiences. Within engineering, analytics inform decision-making processes across the board. One of the distinct challenges for Uber is analyzing geospatial big data. City locations, trips, and event information, for instance, provide insights that can improve business decisions and better serve users.

Geospatial data analysis is particularly challenging, especially in a big data scenario, such as computing how many rides start at a transit location, how many drivers are crossing state lines, and so on. For these analytical requests, we must achieve efficiency, usability, and scalability in order to meet user needs and business requirements. To accomplish this, we use Presto in our production environment to process the big data powering our interactive SQL engine. In this article, we discuss our engineering effort to optimize geospatial queries in Presto.

Using Presto at Uber

We chose Presto as our system’s SQL engine because of its scalability, high performance, and smooth integration with Hadoop. These properties make it a good fit for many of our teams.

Presto architecture

Uber’s Presto ecosystem is made up of a variety of nodes that process data stored in Hadoop. Each Presto cluster has one “coordinator” node that compiles SQL and schedules tasks, as well as a number of “worker” nodes that jointly execute tasks. As detailed in Figure 1, the client sends SQL queries to our Presto coordinator, whose analyzer compiles SQL into an Abstract Syntax Tree (AST). From there, the planner compiles the AST into a query plan, optimizing it for a fragmenter that then segments the plan into tasks. Next, the scheduler assigns each task—either reading files from the Hadoop Distributed File System (HDFS) or conducting aggregations—to a specific worker, and the node manager tracks their progress. Finally, the results of these tasks are streamed to the client.

Figure 1. Uber’s Presto architecture incorporates one coordinator node that analyzes and schedules tasks and several worker nodes that scan and aggregate data for use by the client. Image courtesy of Zhenxiao Luo.

Hadoop infrastructure and analytics

Analytic data sets at Uber are captured in our Hadoop warehouse, including event logs replicated by Kafka, service-oriented architecture tables built with MySQL and Postgres, and trip data stored in Schemaless. We run Flink, Pinot, and MemSQL for streaming and real-time analysis of this data. The Hadoop Distributed File System (HDFS) is our data lake. In this ecosystem, event logs and trip data are ingested using Uber-internal data ingestion tools, and service-oriented tables are copied to HDFS via Sqoop. With Uber Hoodie, Uber’s incremental updates and inserts library, data is first dumped into HDFS as nested raw files, and then some of these raw tables are converted into modeled tables via extract, transform, load (ETL) jobs. While batch and ETL jobs run on Hive and Spark, near real-time interactive queries run on Presto. This robust Hadoop infrastructure is integrated a[...]
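
A query of the kind described above (counting rides that start inside a region) might look like this sketch, using Presto geospatial functions through the PyHive client (the host, table, and column names are hypothetical):

    from pyhive import presto

    conn = presto.connect(host='presto-coordinator.example.com', port=8080)
    cur = conn.cursor()
    # ST_Contains and ST_Point are Presto geospatial functions; the polygon
    # is a rough bounding box around San Francisco, for illustration.
    cur.execute("""
        SELECT count(*)
        FROM trips
        WHERE ST_Contains(
            ST_GeometryFromText(
                'POLYGON ((-122.52 37.70, -122.35 37.70,
                           -122.35 37.83, -122.52 37.83, -122.52 37.70))'),
            ST_Point(pickup_lng, pickup_lat))
    """)
    print(cur.fetchone()[0])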



Four short links: 19 September 2017

2017-09-19T08:00:00Z

BMI, Govt Apps Threatened, Geospatial Jupyter, and W3C Adds DRM to HTML (*spit*).

  1. Brain Machine Interface Isn't SF Any More (Wired) -- the demo is typing without a keyboard; the article is really about the CEO (started Internet Explorer, got a classics degree at 30, then got a PhD in neuroscience).
  2. Is Apple About to Accidentally Kill Government as a Platform? (Jen Pahlka) -- In an effort to reduce the proliferation of spam apps, Apple changed its App Store review guidelines to ban “apps created from a commercialized template or app generation service.” In what appears to be a misguided interpretation of an otherwise reasonable rule, Apple has decided to include white-labeled government apps in this category.
  3. geonotebook -- A Jupyter notebook extension for geospatial visualization and analysis.
  4. World Wide Web Consortium Abandons Consensus, Standardizes DRM, EFF resigns (Cory Doctorow) -- EFF no longer believes that the W3C process is suited to defending the open web.

Continue reading Four short links: 19 September 2017.




Four short links: 18 September 2017

2017-09-18T08:00:00Z

AI Journos, AI Hype, Faces from Photos, and Regulating Online Advertising

  1. AI-Produced Journalism -- In its first year, the Post has produced around 850 articles using Heliograf. That included 500 articles around the election that generated more than 500,000 clicks — not a ton in the scheme of things, but most of these were stories the Post wasn’t going to dedicate staff to anyway. [...] It’s unclear how that approach can be scaled to cover local communities, where the digital news model has fallen short. Heliograf can be used to digest data like standardized test scores and crime stats; covering a zoning board meeting is another matter. And AI isn’t being used beyond big news organizations, Lewis pointed out. “There’s such a huge gap between the AI haves and have-nots. We are many years away from these things being implemented at the local level.”
  2. Deep Learning Hype in One Picture (Alex Lebrun) -- NIPS conference registrations, 2002 through 2017.
  3. Facial Reconstruction From a Single Photo -- experiment with the code from the paper.
  4. How Did We End Up Here? (John Battelle) -- US regulators are looking at the online ad world, and may align its regulations with those of newspapers (which must attribute political speech, etc.). That has implications for platform immunity, not to mention profits.

Continue reading Four short links: 18 September 2017.




Four short links: 15 September 2017

2017-09-15T11:10:00Z

Hardware Life Tetris, VR-64, Face Average, and LoRa Backscatter

  1. Tetris From the Ground Up -- quixotic brilliance. Hardware to Game of Life to Tetris.
  2. VR Goggles For C64 -- I built the VR64 using three components: a $10 plastic VR goggle, a $26 LCD, and a cheap power transformer (plus lots of glue gun fun!). I split the screen into two sections, one for the left eye and one for the right. Each section is 19 columns by 25 rows, and the center two rows are not used. Each eye has 152x200 pixels in high resolution and only 76x200 in multi-color mode! (via Vice)
  3. The Average Face of a UK Member of Parliament -- the idea of a facial mean disconcerts me still.
  4. LoRa Backscatter -- they reverse-engineered the proprietary LoRa physical layer to do this! (Readable article about the tech also available, explaining why this is interesting for IoT)

Continue reading Four short links: 15 September 2017.




Visualizing convolutional neural networks

2017-09-15T11:05:00Z

Building convnets from scratch with TensorFlow and TensorBoard.

Given all of the higher-level tools that you can use with TensorFlow, such as tf.contrib.learn and Keras, one can very easily build a convolutional neural network with a very small amount of code. But often with these higher-level applications, you cannot access the little in-between bits of the code, and some of the understanding of what’s happening under the surface is lost. In this tutorial, I’ll walk you through how to build a convolutional neural network from scratch, using just low-level TensorFlow, and visualize our graph and network performance using TensorBoard. If you don't understand some of the basics of a fully connected neural network, I highly recommend you first check out Not another MNIST tutorial with TensorFlow. Throughout this article, I will also break down each step of the convolutional neural network to its absolute basics so you can fully understand what is happening in each step of the graph. By building this model from scratch, you can easily visualize different aspects of the graph so that you can see each layer of convolutions and use them to make your own inferences. I will only highlight major aspects of the code, so if you would like to follow this code step-by-step, you can check out the corresponding Jupyter Notebook on GitHub.

Gathering a data set

Getting started, I had to decide which image data set to use. I decided to use the University of Oxford Visual Geometry Group’s pet data set. I chose this data set for a few reasons: it is very simple and well labeled, it has a decent amount of training data, and it also has bounding boxes—useful if I want to train a detection model down the road. Another data set I thought would be excellent for building a first model was the Simpsons data set found on Kaggle, which has a great amount of simple data on which to train.

Choosing a model

Next, I had to decide on the model of my convolutional neural network. Some very popular models are GoogLeNet or VGG16, which both have multiple convolutions designed to detect images from the 1,000-class data set ImageNet. I decided on a much simpler four-convolution network:

Figure 1. Image courtesy of Justin Francis.

To break down this model, it starts with a 224x224x3 image, which is convolved to 32 feature maps, based on the previous three channels. We then convolve this group of 32 feature maps together into another 32 features. This is then pooled into a 112x112x32 image, which we convolve into 64 feature maps twice, followed by a final pooling to 56x56x64. Each unit of this final pooled layer is then fully connected to 512 neurons, and then finally put through a softmax layer based upon the number of classes.

Processing and building a data set

First, let’s get started with loading our dependencies, which include a group of helper functions I made called imFunctions for processing the image data.

    import imFunctions as imf
    import tensorflow as tf
    import scipy.ndimage
    from scipy.misc import imsave
    import matplotlib.pyplot as plt
    import numpy as np

We can then download and extract the images using imFunctions.

    imf.downloadImages('annotations.tar[...]
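
The architecture described under "Choosing a model" can be sketched in low-level TensorFlow 1.x along these lines (a condensed approximation, not the article's full code; see the author's notebook for the real version):

    import tensorflow as tf

    NUM_CLASSES = 2  # placeholder; set to the number of classes in your data set

    def conv(x, out_channels):
        in_channels = x.get_shape().as_list()[-1]
        w = tf.Variable(tf.truncated_normal([3, 3, in_channels, out_channels],
                                            stddev=0.1))
        b = tf.Variable(tf.zeros([out_channels]))
        return tf.nn.relu(tf.nn.conv2d(x, w, strides=[1, 1, 1, 1],
                                       padding='SAME') + b)

    def pool(x):
        return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],
                              padding='SAME')

    images = tf.placeholder(tf.float32, [None, 224, 224, 3])
    net = pool(conv(conv(images, 32), 32))  # two 32-map convs, pooled to 112x112x32
    net = pool(conv(conv(net, 64), 64))     # two 64-map convs, pooled to 56x56x64
    flat = tf.reshape(net, [-1, 56 * 56 * 64])
    w_fc = tf.Variable(tf.truncated_normal([56 * 56 * 64, 512], stddev=0.1))
    fc = tf.nn.relu(tf.matmul(flat, w_fc))  # fully connected to 512 neurons
    w_out = tf.Variable(tf.truncated_normal([512, NUM_CLASSES], stddev=0.1))
    probs = tf.nn.softmax(tf.matmul(fc, w_out))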



Load, search, and secure data in multiple formats

2017-09-15T11:00:00Z

The O'Reilly Podcast: Dave Cassel on building a unified enterprise database to store and query any type of data.

In this podcast episode, I speak with Dave Cassel, technical community manager at MarkLogic, creator of a multi-model NoSQL database that aims to integrate data silos for a unified view. We talked about integration patterns for loading and exporting data with ease, an architecture that enables efficient search and queries, and layers of security that follow the data from its original source throughout its lifecycle.

Work on applications as soon as you load the data

The idea of 'load as-is' is that your data already exists in some form, and that form can vary dramatically. It can be Word documents or XML or JSON data. It can also be stuff that you've already got in relational databases. The idea here is that if we can take that data in whatever form it currently exists and bring it into the database in that form, then we can start exploring it in the context of that database—rather than having to first build up some schema, some representation of it, do a bunch of ETL work, and only then be able to start working with it. What that means is as soon as we get the data into the database, we can start actually working on our applications. The application is what actually delivers value—business value—to customers. By getting to work on that faster and being able to iterate on it, we've found we've got a much better time to value, and our customers have told us that repeatedly.

Ask questions of your data, without a complex architecture

Let's think about a common architecture. You've got a three-tier application with a user interface, an application layer that holds your business logic, and a database. Most of the time, with that approach, you need to add on a separate search engine, and that's how you're going to search the text part of your data. That means—if we think about that application layer in the middle—the source code there is going to have to go to different places to query data or to search text. Then, it's going to have to take those results, put them together, and synthesize them in some good way before presenting them to the user. When you've got the search engine built into the database, the application layer has one place to go, and that really simplifies the code you have to write. The application layer itself becomes a lot simpler. If we don't have that, what we end up with is complexity. Complexity usually leads to two things: longer release cycles and more bugs. By having the search engine built in, it gives you a single place to go for your information, simplifies that application layer, and makes your application more reliable.

Implementing role-based security

With a role-based security model, each user will have one or more roles assigned. Those roles determine what that person's allowed to see, what t[...]
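
At heart, the role-based model being described reduces to checking a user's roles against the grants an action requires. A generic toy sketch in Python (not MarkLogic's actual API):

    # Hypothetical role-to-permission mapping, for illustration only.
    ROLE_GRANTS = {'analyst': {'read'}, 'admin': {'read', 'write'}}

    def allowed(user_roles, action):
        # A user may hold several roles; any one of them can grant the action.
        return any(action in ROLE_GRANTS.get(role, set()) for role in user_roles)

    assert allowed(['admin'], 'write')
    assert not allowed(['analyst'], 'write')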



Why might an off-the-shelf anomaly detection technique not work in practice?

2017-09-14T11:00:00Z


See examples of the many traps you can fall into if you use off-the-shelf anomaly detection techniques.

Continue reading Why might an off-the-shelf anomaly detection technique not work in practice?.




What are the challenges in building an anomaly detection system?

2017-09-14T11:00:00Z


Learn about some of the common issues you will encounter when developing algorithms for a modern anomaly detection system.

Continue reading What are the challenges in building an anomaly detection system?.




What are the challenges in building an anomaly detection system for streaming and live data?

2017-09-14T11:00:00Z


Learn the difference between live and streaming anomaly detection systems and how to address the challenges different data velocities pose.

Continue reading What are the challenges in building an anomaly detection system for streaming and live data?.




The state of machine learning in Apache Spark

2017-09-14T10:49:00Z


The O’Reilly Data Show Podcast: Ion Stoica and Matei Zaharia explore the rich ecosystem of analytic tools around Apache Spark.

In this episode of the Data Show, we look back to a recent conversation I had at the Spark Summit in San Francisco with Ion Stoica (UC Berkeley professor and executive chairman of Databricks) and Matei Zaharia (assistant professor at Stanford and chief technologist of Databricks). Stoica and Zaharia were core members of UC Berkeley’s AMPLab, which originated Apache Spark, Apache Mesos, and Alluxio.

Continue reading The state of machine learning in Apache Spark.




Self-driving trucks enter the fast lane using deep learning

2017-09-14T10:00:00Z

A deep dive into startup TuSimple’s use of Apache MXNet.

This past June, a driverless truck completed a 200-mile test drive from Yuma, Arizona, to San Diego, California—a milestone for autonomous trucking in the U.S. This feat was achieved by the company TuSimple, which trained its driving system using an AI technique known as deep learning to simulate tens of millions of miles of road driving.

Deep learning can approach tasks that are easy for a person but hard for computers, such as identifying people and objects in photos; detecting mood and intention in an image, text, or voice interaction; or recognizing handwriting. Rather than hand-coding software routines with specific instructions, the system is trained using large amounts of data and algorithms that give it the ability to learn how to perform the task. With deep learning, we are now able to build software that can detect fraud; identify patterns in trading; recommend new products to customers; and, in the very near future, provide autonomous trucks to an industry plagued by a chronic shortage of drivers, fatal accidents, and high overhead due to fuel inefficiencies. TuSimple’s driverless trucks are poised to disrupt the $700 billion U.S. trucking industry—just one thin slice of how advances in deep learning and AI will transform our lives.

“For me, it was the challenge of working on unsolved problems that drew me to this project,” says Xiaodi Hou, co-founder and CTO of TuSimple. So how is TuSimple using deep learning to make self-driving trucks possible? “We use cameras as our primary sensors,” Hou explains. “Each camera ingests 20-30 frames per second, about 100 megabytes of data, which passes through layers and layers of deep learning network stacks and gets some result. The results are further combined in algorithms to make real-time decisions based on the truck’s self-information, velocity, and angle, where the other cars or obstacles are, detecting lanes, etc.”

TuSimple’s deep learning requirements for training versus implementation are very different. The models are created and trained in a multiple-GPU-based Amazon Web Services cloud environment. Once the models are trained, they are transferred to truckborne computers, where the models interpret sensor input into results that enable real-time understanding of, and reaction to, road conditions and other road users.

In 2015, when Hou started thinking seriously about adopting a deep learning framework to speed development, Apache MXNet was still a fledgling project known as CXXNET. Hou had jotted down his requirements for a framework, which included specific constraints on time, cost, capacity, and scalability. MXNet’s breakthrough, and what attracted Hou and his team, is that it successfully combined the ability to scale to multiple GPUs across multiple hosts with high performance and cross-platform portability. MXNet was also efficient in training—offsetting the cost of the computational power needed to train models. “The choice of a deep learning framework[...]
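
The multi-GPU scaling that attracted Hou is close to a one-line change in MXNet's Module API. A toy sketch (the network and data here are illustrative, not TuSimple's):

    import mxnet as mx

    # A tiny stand-in network; a real perception model would be far deeper.
    data = mx.sym.Variable('data')
    net = mx.sym.Convolution(data, kernel=(3, 3), num_filter=32)
    net = mx.sym.Activation(net, act_type='relu')
    net = mx.sym.FullyConnected(mx.sym.Flatten(net), num_hidden=10)
    net = mx.sym.SoftmaxOutput(net, name='softmax')

    # Data-parallel training across GPUs is just a list of contexts.
    module = mx.mod.Module(symbol=net, context=[mx.gpu(0), mx.gpu(1)])

    train_iter = mx.io.NDArrayIter(mx.nd.random_uniform(shape=(128, 3, 32, 32)),
                                   mx.nd.zeros((128,)), batch_size=16)
    module.fit(train_iter, optimizer='sgd', num_epoch=1)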



Four short links: 14 September 2017

2017-09-14T10:00:00Z

Self-Folding Electronics, Mozilla Comments, Observability, and NLP Library

  1. 3D-Printed Self-Folding Electronics -- Here, we demonstrate a method for spontaneous folding of three-dimensional (3D)-printed composites with embedded electronics at room temperature. The composite is printed using a multimaterial 3D-printing process with no external processing steps. Upon peeling from the print platform, the composite self-shapes itself using the residual forces resulting from polymer swelling during the layer-by-layer fabrication process. As a specific example, electrochromic elements are printed within the composite and can be electrically controlled through its folded legs.
  2. WaPo Deploys Mozilla's Comments System -- Mozilla has a whole group working on better tools for digital journalism.
  3. A Field Guide to Observability -- when you don't have observability, you can't tell the difference between normal and abnormal. Nice.
  4. AllenNLP -- an open source NLP research library, built on PyTorch.

Continue reading Four short links: 14 September 2017.




Chris Wysopal on a shared responsibility model for developers and defenders

2017-09-13T17:00:00Z


The O’Reilly Security Podcast: Shifting secure code responsibility to developers, building secure software quickly, and the importance of changing processes.

In this episode of the Security Podcast, I talk with Chris Wysopal, co-founder and CTO of Veracode. We discuss the increasing role of developers in building secure software, maintaining development speed while injecting security testing, and helping developers identify when they need to contact the security team for help.

Continue reading Chris Wysopal on a shared responsibility model for developers and defenders.




Time machine for cancer diagnosis

2017-09-13T16:20:00Z

Exciting new genetic testing technology has improved the speed and accuracy of cancer diagnosis.

One of the most important improvements in oncology was the introduction of targeted therapy. It allows clinicians to prescribe a pharmacological treatment specifically programmed to fight and kill only cancer cells, unlike wide-spectrum chemotherapy (Slamon D.J. et al. 2001). To determine whether a patient is eligible for targeted therapy, it is necessary to define the genetic profile of the cancer cells.

Among the techniques developed in the past 20-30 years, one of the most used, and considered the gold standard, is in situ hybridization. This technique is based on the principle of the specificity of DNA sequences and uses genetic probes to recognize a specific gene, chromosome, or part of one. This enables labs to see whether there are numerical alterations, such as multiple copies of a gene or a reduction in a gene's copies, or structural alterations such as deletions, inversions, or rearrangements, and then to issue a report. For example, in breast cancer it is important to count copies of the HER2 gene to assess eligibility for the dedicated therapy; in brain cancer, the loss of some parts of chromosome 1 and chromosome 19 is related to a specific cancer type (Slamon D.J. et al 2001; Barbashina V. et al 2005). Today it is common to label these DNA probes with fluorescent dyes. Figure 1-1 shows one example of fluorescent in situ hybridization (FISH): the red and green dots are, respectively, two different genes; the big blue bodies are the cell nuclei.

Figure 1-1. FISH test: red indicates the test gene; green, the control gene; blue bodies, the cell nuclei

Following the latest WHO guidelines for cancer diagnosis, the molecular characterization of an individual's cancer cells is becoming mandatory, and it is fundamental to a complete and accurate diagnosis. The main problem is that these tests require at least two or three working days and are costly. The wait reduces the lab's ability to issue a genetic report quickly and also reduces the number of tests that can be performed daily. The delay in diagnosis is not only a problem for hospital administrators planning patients' follow-up but is also related to an increasing amount of anxiety disorders among cancer patients (Baqutayan SMS 2012). This limitation of FISH, and of ISH techniques generally, started to become relevant once new targeted drugs for specific genetic profiles were listed among the first-line treatments prescribed to patients after their first biopsy. Some examples of cancer treatment based on ISH results include the use of an antibody called Trastuzumab for breast cancers with amplification of the HER2 gene, or another antibody called Crizotinib, specific to a lung cancer type characterized by ALK gene alterations (Voeg[...]



Four short links: 13 September 2017

2017-09-13T12:30:00Z

Traffic Interception, AI Security, Security Must-Knows, and Learning Game Engines

  1. Understanding Web Traffic Interception (CloudFlare) -- We found that between 4% and 10% of the web’s encrypted traffic (HTTPS) is intercepted.
  2. Awesome AI Security -- curated list of AI security resources.
  3. What Every Software Engineer Should Know About Search -- the key to success in search is building processes for evaluation and tuning into the product and development cycles. A search system architect should think about processes and metrics, not just technologies.
  4. Game Engine Learning from Video -- trained on a speedrunner video, uses 2m of footage of the game being played to build its own game engine. Started with Mega Man and Sonic, now using Super Mario Bros. See also the university's press release.

Continue reading Four short links: 13 September 2017.




Data and design are tools that, together, build great experiences for your users

2017-09-13T11:15:00Z

Data capture, management, and analysis builds a bridge between design, user experience, and business relevance.

Data. This short word has captured the imagination of the media. Every day, another news story breaks that preaches the power of “big data,” discussing the value of data for business, data for adaptive technology experiences, or data and marketing. It’s clear that regardless of the application, data is a very hot topic and the currency of the day.

It might feel like using data is big news now, but the truth is that we’ve been using data for a long time in the internet business. Data in the form of digital content and activity traces is at the core of internet experiences. For the past 20 years, we’ve been inventing new digital experiences and re-creating physical world experiences in the digital world. Sharing photos, having conversations, finding love: activities that we perform in our daily lives have all become digital. Being digital means we can log and track these activities with ease. Digital interfaces have made data collection so easy that now our biggest challenge is not access to data; it’s avoiding the false belief that data is always good, and recognizing that interpreting the data and deriving meaning from it is itself a challenging task. In other words, the ease of gathering data can lead us to be lazy in our thinking, resulting in erroneous conclusions if the data quality is low or unrepresentative or the data analysis is flawed.

There’s more potential here than collecting any and all data, of course. The “digital revolution” and the internet as a platform mean we can also run experiments to collect data that allow us to compare one experience to another. We have the potential to run many experiments, sometimes concurrently with many users at once—a practice that has been called “experimentation at scale.”

And that leads us to why the three of us wanted to write this book. We had two key reasons: first, so more people with a user-centric orientation enter into the conversation about data collection, data quality, and data interpretation; and second, so those who wish to can apply the information we share here and more effectively leverage data in their design work. We hope you are able to use the information in this book to your benefit and that this will, in turn, further the practice of bringing data and design closer together.

Beneath the complex and ever-evolving world of experimental design and statistical analysis, there are some basic principles that are surprisingly powerful and very important to understand. Our aim is to give you a framework for thinking critically and carefully about the design of experiments, and to help you avoid the trap of just being excited about data for data’s sake. We want you to be excited about collecting and analyzing the right data, in the right way, with the right framework so[...]
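
To ground the "experimentation at scale" idea, comparing two experiences often comes down to comparing two conversion rates. A toy sketch of a two-proportion z-test (illustrative numbers; scipy assumed installed):

    from math import sqrt
    from scipy.stats import norm

    conv_a, n_a = 120, 2400   # conversions and visitors for experience A
    conv_b, n_b = 165, 2500   # conversions and visitors for experience B

    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - norm.cdf(abs(z)))              # two-sided test
    print(p_a, p_b, z, p_value)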



How do use cases benefit from real-time processing?

2017-09-13T11:00:00Z

(image)

Learn the benefits of using real-time data processing for certain use cases.

Continue reading How do use cases benefit from real-time processing?.

(image)



What does dysfunction look like on a data team?

2017-09-13T11:00:00Z

(image)

Learn to identify problems that may indicate data team dysfunction.

Continue reading What does dysfunction look like on a data team?.

(image)



Fast track Apache Spark

2017-09-12T11:00:00Z

6 lessons learned to get a quick start on productivity.

My upcoming Strata Data NYC 2017 talk about big data analysis of futures trades is based on research done under the limited funding conditions of academia. This meant that I did not have an infrastructure team; I had to set up a Spark environment myself. I was analyzing futures order books from the Chicago Mercantile Exchange (CME) spanning May 2, 2016, to November 18, 2016. The CME data included extended-hours trading, with the following fields: instrument name, maturity, date, time stamp, price, and quantity. The futures comprised 21 financial instruments spanning six markets—foreign exchange, metal, energy, index, bond, and agriculture. Trades were recorded roughly every half second.

In the process of doing this research, I learned a lot of lessons. I want to help you avoid making the mistakes I did so you can start making an immediate impact in your organization with Spark. Here are the six lessons I learned:

  1. You don't need a database or data warehouse. It is common for Spark setups to use Apache Hadoop's distributed file system (HDFS) and Hive for querying, but you can use text files and other accepted file formats in local directories if you don't want to go through the hassle of setting up a database or warehouse. When I worked at Sprint, they had an on-premises cluster and stored data in HDFS, which gave me great exposure to new technologies. In my research with the CME data, however, I read CSVs directly into Spark using the spark-csv package, which has since been merged into the main Spark project because of the fundamental capability it provides. (A minimal local-mode sketch of this setup follows the excerpt.)
  2. You don't need a cluster of machines. You can hit the ground running using your local machine or a single server. This is also helpful in that you do not have to consider which cluster manager to install—YARN or Mesos. You can simply use the standalone cluster manager that comes with Spark. Just make sure that, if you use one machine, it has multiple cores and enough memory to cache your data. In general, any time you build a distributed system, you should first start with one machine. One of my mentors once told me that software engineering is like math: to build something, it is often useful to start at n=0, as in an inductive proof, and then generalize from there. That is one analogy I can relate to!
  3. Use a notebook. Don't bother trying to configure an IDE or using the shell to write applications. Of course, using the shell is great if you are submitting an application or doing some basic coding. I used the shell when I was learning Spark to run some of the examples that come with Spark and follow along with tutorials. And, in theory, having all the features of an IDE might be a reflexive thin[...]
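The excerpt breaks off mid-lesson, but the first two lessons are easy to make concrete. Here is a minimal, hedged sketch of a single-machine Spark session reading CSVs from a local directory with the built-in CSV reader (the successor to the spark-csv package mentioned above); the file path and settings are hypothetical, not from the talk.

    # Minimal local-mode Spark setup: no cluster, no HDFS, no Hive.
    # Assumes Spark 2.x with pyspark installed; the path below is hypothetical.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("local[*]")            # standalone manager, all local cores
             .appName("cme-futures")
             .getOrCreate())

    trades = (spark.read
              .option("header", "true")     # first line holds column names
              .option("inferSchema", "true")
              .csv("data/cme/*.csv"))       # plain CSV files in a local directory

    trades.cache()                          # keep in memory for repeated queries
    trades.printSchema()
    print(trades.count())

Starting this way (n=1 machine, as the author's mentor would put it), the same code later scales out to a cluster by changing only the master URL and the input path.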



How do I use layers in Figma to manage the complexity of a document?

2017-09-12T11:00:00Z

(image)

Learn about Figma's layers area – where you control individual objects, images, and text – and avoid making overly complex Figma documents.

Continue reading How do I use layers in Figma to manage the complexity of a document?.

(image)



What are the best formats to use with Figma and how do I import them?

2017-09-12T11:00:00Z

(image)

Learn about the four major image and object formats accepted by Figma - and three ways to import them.

Continue reading What are the best formats to use with Figma and how do I import them?.

(image)



How do I define my workspace in Figma?

2017-09-12T11:00:00Z

(image)

Learn about Figma's Frame tool — then use it to define the correct workspace for the prototype of your website, tablet app, or smartphone app.

Continue reading How do I define my workspace in Figma?.

(image)



Principles of globally distributed systems

2017-09-12T10:00:00Z

(image)

Understand how distributed systems work and how to use them.

Continue reading Principles of globally distributed systems.

(image)



Four short links: 12 September 2017

2017-09-12T09:50:00Z

Open Source Guides, Music Generation, Modern ISP Tech, and Interactive Web Narrative Tool

  1. TODO Group Open Source Guides -- a set of living guides to help you learn more about setting up an open source program.
  2. Deep Learning Techniques for Music Generation -- This book is a survey and an analysis of different ways of using deep learning (deep artificial neural networks) to generate musical content.
  3. Building an ISP in 2017 -- and this is how some new ISPs look on the inside. VERY different than what they were building a decade ago, thanks to cloud, kit, APIs, and modern deployment tools.
  4. Idyll -- a tool that makes it easier to author interactive narratives for the web. The goal of the project is to provide a friendly markup language—and an associated toolchain—that can be used to create dynamic, text-driven web pages.

Continue reading Four short links: 12 September 2017.

(image)



The APIs for neural networks in TensorFlow

2017-09-11T11:10:00Z

A look at the Layer API, TFLearn, and Keras.

TensorFlow has gathered quite a bit of attention as the new hot toolkit for building neural networks. To the beginner, it may seem the only thing that rivals this interest is the number of different APIs that you can use. In this article, we go over a few of them, building the same neural network each time. We start with low-level TensorFlow math, and then show how to simplify that code with TensorFlow's layer API. We also discuss two libraries built on top of TensorFlow: TFLearn and Keras.

The MNIST database is a collection of handwritten digits. Each is recorded in a 28x28 pixel grayscale image. We build a two-layer perceptron network to classify each image as a digit from zero to nine. The first layer will fully connect the 784 inputs to 64 hidden neurons, using a sigmoid activation. The second layer will connect those hidden neurons to 10 outputs, scaled with the softmax function. The network will be trained with stochastic gradient descent, on minibatches of 64, for 20 epochs. (These values are chosen not because they are the best, but because they produce reasonable results in a reasonable time.)

We'll start by loading the modules and the data, as well as setting up some constants we'll use repeatedly.

    import numpy as np
    import tensorflow as tf
    from tensorflow.examples.tutorials.mnist import input_data

    mnist = input_data.read_data_sets('/tmp/data', one_hot=True)
    Xtrain = mnist.train.images
    ytrain = mnist.train.labels
    Xtest = mnist.test.images
    ytest = mnist.test.labels

    N_PIXELS = 28 * 28
    N_CLASSES = 10
    HIDDEN_SIZE = 64
    EPOCHS = 20
    BATCH_SIZE = 64

    sess = tf.Session()

Raw TensorFlow

At its heart, TensorFlow is just a tool for assembling and evaluating computational graphs. Thus, the most basic way to use TensorFlow is to set up the calculation by hand. Let's start by setting up placeholders for the features and labels. These record the shape and datatype of the data to be fed in. Note that the first dimension has size None, which indicates that it can take an arbitrary number of observations.

    x = tf.placeholder(tf.float32, [None, N_PIXELS], name="pixels")
    y_label = tf.placeholder(tf.float32, [None, N_CLASSES], name="labels")

In the first layer, the input features (pixel intensities) are multiplied by a weight matrix of size N_PIXELS x HIDDEN_SIZE. The weights are stored in a variable, which is a TensorFlow data structure that holds state that can be updated during the training. A bias term is added to this, and the result is sent through a sigmoid activation function.

    W1 = tf.Variable(tf.truncated_normal([N_PIXELS, HIDDEN_SIZE],[...]
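The excerpt is cut off before it reaches the higher-level APIs it promises to cover. As a rough sketch of where the article is headed (our reconstruction, not the article's code), here is how the same two-layer network might look in 2017-era Keras, reusing the constants and data arrays defined above; the learning rate is our assumption.

    # A minimal Keras version of the same 784 -> 64 (sigmoid) -> 10 (softmax)
    # network, trained with SGD. Hyperparameters reuse the constants above.
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.optimizers import SGD

    model = Sequential([
        Dense(HIDDEN_SIZE, activation='sigmoid', input_shape=(N_PIXELS,)),
        Dense(N_CLASSES, activation='softmax'),
    ])
    model.compile(optimizer=SGD(lr=0.5),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(Xtrain, ytrain,
              batch_size=BATCH_SIZE, epochs=EPOCHS,
              validation_data=(Xtest, ytest))

Much of the appeal of the higher-level APIs is visible even in this sketch: the explicit weight variables, bias terms, and session bookkeeping of the raw-TensorFlow version disappear into the layer definitions.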



Finding meaning in generative adversarial networks

2017-09-11T11:00:00Z

Artificial intelligence is emerging as a creative force; in the process, it reveals something of itself.

If you ask a child to draw a cat, you’ll learn more about the child than you will about cats. In the same way, asking neural networks to generate images helps us see how they reason about the information they’re given. It’s often difficult to interpret neural networks—that is, to relate their functioning to human intuition—and generative algorithms offer a way to make neural nets explain themselves.

Neural networks are most commonly implemented as classifiers—models that are able to distinguish, say, an image of a cat from an image of a dog, or a stop sign from a fire hydrant. But over the last three years, researchers have made astonishing progress in essentially reversing these networks. Through a handful of generative techniques, it’s possible to feed a lot of images into a neural network and then ask for a brand-new image that resembles the ones it’s been shown. Generative AI has turned out to be remarkably good at imitating human creativity at superficial levels.

The current wave of generative AI research builds on the generative adversarial network, or GAN, a neural network structure introduced by Ian Goodfellow and his collaborators in 2014. A flowering of inventive applications followed their paper. Researchers have generated images of everything from faces to bedrooms. Through a GAN-based technique called pix2pix, satellite images become maps, black-and-white photos become colorized, and simple sketches become realistic renderings. Enhancing blurry, low-resolution images—a much-mocked fantasy in police procedurals—has become a reality through GANs, which are able to make sophisticated assumptions about likely structure in photographs.

Figure 1. Fictional album covers created by a generative adversarial network. Credit: Alec Radford, Luke Metz, Soumith Chintala; copyright 2015 Alec Radford, used under the MIT License.

A generative adversarial network consists of two neural networks: a generator that learns to produce some kind of data (such as images) and a discriminator that learns to distinguish “fake” data created by the generator from “real” data samples (such as photos taken in the real world). The generator and the discriminator have opposing training objectives: the discriminator’s goal is to accurately classify real and fake data; the generator’s goal is to produce fake data the discriminator can’t distinguish from real data.

In our new Oriole interactive tutorial, Adit Deshpande and I use TensorFlow to demonstrate a very simpl[...]
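The excerpt ends just as the tutorial begins. To make the opposing objectives concrete, here is a minimal GAN skeleton in the same 2017-era TensorFlow idiom as the article above; it is our illustrative sketch (layer sizes, variable scopes, and optimizer settings are all assumptions), not the Oriole tutorial's code.

    import tensorflow as tf

    def generator(z):
        # Map noise vectors to fake 784-pixel "images."
        with tf.variable_scope("gen"):
            h = tf.layers.dense(z, 128, activation=tf.nn.relu)
            return tf.layers.dense(h, 784, activation=tf.nn.sigmoid)

    def discriminator(x, reuse=False):
        # Emit a single logit: how "real" does x look?
        with tf.variable_scope("disc", reuse=reuse):
            h = tf.layers.dense(x, 128, activation=tf.nn.relu)
            return tf.layers.dense(h, 1)

    z = tf.placeholder(tf.float32, [None, 100], name="noise")
    x_real = tf.placeholder(tf.float32, [None, 784], name="real_images")

    x_fake = generator(z)
    d_real = discriminator(x_real)
    d_fake = discriminator(x_fake, reuse=True)

    # Opposing objectives: D should label real data 1 and fakes 0;
    # G succeeds when D labels its fakes 1.
    xent = tf.nn.sigmoid_cross_entropy_with_logits
    d_loss = tf.reduce_mean(
        xent(logits=d_real, labels=tf.ones_like(d_real)) +
        xent(logits=d_fake, labels=tf.zeros_like(d_fake)))
    g_loss = tf.reduce_mean(
        xent(logits=d_fake, labels=tf.ones_like(d_fake)))

    # Each step updates only that network's own variables.
    d_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="disc")
    g_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="gen")
    d_step = tf.train.AdamOptimizer(1e-4).minimize(d_loss, var_list=d_vars)
    g_step = tf.train.AdamOptimizer(1e-4).minimize(g_loss, var_list=g_vars)

Training alternates d_step and g_step on minibatches; the generator never sees real images directly, only the gradient signal that leaks back through the discriminator.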



Four short links: 11 September 2017

2017-09-11T08:50:00Z

Criminal Smart Contracts, Economic Damaged Goods, Crypto Bar, and Equifax Advice

  1. Investigating the Future of Criminal Smart Contracts (A Paper a Day) -- readable and interesting summary of a paper on using smart contracts around criminal activities: leaking stolen confidential information, buying stolen keys, and assassination.
  2. Damaged Goods -- this is how economics refers to intentionally crippled offerings (e.g., a student version that only has half the features of the pro version, when the code for the pro features has already been written and the marginal cost to the vendor for including those features is 0). The provoking example is Tesla (60kWh and 75kWh models have the same batteries; the lower range is software-imposed, based on how much you've paid for your car). Without selling to the high willingness-to-pay customers at the high price, the good might not be produced at all because the profit from customers who are only willing to buy at a discount isn't enough to support the R&D. Thus, the high willingness-to-pay customers aren't worse off from the existence of a discounted version, and the low willingness-to-pay customers and the firm are clearly better off. See also Tesla offers $20K in software-upgradable options when you buy a car and Cory's take.
  3. The Bletchley -- Use WW2 Enigma machines and protocols, and Sherlock's deduction principles, to create personalized cocktail recipes. The recipes will be ciphered and handed over to you with your drinks, for these are to stay a secret between you and our agents. Neat video in this tweet.
  4. Credit Report/Identity Theft Advice -- in the wake of the Equifax breach, some solid steps to take in the scenarios that might play out with your identity. In particular, the problem of someone else opening a bank account with your identity (the bank won't close it, as you're not the person who opened it, and therein lies most of the pain).

Continue reading Four short links: 11 September 2017.



From Infinity to 8: Translating AI into real numbers

2017-09-08T17:45:00Z

Turning abstract AI into real business solutions.

Like infinity, artificial intelligence is an abstract concept. AI commercials show floating orbs and a sprinkling of fairy dust providing magical answers to our questions—even those we didn’t know to ask. These presentations of AI remind me of an episode from South Park’s second season called “Underpants Gnomes.” In this episode, gnomes collect underpants and make a profit. The question is, how exactly do they get from point A to point B? The business plan is revealed via a slide, of course: (1) Collect underpants, (2) ?, (3) Profit. AI offers something similar: (1) Collect data, (2) AI, (3) Profit! My goal in this article is to help you be more explicit about Step 2. I hope it helps you make real the incredible AI opportunities that I know are available to your organization.

The first step in getting real with AI is to define it: AI is just math. Just as there isn’t only one kind of math, there isn’t one AI. Before you run away because of the “m” word, have faith: just as you don’t need to know how to code to influence software design successfully, you don’t need to know math to influence the AI in your desired solution. Focus on the inputs and the outputs, and how you can validate each. In this post, I’ll cover input and output validation tips, as well as how to make sure what’s in between focuses on the problem you need solved.

Putting the wheel before the cart before the horse

Every day, companies approach my company, Nara Logics, saying AI is one of their top objectives this year, and asking if our platform can help. Scott Cook, founder of Intuit, says, “Success is not delivering a feature; success is learning how to solve a customer’s problem.” In this analogy, AI success is two clicks away. Essentially, AI helps you deliver a feature. To find a problem that is a good fit for AI, I recommend focusing on the “Four Vs” of big data—i.e., the input:

What data do you have large Volumes of? What actions does it impact now? And what ones could it impact? For consumer companies, for example, customer transactions are large volumes of data. For a manufacturing company, factory production information provides volume.

Where is there significant Variety in your data? For example, an insurance company offering a few hundred policies doesn’t have volume. However, the variety of customer attributes driving the match to a policy is big data.

Do you have areas of high Velocity data? IoT is certainly driving this aspect of data for many companies.[...]



Why strong sound design is critical to successful products

2017-09-08T13:20:00Z

Sound design should not be an afterthought at the end of a design process.

A brief survey of sound design

Given its broad range of uses, and central role in the formation of our culture and intellectual traditions, is it any wonder that we eventually bent ourselves to the task of turning sound into a full-fledged tool of communication and influence? The history of human civilization is also a history of increasingly complex sound design. How we got there offers some crucial insights into what we’ve come to expect from sound, and how to meet or confound those expectations.

Alarms

Throughout human history we have devised alarms that alert us to danger, or convey some other kind of status, over a greater physical range than could be achieved through visual signalling. Initially this was something as simple as a stick hitting a log, but as humans gathered into larger groups and more permanent settlements, we refined alarms to be louder, more distinctive, and more customized. It was no longer enough to let 40 people know that a raiding party was coming—now you had to let 2,000 people know that a house was on fire, floodwaters were rising, or a thief was about. The story of alarms is the story of civilization, to the point where we hear dozens a day, almost without realizing it, from car horns and police sirens to school bells and smartphone alerts.

Musical instruments

The earliest musical instruments found are flutes, over 40,000 years old, meaning they predate written language by many thousands of years. We were shaping each other’s emotions with sound long before we learned to do so with text. As best we can tell, early musical instruments were created for use in ritual and were played in communal settings, and in many ways they’re still used this way in modern times. Today, your daily ritual might incorporate a specific upbeat song into a wakeup or workout routine. Romance and relationships may be associated with particular songs. Outdoor concerts are spaces for coming of age as well as showing off and socializing. They are also places for the establishment and communication of tribal signatures such as fashion, identity, beauty, and mating readiness. A popular, catchy summer song (Daft Punk’s Get Lucky comes to mind for 2014, or George Michael’s Faith for 1988) may define an entire summer, not just in one country, but around the world. Together these songs represent rituals of summer or certain moods.

Digital recording and playback marks an advancement over analog instruments in tha[...]



Four short links: 8 September 2017

2017-09-08T13:05:00Z

CryptoCurrency Fails, AI Interchange, Big Data Surveillance, and Foragers vs. Farmers

  1. Cryptographic Vulnerabilities in IOTA -- The cryptocurrency space is heating up—Protocol Labs raised $200M for Filecoin, Bancor raised $150M, and Tezos raised $232M. [...] [T]he due diligence required to make sound investments in the technology isn’t keeping up with the pace of the hype. Don't. Roll. Your. Own. Crypto!
  2. Microsoft and Facebook Launch AI Interoperability -- Open Neural Network Exchange (ONNX) format, a standard for representing deep learning models that enables models to be transferred between frameworks.
  3. Big Data Surveillance: The Case of Policing -- based on observations and interviews with the Los Angeles Police Department, the author finds: First, discretionary assessments of risk are supplemented and quantified using risk scores. Second, data are used for predictive, rather than reactive or explanatory, purposes. Third, the proliferation of automatic alert systems makes it possible to systematically surveil an unprecedentedly large number of people. Fourth, the threshold for inclusion in law enforcement databases is lower, now including individuals who have not had direct police contact. Fifth, previously separate data systems are merged, facilitating the spread of surveillance into a wide range of institutions.
  4. Forager vs. Farmer (Robin Hanson) -- a safe, playful, talky collective isn’t always the best way to deal with things. I think Robin Hanson is saying it's OK to punch Nazis but not OK to punch a coworker—EVEN IF THEY SUGGEST WRITING THE NEW SYSTEM IN {some language that has recently been mentioned on Hacker News}.

Continue reading Four short links: 8 September 2017.

(image)



Ken Kousen on Java, Spring, and Groovy

2017-09-07T14:25:00Z

(image)

The O’Reilly Programming Podcast: A look at what’s new in Java 9 and Spring 5.

In this episode of the O’Reilly Programming Podcast, I talk with Ken Kousen, an author, instructor, and consultant who is presenting the live online training courses Functional Programming in Java 8 and Getting Started with Spring Boot in September and October. He is also the author of the newly published O’Reilly book Modern Java Recipes: Simple Solutions to Difficult Problems in Java 8 and 9.

Continue reading Ken Kousen on Java, Spring, and Groovy.

(image)



How do I build a Jenkins project using the Ant build tool?

2017-09-07T11:00:00Z

(image)

Learn how to automate Jenkins continuous integration projects with Apache Ant, a popular build tool for developing software.

Continue reading How do I build a Jenkins project using the Ant build tool?.

(image)