Language: English
Tags:
approach  arxiv  based  data  learning  method  model  models  network  paper  problem  proposed  results  show  system  time
Rate this Feed

Feed Details and Statistics

## Computer Science (cs) updates on the arXiv.org e-print archive

Published: 2018-04-25T20:30:00-05:00

Lagrangian formulation for mixed traffic flow including two-wheelers. (arXiv:1804.09176v1 [math.OC])

Lagrangian formulation of kinematic wave provides a more accurate representation than the most commonly used Eulerian formulation. Furthermore, Lagrangian representation offers a flexibility to study certain traffic phenomena (e.g. capacity drop, traffic delay, trajectory). The natural resemblance of Lagrangian representation to road traffic data collection methods (probe vehicles) also renders it suitable for applications such as traffic motoring. This paper presents a multi-class Lagrangian representation for a traffic flow consisting of cars and two-wheelers. A numerical method for solving the continuity equation is presented. We compare the results obtained with Eulerian and Lagrangian formulations.

MaskFusion: Real-Time Recognition, Tracking and Reconstruction of Multiple Moving Objects. (arXiv:1804.09194v1 [cs.CV])

We present MaskFusion, a real-time, object-aware, semantic and dynamic RGB-D SLAM system that goes beyond traditional systems that output a geometry-only map -- MaskFusion recognizes, segments and assigns semantic class labels to different objects in the scene, while tracking and reconstructing them even when they move independently from the camera.

As an RGB-D camera scans a cluttered scene, image-based instance-level semantic segmentation creates semantic object masks that enable real-time object recognition and the creation of an object-level representation for the world map. Unlike previous recognition-based SLAM systems, MaskFusion does not require prior knowledge or known models of the objects it can recognize and can deal with multiple independent motions. Unlike recent semantics enabled SLAM systems that perform voxel-level semantic segmentation MaskFusion takes full advantage of using instance-level semantic segmentation to enable semantic labels to be fused into an object-aware map. We show augmented-reality applications, that demonstrate the unique features of the map output by MaskFusion: instance-aware, semantic and dynamic.

The quantitative measure and statistical distribution of fame. (arXiv:1804.09196v1 [cs.SI])

Fame and celebrity play an ever-increasing role in our culture. However, despite the cultural and economic importance of fame and its gradations, there exists no consensus method for quantifying the fame of an individual, or of comparing that of two individuals. We argue that, even if fame is difficult to measure with precision, one may develop useful metrics for fame that correlate well with intuition and that remain reasonably stable over time. Using datasets of recently deceased individuals who were highly renowned, we have evaluated several internet-based methods for quantifying fame. We find that some widely-used internet-derived metrics, such as search engine results, correlate poorly with human subject judgments of fame. However other metrics exist that agree well with human judgments and appear to offer workable, easily accessible measures of fame. Using such a metric we perform a preliminary investigation of the statistical distribution of fame, which has some of the power law character seen in other natural and social phenomena such as landslides and market crashes. In order to demonstrate how such findings can generate quantitative insight into celebrity culture, we assess some folk ideas regarding the frequency distribution and apparent clustering of celebrity deaths.

Resource Allocation and HARQ Optimization for URLLC Traffic in 5G Wireless Networks. (arXiv:1804.09201v1 [cs.NI])

5G wireless networks are expected to support Ultra Reliable Low Latency Communications (URLLC) traffic which requires very low packet delays ( < 1 msec.) and extremely high reliability ($\sim$99.999\%). In this paper we focus on the design of a wireless system supporting downlink URLLC traffic. Using a queuing network based model for the wireless system we characterize the effect of various design choices on the maximum URLLC load it can support, including: 1) system parameters such as the bandwidth, link SINR , and QoS requirements; 2) resource allocation schemes in Orthogonal Frequency Division Multiple Access (OFDMA) based systems; and 3) Hybrid Automatic Repeat Request (HARQ) schemes. Key contributions of this paper which are of practical interest are: 1) study of how the the minimum required system bandwidth to support a given URLLC load scales with associated QoS constraints; 2) characterization of optimal OFDMA resource allocation schemes which maximize the admissible URLLC load; and 3) optimization of a repetition code based HARQ scheme which approximates Chase HARQ combining.

Vocal melody extraction using patch-based CNN. (arXiv:1804.09202v1 [cs.SD])

A patch-based convolutional neural network (CNN) model presented in this paper for vocal melody extraction in polyphonic music is inspired from object detection in image processing. The input of the model is a novel time-frequency representation which enhances the pitch contours and suppresses the harmonic components of a signal. This succinct data representation and the patch-based CNN model enable an efficient training process with limited labeled data. Experiments on various datasets show excellent speed and competitive accuracy comparing to other deep learning approaches.

Automated Mouse Organ Segmentation: A Deep Learning Based Solution. (arXiv:1804.09205v1 [cs.CV])

The analysis of animal cross section images, such as cross sections of laboratory mice, is critical in assessing the effect of experimental drugs such as the biodistribution of candidate compounds in preclinical drug development stage. Tissue distribution of radiolabeled candidate therapeutic compounds can be quantified using techniques like Quantitative Whole-Body Autoradiography (QWBA).QWBA relies, among other aspects, on the accurate segmentation or identification of key organs of interest in the animal cross section image such as the brain, spine, heart, liver and others. We present a deep learning based organ segmentation solution to this problem, using which we can achieve automated organ segmentation with high precision (dice coefficient in the 0.83-0.95 range depending on organ) for the key organs of interest.

Secrecy Analysis of Physical Layer over $\kappa-\mu$ Shadowed Fading Scenarios. (arXiv:1804.09208v1 [cs.IT])

In this paper, the secrecy analysis of physical layer when both the main and wiretap channels undergo $\kappa-\mu$ shadowed fading channel is investigated. In particular, the average secrecy capacity (ASC), secure outage probability (SOP), the lower bound of SOP (SOP$^L$), and the probability of strictly positive secrecy capacity (SPSC) are derived by using the classic Wyner's wiretap model. Two different scenarios for the fading parameters, i.e., $\mu$ and $m$ which represents the shadowing impact have been studied. These parameters are chosen first as arbitrary numbers, thus the performance metrics are expressed in single infinite series with multivariate Meijer $G$-function. In the second scenario, both the aforementioned fading parameters are assumed to be integer numbers in order to obtain the derived results in simple exact closed-form analytic mathematically tractable expressions. The numerical results of this analysis are verified via Monte Carlo simulations.

On the construction of sparse matrices from expander graphs. (arXiv:1804.09212v1 [cs.IT])

We revisit the asymptotic analysis of probabilistic construction of adjacency matrices of expander graphs proposed in \cite{bah2013vanishingly}. With better bounds we derived a new reduced sample complexity for the number of nonzeros per column of these matrices, precisely $d = \mathcal{O}\left(\log_s(N/s) \right)$; as opposed to the standard $d = \mathcal{O}\left(\log(N/s) \right)$. This gives insights into why using small $d$ performed well in numerical experiments involving such matrices. Furthermore, we derive quantitative sampling theorems for our constructions which show our construction outperforming the existing state-of-the-art. We also used our results to compare performance of sparse recovery algorithms where these matrices are used for linear sketching.

Unified approaches based effective capacity analysis over composite $\alpha-\eta-\mu$/gamma fading channels. (arXiv:1804.09213v1 [cs.IT])

This letter analyses the effective capacity of communications system using unified models. In order to obtain a simple closed-form mathematically tractable expression, two different unified approximate models have been used. The mixture gamma (MG) distribution which is highly accurate approximation approach has been firstly employed to represent the signal-to-noise-ratio (SNR) of fading channel. In the second approach, the mixture of Gaussian (MoG) distribution which is another unified representation approach has been utilised. A comparison between the simulated and numerical results using both distributions over composite $\alpha-\eta-\mu$/gamma fading channels has been provided.

On Learning Sparsely Used Dictionaries from Incomplete Samples. (arXiv:1804.09217v1 [stat.ML])

Most existing algorithms for dictionary learning assume that all entries of the (high-dimensional) input data are fully observed. However, in several practical applications (such as hyper-spectral imaging or blood glucose monitoring), only an incomplete fraction of the data entries may be available. For incomplete settings, no provably correct and polynomial-time algorithm has been reported in the dictionary learning literature. In this paper, we provide provable approaches for learning - from incomplete samples - a family of dictionaries whose atoms have sufficiently "spread-out" mass. First, we propose a descent-style iterative algorithm that linearly converges to the true dictionary when provided a sufficiently coarse initial estimate. Second, we propose an initialization algorithm that utilizes a small number of extra fully observed samples to produce such a coarse initial estimate. Finally, we theoretically analyze their performance and provide asymptotic statistical and computational guarantees.

ImVerde: Vertex-Diminished Random Walk for Learning Network Representation from Imbalanced Data. (arXiv:1804.09222v1 [cs.SI])

Imbalanced data widely exists in many high-impact applications. An example is in air traffic control, where we aim to identify the leading indicators for each type of accident cause from historical records. Among all three types of accident causes, historical records with 'personnel issues' are much more than the other two types ('aircraft issues' and 'environmental issues') combined. Thus, the resulting dataset is highly imbalanced, and can be naturally modeled as a network. Up until now, most existing work on imbalanced data analysis focused on the classification setting, and very little is devoted to learning the node representation from imbalanced networks. To address this problem, in this paper, we propose Vertex-Diminished Random Walk (VDRW) for imbalanced network analysis. The key idea is to encourage the random particle to walk within the same class by adjusting the transition probabilities each step. It resembles the existing Vertex Reinforced Random Walk in terms of the dynamic nature of the transition probabilities, as well as some convergence properties. However, it is more suitable for analyzing imbalanced networks as it leads to more separable node representations in the embedding space. Then, based on VDRW, we propose a semi-supervised network representation learning framework named ImVerde for imbalanced networks, in which context sampling uses VDRW and the label information to create node-context pairs, and balanced-batch sampling adopts a simple under-sampling method to balance these pairs in different classes. Experimental results demonstrate that ImVerde based on VDRW outperforms state-of-the-art algorithms for learning network representation from imbalanced data.

Hybrid LISA for Wideband Multiuser Millimeter Wave Communication Systems under Beam Squint. (arXiv:1804.09223v1 [cs.IT])

This work jointly addresses user scheduling and precoder/combiner design in the downlink of a wideband millimeter wave (mmWave) communications system. We consider Orthogonal Frequency Division Multiplexing (OFDM) modulation to overcome channel frequency selectivity and obtain a number of equivalent narrowband channels. Hence, the main challenge is that the analog preprocessing network is frequency flat and common to all the users at the transmitter side. Moreover, the effect of the signal bandwidth over the Uniform Linear Array (ULA) steering vectors has to be taken into account to design the hybrid precoders and combiners. The proposed algorithmic solution is based on Linear Successive Allocation (LISA), which greedily allocates streams to different users and computes the corresponding precoders and combiners. By taking into account the rank limitations imposed by the hardware at transmission and reception, the performance loss in terms of achievable sum rate for the hybrid approach is negligible. Numerical experiments show that the proposed method exhibits excellent performance with reasonable computational complexity.

How Diverse Users and Activities Trigger Connective Action via Social Media: Lessons from the Twitter Hashtag Campaign #ILookLikeAnEngineer. (arXiv:1804.09226v1 [cs.SI])

We present a study that examines how a social media activism campaign aimed at improving gender diversity within engineering gained and maintained momentum in its early period. We examined over 50,000 Tweets posted over the first ~75 days of the #ILookLikeAnEngineer campaign and found that diverse participation - of types of users - increased activity at crucial moments. We categorize these triggers into four types: 1) Event-Driven: Alignment of the campaign with offline events related to the issue (Diversity SFO, Disrupt, etc.); 2) Media-Driven: News coverage of the events in the media (TechCrunch, CNN, BBC, etc.); 3) Industry-Driven: Web participation in the campaign by large organizations (Microsoft, Tesla, GE, Cisco, etc.); and 4) Personality-Driven: Alignment of the events with popular and/or known personalities (e.g. Isis Anchalee; Michelle Sun; Ada Lovelace.) This study illustrates how one mechanism - triggering - supports connective action in social media campaign.

Transferring Interactive Search-Based Software Testing to Industry. (arXiv:1804.09232v1 [cs.SE])

Search-Based Software Testing (SBST) is the application of optimization algorithms to problems in software testing. In previous work, we have implemented and evaluated Interactive Search-Based Software Testing (ISBST) tool prototypes, with a goal to successfully transfer the technique to industry. While SBSE solutions are often validated on benchmark problems, there is a need to validate them in an operational setting. The present paper discusses the development and deployment of SBST tools for use in industry and reflects on the transfer of these techniques to industry. In addition to previous work discussing the development and validation of an ISBST prototype, a new version of the prototype ISBST system was evaluated in the laboratory and in industry. This evaluation is based on an industrial System under Test (SUT) and was carried out with industrial practitioners. The Technology Transfer Model is used as a framework to describe the progression of the development and evaluation of the ISBST system. The paper presents a synthesis of previous work developing and evaluating the ISBST prototype, as well as presenting an evaluation, in both academia and industry, of that prototype's latest version. This paper presents an overview of the development and deployment of the ISBST system in an industrial setting, using the framework of the Technology Transfer Model. We conclude that the ISBST system is capable of evolving useful test cases for that setting, though improvements in the means the system uses to communicate that information to the user are still required. In addition, a set of lessons learned from the project are listed and discussed. Our objective is to help other researchers that wish to validate search-based systems in industry and provide more information about the benefits and drawbacks of these systems.

Fine-grained Video Classification and Captioning. (arXiv:1804.09235v1 [cs.CV])

We describe a DNN for fine-grained action classification and video captioning. It gives state-of-the-art performance on the challenging Something-Something dataset, with over 220, 000 videos and 174 fine-grained actions. Classification and captioning on this dataset are challenging because of the subtle differences between actions, the use of thousands of different objects, and the diversity of captions penned by crowd actors. The model architecture shares features for classification and captioning, and is trained end-to-end. It performs much better than the existing classification benchmark for Something-Something, with impressive fine-grained results, and it yields a strong baseline on the new Something-Something captioning task. Our results reveal that there is a strong correlation between the degree of detail in the task and the ability of the learned features to transfer to other tasks.

A Sparse Coding Multi-Scale Precise-Timing Machine Learning Algorithm for Neuromorphic Event-Based Sensors. (arXiv:1804.09236v1 [cs.CV])

This paper introduces an unsupervised compact architecture that can extract features and classify the contents of dynamic scenes from the temporal output of a neuromorphic asynchronous event-based camera. Event-based cameras are clock-less sensors where each pixel asynchronously reports intensity changes encoded in time at the microsecond precision. While this technology is gaining more attention, there is still a lack of methodology and understanding of their temporal properties. This paper introduces an unsupervised time-oriented event-based machine learning algorithm building on the concept of hierarchy of temporal descriptors called time surfaces. In this work we show that the use of sparse coding allows for a very compact yet efficient time-based machine learning that lowers both the computational cost and memory need. We show that we can represent visual scene temporal dynamics with a finite set of elementary time surfaces while providing similar recognition rates as an uncompressed version by storing the most representative time surfaces using clustering techniques. Experiments will illustrate the main optimizations and trade-offs to consider when implementing the method for online continuous vs. offline learning. We report results on the same previously published 36 class character recognition task and a 4 class canonical dynamic card pip task, achieving 100% accuracy on each.

Semi-Supervised Learning with Declaratively Specified Entropy Constraints. (arXiv:1804.09238v1 [cs.LG])

We propose a technique for declaratively specifying strategies for semi-supervised learning (SSL). The proposed method can be used to specify ensembles of semi-supervised learning, as well as agreement constraints and entropic regularization constraints between these learners, and can be used to model both well-known heuristics such as co-training and novel domain-specific heuristics. In addition to representing individual SSL heuristics, we show that multiple heuristics can also be automatically combined using Bayesian optimization methods. We show consistent improvements on a suite of well-studied SSL benchmarks, including a new state-of-the-art result on a difficult relation extraction task.

Reconfiguration of graph minors. (arXiv:1804.09240v1 [cs.DS])

Under the reconfiguration framework, we consider the various ways that a target graph $H$ is a {\em minor} of a host graph $G$, where a subgraph of $G$ can be transformed into $H$ by means of {\em edge contraction} (replacement of both endpoints of an edge by a new vertex adjacent to any vertex adjacent to either endpoint). Equivalently, an {\em $H$-model} of $G$ is a labeling of the vertices of $G$ with the vertices of $H$, where the contraction of all edges between identically-labeled vertices results in a graph containing representations of all edges in $H$.

We explore the properties of $G$ and $H$ that result in a connected {\em reconfiguration graph}, in which nodes represent $H$-models and two nodes are adjacent if their corresponding $H$-models differ by the label of a single vertex of $G$. Various operations on $G$ or $H$ are shown to preserve connectivity. In addition, we demonstrate properties of graphs $G$ that result in connectivity for the target graphs $K_2$, $K_3$, and $K_4$, including a full characterization of graphs $G$ that result in connectivity for $K_2$-models, as well as the relationship between connectivity of $G$ and other $H$-models.

A Hardware Platform for Efficient Multi-Modal Sensing with Adaptive Approximation. (arXiv:1804.09241v1 [physics.app-ph])

We present Warp, a hardware platform to support research in approximate computing, sensor energy optimization, and energy-scavenged systems. Warp incorporates 11 state-of-the-art sensor integrated circuits, computation, and an energy-scavenged power supply, all within a miniature system that is just 3.6 cm x 3.3 cm x 0.5 cm. Warp's sensor integrated circuits together contain a total of 21 sensors with a range of precisions and accuracies for measuring eight sensing modalities of acceleration, angular rate, magnetic flux density (compass heading), humidity, atmospheric pressure (elevation), infrared radiation, ambient temperature, and color. Warp uses a combination of analog circuits and digital control to facilitate further tradeoffs between sensor and communication accuracy, energy efficiency, and performance. This article presents the design of Warp and presents an evaluation of our hardware implementation. The results show how Warp's design enables performance and energy efficiency versus ac- curacy tradeoffs.

Reliability based-design optimization using the directional bat algorithm. (arXiv:1804.09250v1 [math.OC])

Reliability based design optimization (RBDO) problems are important in engineering applications, but it is challenging to solve such problems. In this study, a new resolution method based on the directional Bat Algorithm (dBA) is presented. To overcome the difficulties in the evaluations of probabilistic constraints, the reliable design space concept has been applied to convert the yielded stochastic constrained optimization problem from the RBDO formulation into a deterministic constrained optimization problem. In addition, the constraint handling technique has also been introduced to the dBA so that the algorithm can solve constrained optimization problem effectively. The new method has been applied to several engineering problems and the results show that the new method can solve different varieties of RBDO problems efficiently. In fact, the obtained solutions are consistent with the best results in the literature.

DeepTriangle: A Deep Learning Approach to Loss Reserving. (arXiv:1804.09253v1 [stat.AP])

We propose a novel approach for loss reserving based on deep neural networks. The approach allows for jointly modeling of paid losses and claims outstanding, and incorporation of heterogenous inputs. We validate the models on loss reserving data across lines of business, and show that they attain or exceed the predictive accuracy of existing stochastic methods. The models require minimal feature engineering and expert input, and can be automated to produce forecasts at a high frequency.

Cache-aware data structures for packet forwarding tables on general purpose CPUs. (arXiv:1804.09254v1 [cs.NI])

Longest prefix matching has long been the bottleneck of the Bloom filter-based solutions for packet forwarding implemented in software. We propose a search algorithm to match a destination IP address against a compact representation of the FIB table in CPU cache on general purpose hardware with an average performance target of O(log n) for an n-bit address.

DIDO Hammerstein Identification of Mild Steel Welding Pool in Pulsed GTAW Dynamic Process with Wire Filler. (arXiv:1804.09258v1 [cs.SY])

This paper analyzed the nonlinearity of welding dynamic process, and then adopted MIMO Hammerstein model to describe approximately the process. An identification algorithm was developed and pseudo random signals were adopted as model input. Through a welding experiment, input-output data were obtained and the Hammerstein model of welding pool was identified

Commonsense mining as knowledge base completion? A study on the impact of novelty. (arXiv:1804.09259v1 [cs.CL])

Commonsense knowledge bases such as ConceptNet represent knowledge in the form of relational triples. Inspired by the recent work by Li et al., we analyse if knowledge base completion models can be used to mine commonsense knowledge from raw text. We propose novelty of predicted triples with respect to the training set as an important factor in interpreting results. We critically analyse the difficulty of mining novel commonsense knowledge, and show that a simple baseline method outperforms the previous state of the art on predicting more novel.

An Adaptive Primary User Emulation Attack Detection Mechanism for Cognitive Radio Networks. (arXiv:1804.09266v1 [cs.NI])

The proliferation of advanced information technologies (IT), especially the wide spread of Internet of Things (IoTs) makes wireless spectrum a precious resource. Cognitive radio network (CRN) has been recognized as the key to achieve efficient utility of communication bands. Because of the great difficulty, high complexity and regulations in dynamic spectrum access (DSA), it is very challenging to protect CRNs from malicious attackers or selfish abusers. Primary user emulation (PUE) attacks is one type of easy-to-launch but hard-to-detect attacks in CRNs that malicious entities mimic PU signals in order to either occupy spectrum resource selfishly or conduct Denial of Service (DoS) attacks. Inspired by the physical features widely used as the fingerprint of variant electronic devices, an adaptive and realistic PUE attack detection technique is proposed in this paper. It leverages the PU transmission features that attackers are not able to mimic. In this work, the transmission power is selected as one of the hard-to-mimic features due to the intrinsic discrepancy between PUs and attackers, while considering constraints in real implementations. Our experimental results verified the effectiveness and correctness of the proposed mechanism.

BlendCAC: A BLockchain-ENabled Decentralized Capability-based Access Control for IoTs. (arXiv:1804.09267v1 [cs.NI])

The prevalence of Internet of Things (IoTs) allows heterogeneous embedded smart devices to collaboratively provide smart services with or without human intervention. While leveraging the large scale IoT based applications like Smart Gird or Smart Cities, IoTs also incur more concerns on privacy and security. Among the top security challenges that IoTs face, access authorization is critical in resource sharing and information protection. One of the weaknesses in today's access control (AC) is the centralized authorization server, which can be the performance bottleneck or the single point of failure. In this paper, BlendCAC, a blockchain enabled decentralized capability based AC is proposed for the security of IoTs. The BlendCAC aims at an effective access control processes to devices, services and information in large scale IoT systems. Based on the blockchain network, a capability delegation mechanism is suggested for access permission propagation. A robust identity based capability token management strategy is proposed, which takes advantage of smart contract for registering, propagation and revocation of the access authorization. In the proposed BlendCAC scheme, IoT devices are their own master to control their resources instead of being supervised by a centralized authority. Implemented and tested on a Raspberry Pi device and on a local private blockchain network, our experimental results demonstrate the feasibility of the proposed BlendCAC approach to offer a decentralized, scalable, lightweight and fine grained AC solution to IoT systems.

Learning 3D Segment Descriptors for Place Recognition. (arXiv:1804.09270v1 [cs.RO])

In the absence of global positioning information, place recognition is a key capability for enabling localization, mapping and navigation in any environment. Most place recognition methods rely on images, point clouds, or a combination of both. In this work we leverage a segment extraction and matching approach to achieve place recognition in Light Detection and Ranging (LiDAR) based 3D point cloud maps. One challenge related to this approach is the recognition of segments despite changes in point of view or occlusion. We propose using a learning based method in order to reach a higher recall accuracy then previously proposed methods. Using Convolutional Neural Networks (CNNs), which are state-of-the-art classifiers, we propose a new approach to segment recognition based on learned descriptors. In this paper we compare the effectiveness of three different structures and training methods for CNNs. We demonstrate through several experiments on real-world data collected in an urban driving scenario that the proposed learning based methods outperform hand-crafted descriptors.

Segmentation-Free Approaches for Handwritten Numeral String Recognition. (arXiv:1804.09279v1 [cs.CV])

This paper presents segmentation-free strategies for the recognition of handwritten numeral strings of unknown length. A synthetic dataset of touching numeral strings of sizes 2-, 3- and 4-digits was created to train end-to-end solutions based on Convolutional Neural Networks. A robust experimental protocol is used to show that the proposed segmentation-free methods may reach the state-of-the-art performance without suffering the heavy burden of over-segmentation based methods. In addition, they confirmed the importance of introducing contextual information in the design of end-to-end solutions, such as the proposed length classifier when recognizing numeral strings.

A Closer Look at Weak Label Learning for Audio Events. (arXiv:1804.09288v1 [cs.SD])

Audio content analysis in terms of sound events is an important research problem for a variety of applications. Recently, the development of weak labeling approaches for audio or sound event detection (AED) and availability of large scale weakly labeled dataset have finally opened up the possibility of large scale AED. However, a deeper understanding of how weak labels affect the learning for sound events is still missing from literature. In this work, we first describe a CNN based approach for weakly supervised training of audio events. The approach follows some basic design principle desirable in a learning method relying on weakly labeled audio. We then describe important characteristics, which naturally arise in weakly supervised learning of sound events. We show how these aspects of weak labels affect the generalization of models. More specifically, we study how characteristics such as label density and corruption of labels affects weakly supervised training for audio events. We also study the feasibility of directly obtaining weak labeled data from the web without any manual label and compare it with a dataset which has been manually labeled. The analysis and understanding of these factors should be taken into picture in the development of future weak label learning methods. Audioset, a large scale weakly labeled dataset for sound events is used in our experiments.

The set of dimensions for which there are no linear perfect 2-error-correcting Lee codes has positive density. (arXiv:1804.09290v1 [cs.IT])

The Golomb-Welch conjecture states that there are no perfect $e$-error-correcting Lee codes in $\mathbb{Z}^n$ ($PL(n,e)$-codes) whenever $n\geq 3$ and $e\geq 2$. A special case of this conjecture is when $e=2$. In a recent paper of A. Campello, S. Costa and the author of this paper, it is proved that the set $\mathcal{N}$ of dimensions $n\geq 3$ for which there are no linear $PL(n,2)$-codes is infinite and $\#\{n \in \mathcal{N}: n\leq x\} \geq \frac{x}{3\ln(x)/2} (1+o(1))$. In this paper we present a simple and elementary argument which allows to improve the above result to $\#\{n \in \mathcal{N}: n\leq x\} \geq \frac{4x}{25} (1+o(1))$. In particular, this implies that the set $\mathcal{N}$ has positive (lower) density in $\mathbb{Z}^+$.

Taichi: An Open-Source Computer Graphics Library. (arXiv:1804.09293v1 [cs.GR])

An ideal software system in computer graphics should be a combination of innovative ideas, solid software engineering and rapid development. However, in reality these requirements are seldom met simultaneously. In this paper, we present early results on an open-source library named Taichi (this http URL), which alleviates this practical issue by providing an accessible, portable, extensible, and high-performance infrastructure that is reusable and tailored for computer graphics. As a case study, we share our experience in building a novel physical simulation system using Taichi.

Recent Progresses in Deep Learning based Acoustic Models (Updated). (arXiv:1804.09298v1 [eess.AS])

In this paper, we summarize recent progresses made in deep learning based acoustic models and the motivation and insights behind the surveyed techniques. We first discuss acoustic models that can effectively exploit variable-length contextual information, such as recurrent neural networks (RNNs), convolutional neural networks (CNNs), and their various combination with other models. We then describe acoustic models that are optimized end-to-end with emphasis on feature representations learned jointly with rest of the system, the connectionist temporal classification (CTC) criterion, and the attention-based sequence-to-sequence model. We further illustrate robustness issues in speech recognition systems, and discuss acoustic model adaptation, speech enhancement and separation, and robust training strategies. We also cover modeling techniques that lead to more efficient decoding and discuss possible future directions in acoustic model research.

Seq2Seq-Vis: A Visual Debugging Tool for Sequence-to-Sequence Models. (arXiv:1804.09299v1 [cs.CL])

Neural Sequence-to-Sequence models have proven to be accurate and robust for many sequence prediction tasks, and have become the standard approach for automatic translation of text. The models work in a five stage blackbox process that involves encoding a source sequence to a vector space and then decoding out to a new target sequence. This process is now standard, but like many deep learning methods remains quite difficult to understand or debug. In this work, we present a visual analysis tool that allows interaction with a trained sequence-to-sequence model through each stage of the translation process. The aim is to identify which patterns have been learned and to detect model errors. We demonstrate the utility of our tool through several real-world large-scale sequence-to-sequence use cases.

Gender Bias in Coreference Resolution. (arXiv:1804.09301v1 [cs.CL])

We present an empirical study of gender bias in coreference resolution systems. We first introduce a novel, Winograd schema-style set of minimal pair sentences that differ only by pronoun gender. With these "Winogender schemas," we evaluate and confirm systematic gender bias in three publicly-available coreference resolution systems, and correlate this bias with real-world and textual gender statistics.

Real-Time Inference of User Types to Assist with More Inclusive Social Media Activism Campaigns. (arXiv:1804.09304v1 [cs.SI])

Social media provides a mechanism for people to engage with social causes across a range of issues. It also provides a strategic tool to those looking to advance a cause to exchange, promote or publicize their ideas. In such instances, AI can be either an asset if used appropriately or a barrier. One of the key issues for a workforce diversity campaign is to understand in real-time who is participating - specifically, whether the participants are individuals or organizations, and in case of individuals, whether they are male or female. In this paper, we present a study to demonstrate a case for AI for social good that develops a model to infer in real-time the different user types participating in a cause-driven hashtag campaign on Twitter, ILookLikeAnEngineer (ILLAE). A generic framework is devised to classify a Twitter user into three classes: organization, male and female in a real-time manner. The framework is tested against two datasets (ILLAE and a general dataset) and outperforms the baseline binary classifiers for categorizing organization/individual and male/female. The proposed model can be applied to future social cause-driven campaigns to get real-time insights on the macro-level social behavior of participants.

Ambient Backscatter Systems: Exact Average Bit Error Rate under Fading Channels. (arXiv:1804.09307v1 [cs.IT])

The success of Internet-of-Things (IoT) paradigm relies on, among other things, developing energy-efficient communication techniques that can enable information exchange among billions of battery-operated IoT devices. With its technological capability of simultaneous information and energy transfer, ambient backscatter is quickly emerging as an appealing solution for this communication paradigm, especially for the links with low data rate requirement. In this paper, we study signal detection and characterize exact bit error rate for the ambient backscatter system. In particular, we formulate a binary hypothesis testing problem at the receiver and analyze system performance under three detection techniques: a) mean threshold (MT), b) maximum likelihood threshold (MLT), and c) approximate MLT. Motivated by the energy-constrained nature of IoT devices, we perform the above analyses for two receiver types: i) the ones that can accurately track channel state information (CSI), and ii) the ones that cannot. Two main features of the analysis that distinguish this work from the prior art are the characterization of the exact conditional density functions of the average received signal energy, and the characterization of exact average bit error rate (BER) for this setup. The key challenge lies in the handling of correlation between channel gains of two hypotheses for the derivation of joint probability distribution of magnitude squared channel gains that is needed for the BER analysis.

Vulnerability Analysis of Smart Grids to GPS Spoofing. (arXiv:1804.09310v1 [cs.SY])

Sensors such as phasor measurement units (PMUs) endowed with GPS receivers are ubiquitously installed providing real-time grid visibility. A number of PMUs can cooperatively enable state estimation routines. However, GPS spoofing attacks can notably alter the PMU measurements, mislead the network operator, and drastically impact subsequent corrective control actions. Leveraging a novel measurement model that explicitly accounts for the GPS spoofing attacks, this paper formulates an optimization problem to identify the most vulnerable PMUs in the network. A greedy algorithm is developed to solve the aforementioned problem. Furthermore, the paper develops a computationally efficient alternating minimization algorithm for joint state estimation and attack reconstruction. Numerical tests on IEEE benchmark networks validate the developed methods.

Deep Learning for Predicting Asset Returns. (arXiv:1804.09314v1 [stat.ML])

Deep learning searches for nonlinear factors for predicting asset returns. Predictability is achieved via multiple layers of composite factors as opposed to additive ones. Viewed in this way, asset pricing studies can be revisited using multi-layer deep learners, such as rectified linear units (ReLU) or long-short-term-memory (LSTM) for time-series effects. State-of-the-art algorithms including stochastic gradient descent (SGD), TensorFlow and dropout design provide imple- mentation and efficient factor exploration. To illustrate our methodology, we revisit the equity market risk premium dataset of Welch and Goyal (2008). We find the existence of nonlinear factors which explain predictability of returns, in particular at the extremes of the characteristic space. Finally, we conclude with directions for future research.

Hierarchical RNN for Information Extraction from Lawsuit Documents. (arXiv:1804.09321v1 [cs.CL])

Every lawsuit document contains the information about the party's claim, court's analysis, decision and others, and all of this information are helpful to understand the case better and predict the judge's decision on similar case in the future. However, the extraction of these information from the document is difficult because the language is too complicated and sentences varied at length. We treat this problem as a task of sequence labeling, and this paper presents the first research to extract relevant information from the civil lawsuit document in China with the hierarchical RNN framework.

Robust Anomaly-Based Ship Proposals Detection from Pan-sharpened High-Resolution Satellite Image. (arXiv:1804.09322v1 [cs.CV])

Pre-screening of ship proposals is now employed by top ship detectors to avoid exhaustive search across image. In very high resolution (VHR) optical image, ships appeared as a cluster of abnormal bright pixels in open sea clutter (noise-like background). Anomaly-based detector utilizing Panchromatic (PAN) data has been widely used in many researches to detect ships, however, still facing two main drawbacks: 1) detection rate tend to be low particularly when a ship is low contrast and 2) these models require a high manual configuration to select a threshold value best separate ships from sea surface background. This paper aims at further investigation of anomaly-based model to solve those issues. First, pan-sharpened Multi Spectral (MS) data is incorporated together with PAN to enhance ship discrimination. Second, we propose an improved anomaly-based model combining both global intensity anomaly and local texture anomaly map. Regarding noise appeared due to the present of sea clutter and because of pan-sharpen process, texture abnormality suppression term based on quantization theory is introduced. Experimental results on VNREDSat-1 VHR optical satellite images suggest that the pan-sharpened near-infrared (P-NIR) band can improve discrimination of ships from surrounding waters. Compared to state-of-the-art anomaly-based detectors, our proposed anomaly-based model on the combination of PAN and P-NIR data cannot only achieved highest ship detection's recall rate (91.14% and 45.9% on high-contrast and low-contrast dataset respectively) but also robust to different automatic threshold selection techniques.

Object Tracking in Satellite Videos Based on a Multi-Frame Optical Flow Tracker. (arXiv:1804.09323v1 [cs.CV])

Object tracking is a hot topic in computer vision. Thanks to the booming of the very high resolution (VHR) remote sensing techniques, it is now possible to track targets of interests in satellite videos. However, since the targets in the satellite videos are usually too small compared with the entire image, and too similar with the background, most state-of-the-art algorithms failed to track the target in satellite videos with a satisfactory accuracy. Due to the fact that optical flow shows the great potential to detect even the slight movement of the targets, we proposed a multi-frame optical flow tracker (MOFT) for object tracking in satellite videos. The Lucas-Kanade optical flow method was fused with the HSV color system and integral image to track the targets in the satellite videos, while multi-frame difference method was utilized in the optical flow tracker for a better interpretation. The experiments with three VHR remote sensing satellite video datasets indicate that compared with state-of-the-art object tracking algorithms, the proposed method can track the target more accurately.

Processing Database Joins over a Shared-Nothing System of Multicore Machines. (arXiv:1804.09324v1 [cs.DB])

To process a large volume of data, modern data management systems use a collection of machines connected through a network. This paper looks into the feasibility of scaling up such a shared-nothing system while processing a compute- and communication-intensive workload---processing distributed joins. By exploiting multiple processing cores within the individual machines, we implement a system to process database joins that parallelizes computation within each node, pipelines the computation with communication, parallelizes the communication by allowing multiple simultaneous data transfers (send/receive), and removes synchronization barriers (a scalability bottleneck in a distributed data processing system). Our experimental results show that using only four threads per node the framework achieves a 3.5x gains in intra-node performance while compared with a single-threaded counterpart. Moreover, with the join processing workload the cluster-wide performance (and speedup) is observed to be dictated by the intra-node computational loads; this property brings a near-linear speedup with increasing nodes in the system, a feature much desired in modern large-scale data processing system.

Multi-focus Noisy Image Fusion using Low-Rank Representation. (arXiv:1804.09325v1 [cs.CV])

In the process of image acquisition, the noise is inevitable for source image. The multi-focus noisy image fusion is a very challenging task. However, there is no truly adaptive noisy image fusion approaches at present. As we all know, Low-Rank representation (LRR) is robust to noise and outliers. In this paper, we propose a novel fusion method based on LRR for multi-focus noisy image fusion. In the discrete wavelet transform(DWT) framework, the low frequency coefficients are fused by spatial frequency, the high frequency coefficients are fused by LRR coefficients. Finally, the fused image is obtained by inverse DWT. Experimental results demonstrate that the proposed algorithm can obtain state-of-the-art performance when the source image contains noise. The Code of our fusion method is available at https://github.com/exceptionLi/imagefusion_noisy_lrr

Where are we now? A large benchmark study of recent symbolic regression methods. (arXiv:1804.09331v1 [cs.NE])

In this paper we provide a broad benchmarking of recent genetic programming approaches to symbolic regression in the context of state of the art machine learning approaches. We use a set of nearly 100 regression benchmark problems culled from open source repositories across the web. We conduct a rigorous benchmarking of four recent symbolic regression approaches as well as nine machine learning approaches from scikit-learn. The results suggest that symbolic regression performs strongly compared to state-of-the-art gradient boosting algorithms, although in terms of running times is among the slowest of the available methodologies. We discuss the results in detail and point to future research directions that may allow symbolic regression to gain wider adoption in the machine learning community.

Wireless Quantization Index Modulation: Enabling Communication Through Existing Signals. (arXiv:1804.09336v1 [cs.NI])

As the number of IoT devices continue to exponentially increase and saturate the wireless spectrum, there is a dire need for additional spectrum to support large networks of wireless devices. Over the past years, many promising solutions have been proposed but they all suffer from the drawback of new infrastructure costs, setup and maintenance, or are difficult to implement due to FCC regulations. In this paper, we propose a novel Wireless Quantization Index Modulation (QIM) technique which uses existing infrastructure to embed information into existing wireless signals to communicate with IoT devices with negligible impact on the original signal and zero spectrum overhead. We explore the design space for wireless QIM and evaluate the performance of embedding information in TV, FM and AM radio broadcast signals under different conditions. We demonstrate that we can embed messages at up to 8-200~kbps with negligible impact on the audio and video quality of the original FM, AM and TV signals respectively.

Learning a Discriminative Feature Network for Semantic Segmentation. (arXiv:1804.09337v1 [cs.CV])

Most existing methods of semantic segmentation still suffer from two aspects of challenges: intra-class inconsistency and inter-class indistinction. To tackle these two problems, we propose a Discriminative Feature Network (DFN), which contains two sub-networks: Smooth Network and Border Network. Specifically, to handle the intra-class inconsistency problem, we specially design a Smooth Network with Channel Attention Block and global average pooling to select the more discriminative features. Furthermore, we propose a Border Network to make the bilateral features of boundary distinguishable with deep semantic boundary supervision. Based on our proposed DFN, we achieve state-of-the-art performance 86.2% mean IOU on PASCAL VOC 2012 and 80.3% mean IOU on Cityscapes dataset.

Distribution-based objectives for Markov Decision Processes. (arXiv:1804.09341v1 [cs.LO])

We consider distribution-based objectives for Markov Decision Processes (MDP). This class of objectives gives rise to an interesting trade-off between full and partial information. As in full observation, the strategy in the MDP can depend on the state of the system, but similar to partial information, the strategy needs to account for all the states at the same time. In this paper, we focus on two safety problems that arise naturally in this context, namely, existential and universal safety. Given an MDP A and a closed and convex polytope H of probability distributions over the states of A, the existential safety problem asks whether there exists some distribution d in H and a strategy of A, such that starting from d and repeatedly applying this strategy keeps the distribution forever in H. The universal safety problem asks whether for all distributions in H, there exists such a strategy of A which keeps the distribution forever in H. We prove that both problems are decidable, with tight complexity bounds: we show that existential safety is PTIME-complete, while universal safety is co-NP-complete. Further, we compare these results with existential and universal safety problems for Rabin's probabilistic finite-state automata (PFA), the subclass of Partially Observable MDPs which have zero observation. Compared to MDPs, strategies of PFAs are not state-dependent. In sharp contrast to the PTIME result, we show that existential safety for PFAs is undecidable, with H having closed and open boundaries. On the other hand, it turns out that the universal safety for PFAs is decidable in EXPTIME, with a co-NP lower bound. Finally, we show that an alternate representation of the input polytope allows us to improve the complexity of universal safety for MDPs and PFAs.

Adaptation and Re-Identification Network: An Unsupervised Deep Transfer Learning Approach to Person Re-Identification. (arXiv:1804.09347v1 [cs.CV])

Person re-identification (Re-ID) aims at recognizing the same person from images taken across different cameras. To address this task, one typically requires a large amount labeled data for training an effective Re-ID model, which might not be practical for real-world applications. To alleviate this limitation, we choose to exploit a sufficient amount of pre-existing labeled data from a different (auxiliary) dataset. By jointly considering such an auxiliary dataset and the dataset of interest (but without label information), our proposed adaptation and re-identification network (ARN) performs unsupervised domain adaptation, which leverages information across datasets and derives domain-invariant features for Re-ID purposes. In our experiments, we verify that our network performs favorably against state-of-the-art unsupervised Re-ID approaches, and even outperforms a number of baseline Re-ID methods which require fully supervised data for training.

Generalized Gaussian Kernel Adaptive Filtering. (arXiv:1804.09348v1 [cs.LG])

The present paper proposes generalized Gaussian kernel adaptive filtering, where the kernel parameters are adaptive and data-driven. The Gaussian kernel is parametrized by a center vector and a symmetric positive definite (SPD) precision matrix, which is regarded as a generalization of the scalar width parameter. These parameters are adaptively updated on the basis of a proposed least-square-type rule to minimize the estimation error. The main contribution of this paper is to establish update rules for precision matrices on the SPD manifold in order to keep their symmetric positive-definiteness. Different from conventional kernel adaptive filters, the proposed regressor is a superposition of Gaussian kernels with all different parameters, which makes such regressor more flexible. The kernel adaptive filtering algorithm is established together with a l1-regularized least squares to avoid overfitting and the increase of dimensionality of the dictionary. Experimental results confirm the validity of the proposed method.

Shape Neutral Analysis of Graph-based Data-structures. (arXiv:1804.09352v1 [cs.PL])

Malformed data-structures can lead to runtime errors such as arbitrary memory access or corruption. Despite this, reasoning over data-structure properties for low-level heap manipulating programs remains challenging. In this paper we present a constraint-based program analysis that checks data-structure integrity, w.r.t. given target data-structure properties, as the heap is manipulated by the program. Our approach is to automatically generate a solver for properties using the type definitions from the target program. The generated solver is implemented in Constraint Handling Rules (CHR) extending builtin heap, integer and equality solvers. A key property of our program analysis is that the target data-structure properties are shape neutral, i.e. the analysis does not check for properties relating to a given data-structure graph shape, such as doubly-linked-lists versus trees. Nevertheless, the analysis can detect errors in wide range of data-structure manipulating programs, including those that use lists, trees, DAGs, graphs, etc. We present an implementation based on a specialized shape neutral constraint solver implemented using the Satisfiability Modulo Constraint Handling Rules (SMCHR) system. Experimental results show that our approach works well for real-world C programs.

Age-Optimal Trajectory Planning for UAV-Assisted Data Collection. (arXiv:1804.09356v1 [cs.IT])

Unmanned aerial vehicle (UAV)-aided data collection is a new and promising application in many practical scenarios. In this work, we study the age-optimal trajectory planning problem in UAV-enabled wireless sensor networks, where a UAV is dispatched to collect data from the ground sensor nodes (SNs). The age of information (AoI) collected from each SN is characterized by the data uploading time and the time elapsed since the UAV leaves this SN. We attempt to design two age-optimal trajectories, referred to as the Max-AoI-optimal and Ave-AoI-optimal trajectories, respectively. The Max-AoI-optimal trajectory planning is to minimize the age of the `oldest' sensed information among the SNs. The Ave-AoI-optimal trajectory planning is to minimize the average AoI of all the SNs. Then, we show that each age-optimal flight trajectory corresponds to a shortest Hamiltonian path in the wireless sensor network where the distance between any two SNs represents the amount of inter-visit time. The dynamic programming (DP) method and genetic algorithm (GA) are adopted to find the two different age-optimal trajectories. Simulation results validate the effectiveness of the proposed methods, and show how the UAV's trajectory is affected by the two AoI metrics.

Analysis of Indoor Uplink Optical Communication Positioning System Exploiting Multipath Reflections. (arXiv:1804.09360v1 [cs.IT])

In this paper, we introduce an uplink optical wireless positioning system for indoor applications. This technique uses fingerprints based on the indoor optical wireless channel impulse response for localization. Exploiting the line of sight peak power (LOS), the second power peak (SPP) of the impulse response, and the delay between the LOS and SPP, we present a proof of concept design and theoretical analysis for localization employing a single fixed reference point, i.e., a photodetector (PD) on the ceiling. Adding more PDs leads to more accurate transmitter position estimation. As a benchmark, we present analytical expressions of the Cramer-Rao lower bound (CRLB) for different numbers of PDs and features. We further present closed form analytical approximations for the chosen features of the channel impulse response. Simulation results show a root mean square (RMS) positioning accuracy of 25\,cm and 5\,cm for one and four PDs, respectively, for a typical indoor room at high SNR. Numerical results verify that the derived analytic approximations closely match the simulations.

Energy Internet: Enablers and Building Blocks. (arXiv:1804.09363v1 [cs.SY])

This paper focuses on the management of the electricity grids using energy packets to build the Energy Internet via machine-type communications. We revisit some attempts to design a digital grid similar to the internet, including packetized management of specific loads (electric vehicles, air conditioners and water boilers) as a way to implement demand-side management. We discuss the viability of up-scaling this approach to manage energy inventories so that supply and demand always matches, given storage capabilities, loads with priority and usage flexibility. We argue that the Energy Internet can be now built due to the advances in micro-grid technologies and machine-type communications that allow for applications with ultra-reliable, low-latency and massive connectivity. We discuss some challenges involved in its deployment. This covers not only technical but also other issues, pointing out possible new elements in the system, namely communication micro-operator, P2P aggregator and energy community.

Driving Policy Transfer via Modularity and Abstraction. (arXiv:1804.09364v1 [cs.RO])

End-to-end approaches to autonomous driving have high sample complexity and are difficult to scale to realistic urban driving. Simulation can help end-to-end driving systems by providing a cheap, safe, and diverse training environment. Yet training driving policies in simulation brings up the problem of transferring such policies to the real world. We present an approach to transferring driving policies from simulation to reality via modularity and abstraction. Our approach is inspired by classic driving systems and aims to combine the benefits of modular architectures and end-to-end deep learning approaches. The key idea is to encapsulate the driving policy such that it is not directly exposed to raw perceptual input or low-level vehicle dynamics. We evaluate the presented approach in simulated urban environments and in the real world. In particular, we transfer a driving policy trained in simulation to a 1/5-scale robotic truck that is deployed in a variety of conditions, with no finetuning, on two continents. The supplementary video can be viewed at https://youtu.be/BrMDJqI6H5U

A quasi linear-time b-Matching algorithm on distance-hereditary graphs and bounded split-width graphs. (arXiv:1804.09393v1 [cs.DS])

We present a quasi linear-time algorithm for Maximum Matching on distance-hereditary graphs and some of their generalizations. This improves on [Dragan, WG'97], who proposed such an algorithm for the subclass of (tent,hexahedron)-free distance-hereditary graphs. Furthermore, our result is derived from a more general one that is obtained for b-Matching. In the (unit cost) b-Matching problem, we are given a graph G = (V, E) together with a nonnegative integer capacity b v for every vertex v $\in$ V. The objective is to assign nonnegative integer weights (x e) e$\in$E so that: for every v $\in$ V the sum of the weights of its incident edges does not exceed b v , and e$\in$E x e is maximized. We present the first algorithm for solving b-Matching on cographs, distance-hereditary graphs and some of their generalizations in quasi linear time. For that, we use a decomposition algorithm that outputs for any graph G a collection of subgraphs of G with no edge-cutsets inducing a complete bipartite subgraph (a.k.a., splits). The latter collection is sometimes called a split decomposition of G. Furthermore, there exists a generic method in order to design graph algorithms based on split decomposition [Rao, DAM'08]. However, this technique only applies to "localized" problems: for which a "best" partial solution for any given subgraph in a split decomposition can be computed almost independently from the remaining of the graph. Such framework does not apply to matching problems since an augmenting path may cross the subgraphs arbitrarily. We introduce a new technique that somehow captures all the partial solutions for a given union of subgraphs in a split decomposition, in a compact and amenable way for algorithms - assuming some piecewise linear assumption holds on the value of such solutions. The latter assumption is shown to hold for b-Matching. Doing so, we prove that solving b-Matching on any pair G, b can be reduced in quasi linear-time to solving this problem on a collection of smaller graphs: that are obtained from the subgraphs in any split decomposition of G by replacing every vertex with a constant-size module. In particular, if G has a split decomposition where all subgraphs have order at most a fixed k, then we can solve b-Matching for G, b in O((k log 2 k)$\times$(m+n)$\times$log ||b|| 1)-time. This answers an open question of [Coudert et al., SODA'18].

Transient Stability Analysis of Grid- Connected Converters with Power Synchronization Control. (arXiv:1804.09394v1 [cs.SY])

The power synchronization control (PSC) has been increasingly used with voltage-source converters (VSCs) connected to the weak grid. This paper presents an in-depth analysis on the transient stability of the PSC-VSC by means of the phase portrait. It is revealed that the PSC-VSC will maintain synchronization with the grid as long as there are equilibrium points after the transient disturbance. However, during the grid faults without any equilibrium points, the critical clearing angle (CCA) for the PSC-VSC is found equal to the power angle at the unstable equilibrium point of the post-fault operation, which is crucial for the power system planning and protection. Moreover, the PSC-VSC can still re-synchronize with the grid after around one cycle of oscillation, even if the fault clearing angle is beyond the CCA. This feature reduces the risk of system collapse caused by the delayed fault clearance. Lastly, these findings are verified by simulations and experimental tests.

Quantitative Susceptibility Map Reconstruction Using Annihilating Filter-based Low-Rank Hankel Matrix Approach. (arXiv:1804.09396v1 [cs.CV])

Quantitative susceptibility mapping (QSM) inevitably suffers from streaking artifacts caused by zeros on the conical surface of the dipole kernel in k-space. This work proposes a novel and accurate QSM reconstruction method based on a direct k-space interpolation approach, avoiding problems of over smoothing and streaking artifacts. Inspired by the recent theory of annihilating filter-based low-rank Hankel matrix approach (ALOHA), QSM reconstruction problem is formulated as deconvolution problem under low-rank Hankel matrix constraint in the k-space. To reduce the computational complexity and the memory requirement, the problem is formulated as successive reconstruction of 2-D planes along three independent axes of the 3-D phase image in Fourier domain. Extensive experiments were performed to verify and compare the proposed method with existing QSM reconstruction methods. The proposed ALOHA-QSM effectively reduced streaking artifacts and accurately estimated susceptibility values in deep gray matter structures, compared to the existing QSM methods. Our suggested ALOHA-QSM algorithm successfully solves the three-dimensional QSM dipole inversion problem without additional anatomical information or prior assumption and provides good image quality and quantitative accuracy.

Learnable Histogram: Statistical Context Features for Deep Neural Networks. (arXiv:1804.09398v1 [cs.CV])

Statistical features, such as histogram, Bag-of-Words (BoW) and Fisher Vector, were commonly used with hand-crafted features in conventional classification methods, but attract less attention since the popularity of deep learning methods. In this paper, we propose a learnable histogram layer, which learns histogram features within deep neural networks in end-to-end training. Such a layer is able to back-propagate (BP) errors, learn optimal bin centers and bin widths, and be jointly optimized with other layers in deep networks during training. Two vision problems, semantic segmentation and object detection, are explored by integrating the learnable histogram layer into deep networks, which show that the proposed layer could be well generalized to different applications. In-depth investigations are conducted to provide insights on the newly introduced layer.

Convolutional Generative Adversarial Networks with Binary Neurons for Polyphonic Music Generation. (arXiv:1804.09399v1 [cs.LG])

It has been shown recently that convolutional generative adversarial networks (GANs) are able to capture the temporal-pitch patterns in music using the piano-roll representation, which represents music by binary-valued time-pitch matrices. However, existing models can only generate real-valued piano-rolls and require further post-processing (e.g. hard thresholding, Bernoulli sampling) at test time to obtain the final binary-valued results. In this work, we first investigate how the real-valued predictions generated by the generator may lead to difficulties in training the discriminator. To overcome the binarization issue, we propose to append to the generator an additional refiner network, which uses binary neurons at the output layer. The whole network can be trained in a two-stage training setting: the generator and the discriminator are pretrained in the first stage; the refiner network is then trained along with the discriminator in the second stage to refine the real-valued piano-rolls generated by the pretrained generator to binary-valued ones. The proposed model is able to directly generate binary-valued piano-rolls at test time. Experimental results show improvements to the existing models in most of the evaluation metrics. All source code, training data and audio samples can be found at https://salu133445.github.io/bmusegan/ .

3D Consistent & Robust Segmentation of Cardiac Images by Deep Learning with Spatial Propagation. (arXiv:1804.09400v1 [cs.CV])

We propose a method based on deep learning to perform cardiac segmentation on short axis MRI image stacks iteratively from the top slice (around the base) to the bottom slice (around the apex). At each iteration, a novel variant of U-net is applied to propagate the segmentation of a slice to the adjacent slice below it. In other words, the prediction of a segmentation of a slice is dependent upon the already existing segmentation of an adjacent slice. 3D-consistency is hence explicitly enforced. The method is trained on a large database of 3078 cases from UK Biobank. It is then tested on 756 different cases from UK Biobank and three other state-of-the-art cohorts (ACDC with 100 cases, Sunnybrook with 30 cases, RVSC with 16 cases). Results comparable or even better than the state-of-the-art in terms of distance measures are achieved. They also emphasize the assets of our method, namely enhanced spatial consistency (currently neither considered nor achieved by the state-of-the-art), and the generalization ability to unseen cases even from other databases.

Generative Temporal Models with Spatial Memory for Partially Observed Environments. (arXiv:1804.09401v1 [stat.ML])

In model-based reinforcement learning, generative and temporal models of environments can be leveraged to boost agent performance, either by tuning the agent's representations during training or via use as part of an explicit planning mechanism. However, their application in practice has been limited to simplistic environments, due to the difficulty of training such models in larger, potentially partially-observed and 3D environments. In this work we introduce a novel action-conditioned generative model of such challenging environments. The model features a non-parametric spatial memory system in which we store learned, disentangled representations of the environment. Low-dimensional spatial updates are computed using a state-space model that makes use of knowledge on the prior dynamics of the moving agent, and high-dimensional visual observations are modelled with a Variational Auto-Encoder. The result is a scalable architecture capable of performing coherent predictions over hundreds of time steps across a range of partially observed 2D and 3D environments.

Probabilistic Plant Modeling via Multi-View Image-to-Image Translation. (arXiv:1804.09404v1 [cs.CV])

This paper describes a method for inferring three-dimensional (3D) plant branch structures that are hidden under leaves from multi-view observations. Unlike previous geometric approaches that heavily rely on the visibility of the branches or use parametric branching models, our method makes statistical inferences of branch structures in a probabilistic framework. By inferring the probability of branch existence using a Bayesian extension of image-to-image translation applied to each of multi-view images, our method generates a probabilistic plant 3D model, which represents the 3D branching pattern that cannot be directly observed. Experiments demonstrate the usefulness of the proposed approach in generating convincing branch structures in comparison to prior approaches.

The use of a pruned modular decomposition for Maximum Matching algorithms on some graph classes. (arXiv:1804.09407v1 [cs.DS])

We address the following general question: given a graph class C on which we can solve Maximum Matching in (quasi) linear time, does the same hold true for the class of graphs that can be modularly decomposed into C ? A major difficulty in this task is that the Maximum Matching problem is not preserved by quotient, thereby making difficult to exploit the structural properties of the quotient subgraphs of the modular decomposition. So far, we are only aware of a recent framework in [Coudert et al., SODA'18] that only applies when the quotient subgraphs have bounded order and/or under additional assumptions on the nontriv-ial modules in the graph. As a first attempt toward improving this framework we study the combined effect of modular decomposition with a pruning process over the quotient subgraphs. More precisely, we remove sequentially from all such subgraphs their so-called one-vertex extensions (i.e., pendant, anti-pendant, twin, universal and isolated vertices). Doing so, we obtain a "pruned modular decomposition", that can be computed in O(m log n)-time. Our main result is that if all the pruned quotient subgraphs have bounded order then a maximum matching can be computed in linear time. This result is mostly based on two pruning rules on pendant and anti-pendant modules -- that are adjacent, respectively, to one or all but one other modules in the graph. Furthermore, these two latter rules are surprisingly intricate and we consider them as our main technical contribution in the paper. We stress that the class of graphs that can be totally decomposed by the pruned modular decomposition contains all the distance-hereditary graphs, and so, it is larger than cographs. In particular, as a byproduct of our approach we also obtain the first known linear-time algorithms for Maximum Matching on distance-hereditary graphs and graphs with modular-treewidth at most one. Finally, we can use an extended version of our framework in order to compute a maximum matching, in linear-time, for all graph classes that can be modularly decomposed into cycles. Our work is the first to explain why the existence of some nice ordering over the modules of a graph, instead of just over its vertices, can help to speed up the computation of maximum matchings on some graph classes.

Two monads for graphs. (arXiv:1804.09408v1 [cs.LO])

An introduction to algebras for graphs, based on Courcelle's algebras of hyperedge replacement and vertex replacement. The paper uses monad notation.

Stable-Matching Voronoi Diagrams: Combinatorial Complexity and Algorithms. (arXiv:1804.09411v1 [cs.CG])

We study algorithms and combinatorial complexity bounds for \emph{stable-matching Voronoi diagrams}, where a set, $S$, of $n$ point sites in the plane determines a stable matching between the points in $\mathbb{R}^2$ and the sites in $S$ such that (i) the points prefer sites closer to them and sites prefer points closer to them, and (ii) each site has a quota or "appetite" indicating the area of the set of points that can be matched to it. Thus, a stable-matching Voronoi diagram is a solution to the well-known post office problem with the added (realistic) constraint that each post office has a limit on the size of its jurisdiction. Previous work on the stable-matching Voronoi diagram provided existence and uniqueness proofs, but did not analyze its combinatorial or algorithmic complexity. In this paper, we show that a stable-matching Voronoi diagram of $n$ point sites has $O(n^{2+\varepsilon})$ faces and edges, for any $\varepsilon>0$, and show that this bound is almost tight by giving a family of diagrams with $\Theta(n^2)$ faces and edges. We also provide a discrete algorithm for constructing it in $O(n^3+n^2f(n))$ time in the real-RAM model of computation, where $f(n)$ is the runtime of a geometric primitive (which we identify) that can be performed in the real-RAM model or can be approximated numerically, but cannot, in general, be performed exactly in an algebraic model of computation.

Movie Question Answering: Remembering the Textual Cues for Layered Visual Contents. (arXiv:1804.09412v1 [cs.CV])

Movies provide us with a mass of visual content as well as attracting stories. Existing methods have illustrated that understanding movie stories through only visual content is still a hard problem. In this paper, for answering questions about movies, we put forward a Layered Memory Network (LMN) that represents frame-level and clip-level movie content by the Static Word Memory module and the Dynamic Subtitle Memory module, respectively. Particularly, we firstly extract words and sentences from the training movie subtitles. Then the hierarchically formed movie representations, which are learned from LMN, not only encode the correspondence between words and visual content inside frames, but also encode the temporal alignment between sentences and frames inside movie clips. We also extend our LMN model into three variant frameworks to illustrate the good extendable capabilities. We conduct extensive experiments on the MovieQA dataset. With only visual content as inputs, LMN with frame-level representation obtains a large performance improvement. When incorporating subtitles into LMN to form the clip-level representation, we achieve the state-of-the-art performance on the online evaluation task of 'Video+Subtitles'. The good performance successfully demonstrates that the proposed framework of LMN is effective and the hierarchically formed movie representations have good potential for the applications of movie question answering.

The Dispersion of the Gauss-Markov Source. (arXiv:1804.09418v1 [cs.IT])

The Gauss-Markov source produces $U_i = aU_{i-1} + Z_i$ for $i\geq 1$, where $U_0 = 0$, $|a|<1$ and $Z_i\sim\mathcal{N}(0, \sigma^2)$ are i.i.d. Gaussian random variables. We consider lossy compression of a block of $n$ samples of the Gauss-Markov source under squared error distortion. We obtain the Gaussian approximation for the Gauss-Markov source with excess-distortion criterion for any distortion $d>0$, and we show that the dispersion has a reverse waterfilling representation. This is the \emph{first} finite blocklength result for lossy compression of \emph{sources with memory}. We prove that the finite blocklength rate-distortion function $R(n,d,\epsilon)$ approaches the rate-distortion function $\mathbb{R}(d)$ as $R(n,d,\epsilon) = \mathbb{R}(d) + \sqrt{\frac{V(d)}{n}}Q^{-1}(\epsilon) + o\left(\frac{1}{\sqrt{n}}\right)$, where $V(d)$ is the dispersion, $\epsilon \in (0,1)$ is the excess-distortion probability, and $Q^{-1}$ is the inverse of the $Q$-function. We give a reverse waterfilling integral representation for the dispersion $V(d)$, which parallels that of the rate-distortion functions for Gaussian processes. Remarkably, for all $0 < d\leq \frac{\sigma^2}{(1+|a|)^2}$, $R(n,d,\epsilon)$ of the Gauss-Markov source coincides with that of $Z_k$, the i.i.d. Gaussian noise driving the process, up to the second-order term. Among novel technical tools developed in this paper is a sharp approximation of the eigenvalues of the covariance matrix of $n$ samples of the Gauss-Markov source, and a construction of a typical set using the maximum likelihood estimate of the parameter $a$ based on $n$ observations.

Incremental Optimization of Independent Sets under Reachability Constraints. (arXiv:1804.09422v1 [cs.DM])

We introduce a new framework for reconfiguration problems, and apply it to independent sets as the first example. Suppose that we are given an independent set $I_0$ of a graph $G$, and an integer $l \ge 0$ which represents a lower bound on the size of any independent set of $G$. Then, we are asked to find an independent set of $G$ having the maximum size among independent sets that are reachable from $I_0$ by either adding or removing a single vertex at a time such that all intermediate independent sets are of size at least $l$. We show that this problem is PSPACE-hard even for bounded pathwidth graphs, and remains NP-hard for planar graphs. On the other hand, we give a linear-time algorithm to solve the problem for chordal graphs. We also study the fixed-parameter (in)tractability of the problem with respect to the following three parameters: the degeneracy $d$ of an input graph, a lower bound $l$ on the size of the independent sets, and a lower bound $s$ on the solution size. We show that the problem is fixed-parameter intractable when only one of $d$, $l$, and $s$ is taken as a parameter. On the other hand, we give a fixed-parameter algorithm when parameterized by $s+d$; this result implies that the problem parameterized only by $s$ is fixed-parameter tractable for planar graphs, and for bounded treewidth graphs.

Compositional Equivalence with actor attributes: Positional analysis of the the Florentine Families network. (arXiv:1804.09427v1 [cs.SI])

This paper extends Compositional Equivalence ---which is a structural correspondence type aimed for multiplex networks--- by incorporating actor attributes in the modelling of the network relational structure as diagonal matrices. As an illustration, we construct the positional system of the Florentine families' network in the 15th century with Business and Marriage ties together with relevant characteristics acquired from the actors such as families' financial Wealth and their number of Priorates. Different representations of the cumulated person hierarchies reveal that adding Wealth in the modelling provides a more accurate picture of what the substantial narrative says about this network.

Weakly-Supervised Learning-Based Feature Localization in Confocal Laser Endomicroscopy Glioma Images. (arXiv:1804.09428v1 [cs.CV])

Confocal Laser Endomicroscope (CLE) is a novel handheld fluorescence imaging device that has shown promise for rapid intraoperative diagnosis of brain tumor tissue. Currently CLE is capable of image display only and lacks an automatic system to aid the surgeon in analyzing the images. The goal of this project was to develop a computer-aided diagnostic approach for CLE imaging of human glioma with feature localization function. Despite the tremendous progress in object detection and image segmentation methods in recent years, most of such methods require large annotated datasets for training. However, manual annotation of thousands of histopathological images by physicians is costly and time consuming. To overcome this problem, we propose a Weakly-Supervised Learning (WSL)-based model for feature localization that trains on image-level annotations, and then localizes incidences of a class-of-interest in the test image. We developed a novel convolutional neural network for diagnostic features localization from CLE images by employing a novel multiscale activation map that is laterally inhibited and collaterally integrated. To validate our method, we compared proposed model's output to the manual annotation performed by four neurosurgeons on test images. Proposed model achieved 88% mean accuracy and 86% mean intersection over union on intermediate features and 87% mean accuracy and 88% mean intersection over union on restrictive fine features, while outperforming other state of the art methods tested. This system can improve accuracy and efficiency in characterization of CLE images of glioma tissue during surgery, augment intraoperative decision-making process regarding tumor margin and affect resection rates.

Degree based Topological indices of Hanoi Graph. (arXiv:1804.09431v1 [cs.DM])

There are various topological indices for example distance based topological indices and degree based topological indices etc. In QSAR/QSPR study, physiochemical properties and topological indices for example atom bond connectivity index, fourth atom bond connectivity index, Randic connectivity index, sum connectivity index, and so forth are used to characterize the chemical compound. In this paper we computed the edge version of atom bond connectivity index, fourth atom bond connectivity index, Randic connectivity index, sum connectivity index, geometric-arithmetic index and fifth geometric-arithmetic index of Double-wheel graph and Hanoi graph. The results are analyzed and the general formulas are derived for the above mentioned families of graphs.

Der Trusted Connector im Industrial Data Space. (arXiv:1804.09442v1 [cs.CR])

Digitalization affects all industrial domains and causes disruption of various business models. Especially in domains such as logistics and manufacturing, inter-connected devices and near-realtime exchange of sensor data across enterprises allows to speed up processes, reduce costs and respond to customer's needs. However, the advent of the Industrial Internet of Things (IIoT) also raises challenges with respect to security and privacy of sensitive and personal data that is created by sensors and processed by services hosted in different administrative domains. The Industrial Data Space initiative addresses these challenges and proposes a secure edge gateway platform named "Trusted Connector". In this report, we introduce the main security building blocks of the Trusted Connector and point out how they help protecting business-critical data and preserving the user's privacy.

On the satisfiability problem for fragments of the two-variable logic with one transitive relation. (arXiv:1804.09447v1 [cs.LO])

We study the satisfiability problem for the two-variable first-order logic over structures with one transitive relation. % We show that the problem is decidable in 2-NExpTime for the fragment consisting of formulas where existential quantifiers are guarded by transitive atoms. As this fragment enjoys neither the finite model nor the tree model property, to show decidability we introduce novel model construction technique based on the infinite Ramsey theorem.

We also point out why the technique is not sufficient to obtain decidability for the complimentary fragment where existential quantifiers are 'guarded' by negated transitive atoms, i.e. the existential subformulas are of the form $\exists y(\neg Txy\wedge \neg Tyx \wedge \phi(x,y))$, where $T$ denotes the transitive relation. Hence, contrary to our previous claim, [31], the status of the satisfiability problem for the full two-variable logic with one transitive relation remains open.

Extensor-Coding. (arXiv:1804.09448v1 [cs.DS])

We devise an algorithm that approximately computes the number of paths of length $k$ in a given directed graph with $n$ vertices up to a multiplicative error of $1 \pm \varepsilon$. Our algorithm runs in time $\varepsilon^{-2} 4^k(n+m) \operatorname{poly}(k)$. The algorithm is based on associating with each vertex an element in the exterior (or, Grassmann) algebra, called an extensor, and then performing computations in this algebra. This connection to exterior algebra generalizes a number of previous approaches for the longest path problem and is of independent conceptual interest. Using this approach, we also obtain a deterministic $2^{k}\cdot\operatorname{poly}(n)$ time algorithm to find a $k$-path in a given directed graph that is promised to have few of them. Our results and techniques generalize to the subgraph isomorphism problem when the subgraphs we are looking for have bounded pathwidth. Finally, we also obtain a randomized algorithm to detect $k$-multilinear terms in a multivariate polynomial given as a general algebraic circuit. To the best of our knowledge, this was previously only known for algebraic circuits not involving negative constants.

Normal edge-colorings of cubic graphs. (arXiv:1804.09449v1 [cs.DM])

A normal $k$-edge-coloring of a cubic graph is an edge-coloring with $k$ colors having the additional property that when looking at the set of colors assigned to any edge $e$ and the four edges adjacent it, we have either exactly five distinct colors or exactly three distinct colors. We denote by $\chi'_{N}(G)$ the smallest $k$, for which $G$ admits a normal $k$-edge-coloring. Normal $k$-edge-colorings were introduced by Jaeger in order to study his well-known Petersen Coloring Conjecture. More precisely, it is known that proving $\chi'_{N}(G)\leq 5$ for every bridgeless cubic graph is equivalent to proving Petersen Coloring Conjecture and then, among others, Cycle Double Cover Conjecture and Berge-Fulkerson Conjecture. Considering the larger class of all simple cubic graphs (not necessarily bridgeless), some interesting questions naturally arise. For instance, there exist simple cubic graphs, not bridgeless, with $\chi'_{N}(G)=7$. On the other hand, the known best general upper bound for $\chi'_{N}(G)$ was $9$. Here, we improve it by proving that $\chi'_{N}(G)\leq7$ for any simple cubic graph $G$, which is best possible. We obtain this result by proving the existence of specific no-where zero $\mathbb{Z}_2^2$-flows in $4$-edge-connected graphs.

Throughput Analysis for Relay-Assisted Millimeter-Wave Wireless Networks. (arXiv:1804.09450v1 [cs.NI])

In this work, we analyze the throughput of random access multi-user relay-assisted millimeter-wave wireless networks, in which both the destination and the relay have multipacket reception capability. We consider a full-duplex network cooperative relay that stores the successfully received packets in a queue, for which we analyze the performance. Moreover, we study the effects on the network throughput of two different strategies, by which the source nodes transmit either a packet to both the destination and the relay in the same timeslot by using wider beams (broadcast approach) or to only one of these two by using narrower beams (fully directional approach). We consider the inter-beam interference at the receiver and show the optimal strategy with respect to several system parameters, e.g., positions and number of the nodes.

Multi-modal Approach for Affective Computing. (arXiv:1804.09452v1 [cs.HC])

Throughout the past decade, many studies have classified human emotions using only a single sensing modality such as face video, electroencephalogram (EEG), electrocardiogram (ECG), galvanic skin response (GSR), etc. The results of these studies are constrained by the limitations of these modalities such as the absence of physiological biomarkers in the face-video analysis, poor spatial resolution in EEG, poor temporal resolution of the GSR etc. Scant research has been conducted to compare the merits of these modalities and understand how to best use them individually and jointly. Using multi-modal AMIGOS dataset, this study compares the performance of human emotion classification using multiple computational approaches applied to face videos and various bio-sensing modalities. Using a novel method for compensating physiological baseline we show an increase in the classification accuracy of various approaches that we use. Finally, we present a multi-modal emotion-classification approach in the domain of affective computing research.

Dynamic Few-Shot Visual Learning without Forgetting. (arXiv:1804.09458v1 [cs.CV])

The human visual system has the remarkably ability to be able to effortlessly learn novel concepts from only a few examples. Mimicking the same behavior on machine learning vision systems is an interesting and very challenging research problem with many practical advantages on real world vision applications. In this context, the goal of our work is to devise a few-shot visual learning system that during test time it will be able to efficiently learn novel categories from only a few training data while at the same time it will not forget the initial categories on which it was trained (here called base categories). To achieve that goal we propose (a) to extend an object recognition system with an attention based few-shot classification weight generator, and (b) to redesign the classifier of a ConvNet model as the cosine similarity function between feature representations and classification weight vectors. The latter, apart from unifying the recognition of both novel and base categories, it also leads to feature representations that generalize better on "unseen" categories. We extensively evaluate our approach on Mini-ImageNet where we manage to improve the prior state-of-the-art on few-shot recognition (i.e., we achieve 56.20% and 73.00% on the 1-shot and 5-shot settings respectively) while at the same time we do not sacrifice any accuracy on the base categories, which is a characteristic that most prior approaches lack. Finally, we apply our approach on the recently introduced few-shot benchmark of Bharath and Girshick [4] where we also achieve state-of-the-art results. The code and models of our paper will be published on: https://github.com/gidariss/FewShotWithoutForgetting

Analytical Modeling of Vanishing Points and Curves in Catadioptric Cameras. (arXiv:1804.09460v1 [cs.CV])

Vanishing points and vanishing lines are classical geometrical concepts in perspective cameras that have a lineage dating back to 3 centuries. A vanishing point is a point on the image plane where parallel lines in 3D space appear to converge, whereas a vanishing line passes through 2 or more vanishing points. While such concepts are simple and intuitive in perspective cameras, their counterparts in catadioptric cameras (obtained using mirrors and lenses) are more involved. For example, lines in the 3D space map to higher degree curves in catadioptric cameras. The projection of a set of 3D parallel lines converges on a single point in perspective images, whereas they converge to more than one point in catadioptric cameras. To the best of our knowledge, we are not aware of any systematic development of analytical models for vanishing points and vanishing curves in different types of catadioptric cameras. In this paper, we derive parametric equations for vanishing points and vanishing curves using the calibration parameters, mirror shape coefficients, and direction vectors of parallel lines in 3D space. We show compelling experimental results on vanishing point estimation and absolute pose estimation for a wide range of catadioptric cameras in both simulations and real experiments.

Structured Deep Neural Network Pruning by Varying Regularization Parameters. (arXiv:1804.09461v1 [cs.LG])

Convolutional Neural Networks (CNN's) are restricted by their massive computation and high storage. Parameter pruning is a promising approach for CNN compression and acceleration, which aims at eliminating redundant model parameters with tolerable performance loss. Despite its effectiveness, existing regularization-based parameter pruning methods usually assign a fixed regularization parameter to all weights, which neglects the fact that different weights may have different importance to CNN. To solve this problem, we propose a theoretically sound regularization-based pruning method to incrementally assign different regularization parameters to different weights based on their importance to the network. On AlexNet and VGG-16, our method can achieve 4x theoretical speedup with similar accuracies compared with the baselines. For ResNet-50, the proposed method also achieves 2x acceleration and only suffers 0.1% top-5 accuracy loss.

Optimized Resource Provisioning and Operation Control for Low-power Wide-area IoT Networks. (arXiv:1804.09464v1 [cs.IT])

The tradeoff between cost of the access network and quality of offered service to IoT devices, in terms of reliability and durability of communications, is investigated. We first develop analytical tools for reliability evaluation in uplink-oriented large-scale IoT networks. These tools comprise modeling of interference from heterogeneous interfering sources with time-frequency asynchronous radio-resource usage patterns, and benefit from realistic distribution processes for modeling channel fading and locations of interfering sources. We further present a cost model for the access network as a function of provisioned resources like density of access points (APs), and a battery lifetime model for IoT devices as a function of intrinsic parameters, e.g. level of stored energy, and network parameters, e.g. reliability of communication. The derived models represent the ways in which a required level of reliability can be achieved by either sacrificing battery lifetime (durability), e.g. increasing number of replica transmissions, or sacrificing network cost and increasing provisioned resources, e.g. density of the APs. Then, we investigate optimal resource provisioning and operation control strategies, where the former aims at finding the optimized investment in the access network based on the cost of each resource; while the latter aims at optimizing data transmission strategies of IoT devices. The simulation results confirm tightness of derived analytical expressions, and show how the derived expressions can be used in finding optimal operation points of IoT networks.

Intelligent Physiotherapy Through Procedural Content Generation. (arXiv:1804.09465v1 [cs.AI])

This paper describes an avenue for artificial and computational intelligence techniques applied within games research to be deployed for purposes of physical therapy. We provide an overview of prototypical research focussed on the application of motion sensor input devices and virtual reality equipment for rehabilitation of motor impairment an issue typical of patient's of traumatic brain injuries. We highlight how advances in procedural content generation and player modelling can stimulate development in this area by improving quality of rehabilitation programmes and measuring patient performance.

Zigzag Learning for Weakly Supervised Object Detection. (arXiv:1804.09466v1 [cs.CV])

This paper addresses weakly supervised object detection with only image-level supervision at training stage. Previous approaches train detection models with entire images all at once, making the models prone to being trapped in sub-optimums due to the introduced false positive examples. Unlike them, we propose a zigzag learning strategy to simultaneously discover reliable object instances and prevent the model from overfitting initial seeds. Towards this goal, we first develop a criterion named mean Energy Accumulation Scores (mEAS) to automatically measure and rank localization difficulty of an image containing the target object, and accordingly learn the detector progressively by feeding examples with increasing difficulty. In this way, the model can be well prepared by training on easy examples for learning from more difficult ones and thus gain a stronger detection ability more efficiently. Furthermore, we introduce a novel masking regularization strategy over the high level convolutional feature maps to avoid overfitting initial samples. These two modules formulate a zigzag learning process, where progressive learning endeavors to discover reliable object instances, and masking regularization increases the difficulty of finding object instances properly. We achieve 47.6% mAP on PASCAL VOC 2007, surpassing the state-of-the-arts by a large margin.

Price of Anarchy for Mechanisms with Risk-Averse Agents. (arXiv:1804.09468v1 [cs.GT])

We study the price of anarchy of mechanisms in the presence of risk-averse agents. Previous work has focused on agents with quasilinear utilities, possibly with a budget. Our model subsumes this as a special case but also captures that agents might be less sensitive to payments than in the risk-neutral model. We show that many positive price-of-anarchy results proved in the smoothness framework continue to hold in the more general risk-averse setting. A sufficient condition is that agents can never end up with negative quasilinear utility after playing an undominated strategy. This is true, e.g., for first-price and second-price auctions. For all-pay auctions, similar results do not hold: We show that there are Bayes-Nash equilibria with arbitrarily bad social welfare compared to the optimum.

Stratified Negation in Limit Datalog Programs. (arXiv:1804.09473v1 [cs.AI])

There has recently been an increasing interest in declarative data analysis, where analytic tasks are specified using a logical language, and their implementation and optimisation are delegated to a general-purpose query engine. Existing declarative languages for data analysis can be formalised as variants of logic programming equipped with arithmetic function symbols and/or aggregation, and are typically undecidable. In prior work, the language of $\mathit{limit\ programs}$ was proposed, which is sufficiently powerful to capture many analysis tasks and has decidable entailment problem. Rules in this language, however, do not allow for negation. In this paper, we study an extension of limit programs with stratified negation-as-failure. We show that the additional expressive power makes reasoning computationally more demanding, and provide tight data complexity bounds. We also identify a fragment with tractable data complexity and sufficient expressivity to capture many relevant tasks.

Bribery Games on Interdependent Complex Networks. (arXiv:1804.09477v1 [physics.soc-ph])

Bribe demands present a social conflict scenario where decisions have wide-ranging economic and ethical consequences. Nevertheless, such incidents occur daily in many countries across the globe. Harassment bribery constitute a significant sub-set of such bribery incidents where a government official demands a bribe for providing a service to a citizen legally entitled to it. We employ an evolutionary game-theoretic framework to analyse the evolution of corrupt and honest strategies in structured populations characterized by an interdependent complex network. The effects of changing network topology, average number of links and asymmetry in size of the citizen and officer population on the proliferation of incidents of bribery are explored. A complex network topology is found to be beneficial for the dominance of corrupt strategies over a larger region of phase space when compared with the outcome for a regular network, for equal citizen and officer population sizes. However, the extent of the advantage depends critically on the network degree and topology. A different trend is observed when there is a difference between the citizen and officer population sizes. Under those circumstances, increasing randomness of the underlying citizen network can be beneficial to the fixation of honest officers up to a certain value of the network degree. Our analysis reveals how the interplay between network topology, connectivity and strategy update rules can affect population level outcomes in such asymmetric games.

Coverage of highly-cited documents in Google Scholar, Web of Science, and Scopus: a multidisciplinary comparison. (arXiv:1804.09479v1 [cs.DL])
This study explores the extent to which bibliometric indicators based on counts of highly-cited documents could be affected by the choice of data source. The initial hypothesis is that databases that rely on journal selection criteria for their document coverage may not necessarily provide an accurate representation of highly-cited documents across all subject areas, while inclusive databases, which give each document the chance to stand on its own merits, might be better suited to identify highly-cited documents. To test this hypothesis, an analysis of 2,515 highly-cited documents published in 2006 that Google Scholar displays in its Classic Papers product is carried out at the level of broad subject categories, checking whether these documents are also covered in Web of Science and Scopus, and whether the citation counts offered by the different sources are similar. The results show that a large fraction of highly-cited documents in the Social Sciences and Humanities (8.6%-28.2%) are invisible to Web of Science and Scopus. In the Natural, Life, and Health Sciences the proportion of missing highly-cited documents in Web of Science and Scopus is much lower. Furthermore, in all areas, Spearman correlation coefficients of citation counts in Google Scholar, as compared to Web of Science and Scopus citation counts, are remarkably strong (.83-.99). The main conclusion is that the data about highly-cited documents available in the inclusive database Google Scholar does indeed reveal significant coverage deficiencies in Web of Science and Scopus in several areas of research. Therefore, using these selective databases to compute bibliometric indicators based on counts of highly-cited documents might produce biased assessments[...]

Information diffusion backbones in temporal networks. (arXiv:1804.09483v1 [physics.soc-ph])

Much effort has been devoted to understand how temporal network features and the choice of the source node affect the prevalence of a diffusion process. In this work, we addressed the further question: node pairs with what kind of local and temporal connection features tend to appear in a diffusion trajectory or path, thus contribute to the actual information diffusion. We consider the Susceptible-Infected spreading process with a given infection probability per contact on a large number of real-world temporal networks. We illustrate how to construct the information diffusion backbone where the weight of each link tells the probability that a node pair appears in a diffusion process starting from a random node. We unravel how these backbones corresponding to different infection probabilities relate to each other and point out the importance of two extreme backbones: the backbone with infection probability one and the integrated network, between which other backbones vary. We find that the temporal node pair feature that we proposed could better predict the links in the extreme backbone with infection probability one as well as the high weight links than the features derived from the integrated network. This universal finding across all the empirical networks highlights that temporal information are crucial in determining a node pair's role in a diffusion process. A node pair with many early contacts tends to appear in a diffusion process. Our findings shed lights on the in-depth understanding and may inspire the control of information spread.

RIFT: Multi-modal Image Matching Based on Radiation-invariant Feature Transform. (arXiv:1804.09493v1 [cs.CV])
Traditional feature matching methods such as scale-invariant feature transform (SIFT) usually use image intensity or gradient information to detect and describe feature points; however, both intensity and gradient are sensitive to nonlinear radiation distortions (NRD). To solve the problem, this paper proposes a novel feature matching algorithm that is robust to large NRD. The proposed method is called radiation-invariant feature transform (RIFT). There are three main contributions in RIFT: first, RIFT uses phase congruency (PC) instead of image intensity for feature point detection. RIFT considers both the number and repeatability of feature points, and detects both corner points and edge points on the PC map. Second, RIFT originally proposes a maximum index map (MIM) for feature description. MIM is constructed from the log-Gabor convolution sequence and is much more robust to NRD than traditional gradient map. Thus, RIFT not only largely improves the stability of feature detection, but also overcomes the limitation of gradient information for feature description. Third, RIFT analyzes the inherent influence of rotations on the values of MIM, and realizes rotation invariance. We use six different types of multi-model image datasets to evaluate RIFT, including optical-optical, infrared-optical, synthetic aperture radar (SAR)-optical, depth-optical, map-optical, and day-night datasets. Experimental results show that RIFT is much more superior to SIFT and SAR-SIFT. To the best of our knowledge, RIFT is the first feature matching algorithm that can achieve good performance on all the above-mentioned types of multi-model images. The source code of RIFT and multi-modal r[...]

On Optimizing Distributed Tucker Decomposition for Sparse Tensors. (arXiv:1804.09494v1 [cs.DC])
The Tucker decomposition generalizes the notion of Singular Value Decomposition (SVD) to tensors, the higher dimensional analogues of matrices. We study the problem of constructing the Tucker decomposition of sparse tensors on distributed memory systems via the HOOI procedure, a popular iterative method. The scheme used for distributing the input tensor among the processors (MPI ranks) critically influences the HOOI execution time. Prior work has proposed different distribution schemes: an offline scheme based on sophisticated hypergraph partitioning method and simple, lightweight alternatives that can be used real-time. While the hypergraph based scheme typically results in faster HOOI execution time, being complex, the time taken for determining the distribution is an order of magnitude higher than the execution time of a single HOOI iteration. Our main contribution is a lightweight distribution scheme, which achieves the best of both worlds. We show that the scheme is near-optimal on certain fundamental metrics associated with the HOOI procedure and as a result, near-optimal on the computational load (FLOPs). Though the scheme may incur higher communication volume, the computation time is the dominant factor and as the result, the scheme achieves better performance on the overall HOOI execution time. Our experimental evaluation on large real-life tensors (having up to 4 billion elements) shows that the scheme outperforms the prior schemes on the HOOI execution time by a factor of up to 3x. On the other hand, its distribution time is comparable to the prior lightweight schemes and is typically lesser than the execution time of a single[...]

Estimation with Low-Rank Time-Frequency Synthesis Models. (arXiv:1804.09497v1 [eess.SP])

Many state-of-the-art signal decomposition techniques rely on a low-rank factorization of a time-frequency (t-f) transform. In particular, nonnegative matrix factorization (NMF) of the spectrogram has been considered in many audio applications. This is an analysis approach in the sense that the factorization is applied to the squared magnitude of the analysis coefficients returned by the t-f transform. In this paper we instead propose a synthesis approach, where low-rankness is imposed to the synthesis coefficients of the data signal over a given t-f dictionary (such as a Gabor frame). As such we offer a novel modeling paradigm that bridges t-f synthesis modeling and traditional analysis-based NMF approaches. The proposed generative model allows in turn to design more sophisticated multi-layer representations that can efficiently capture diverse forms of structure. Additionally, the generative modeling allows to exploit t-f low-rankness for compressive sensing. We present efficient iterative shrinkage algorithms to perform estimation in the proposed models and illustrate the capabilities of the new modeling paradigm over audio signal processing examples.

Fault Detection and Isolation of Satellite Gyroscopes Using Relative Positions in Formation Flying. (arXiv:1804.09498v1 [math.OC])

A fault detection and isolation method for satellite rate gyros is proposed based on using the satellite-to-satellite measurements such as relative position beside orbit parameters of the primary satellite. By finding a constant of motion, it is shown that the dynamic states in a relative motion are restricted in such a way that the angular velocity vector of primary satellite lies on a quadratic surface. This constant of motion is then used to detect the gyroscope faults and estimate the corresponding scale factor or bias values of the rate gyros of the primary satellite. The proposed algorithm works even in time variant fault situations as well, and does not impose any additional subsystems to formation flying satellites. Monte-Carlo simulations are used to ensure that the algorithm retains its performance in the presence of uncertainties. In presence of only measurement noise, the isolation process performs well by selecting a proper threshold. However, the isolation performance degrades as the scale factor approaches unity or bias approaches zero. Finally, the effect of orbital perturbations on isolation process is investigated by including the effect of zonal harmonics as well as drag and without loss of generality, it is shown that the perturbation effects are negligible.

Unsupervised Disentangled Representation Learning with Analogical Relations. (arXiv:1804.09502v1 [cs.LG])

Learning the disentangled representation of interpretable generative factors of data is one of the foundations to allow artificial intelligence to think like people. In this paper, we propose the analogical training strategy for the unsupervised disentangled representation learning in generative models. The analogy is one of the typical cognitive processes, and our proposed strategy is based on the observation that sample pairs in which one is different from the other in one specific generative factor show the same analogical relation. Thus, the generator is trained to generate sample pairs from which a designed classifier can identify the underlying analogical relation. In addition, we propose a disentanglement metric called the subspace score, which is inspired by subspace learning methods and does not require supervised information. Experiments show that our proposed training strategy allows the generative models to find the disentangled factors, and that our methods can give competitive performances as compared with the state-of-the-art methods.

Generalized Fast Decoding of Polar Codes. (arXiv:1804.09508v1 [cs.IT])

Research on polar codes has been constantly gaining attention over the last decade, by academia and industry alike, thanks to their capacity-achieving error-correction performance and low-complexity decoding algorithms. Recently, they have been selected as one of the coding schemes in the $5^{th}$ generation wireless standard (5G). Over the years various polar code decoding algorithms, like SC-list (SCL), have been proposed to improve the mediocre performance of the successive cancellation (SC) decoding algorithm for finite code lengths; however, like SC, they suffer from long decoding latency. Fast decoding of polar codes tries to overcome this problem by identifying particular subcodes in the polar code and decoding them with efficient decoders. In this work, we introduce a generalized approach to fast decoding of polar codes to further reduce SC-based decoding latency. We propose three multi-node polar code subcodes whose identification patterns include most of the existing subcodes, extending them to SCL decoding, and allow to apply fast decoding to larger subsets of bits. Without any error-correction performance degradation, the proposed technique shows up to $23.6\%$ and $29.2\%$ decoding latency gain with respect to fast SC and SCL decoding algorithms, respectively, and up to $63.6\%$ and $49.8\%$ if a performance loss is accepted, whose amount depends on code and decoding algorithm parameters, along with the desired speedup.

Fair Division Under Cardinality Constraints. (arXiv:1804.09521v1 [cs.GT])

We consider the problem of fairly allocating indivisible goods, among agents, under cardinality constraints and additive valuations. In this setting, we are given a partition of the entire set of goods---i.e., the goods are categorized---and a limit is specified on the number of goods that can be allocated from each category to any agent. The objective here is to find a fair allocation in which the subset of goods assigned to any agent satisfies the given cardinality constraints. This problem naturally captures a number of resource-allocation applications, and is a generalization of the well-studied (unconstrained) fair division problem.

The two central notions of fairness, in the context of fair division of indivisible goods, are envy freeness up to one good (EF1) and the (approximate) maximin share guarantee (MMS). We show that the existence and algorithmic guarantees established for these solution concepts in the unconstrained setting can essentially be achieved under cardinality constraints. Specifically, we develop efficient algorithms which compute EF1 and approximately MMS allocations in the constrained setting.

Furthermore, focusing on the case wherein all the agents have the same additive valuation, we establish that EF1 locations exist even under matroid constraints.

Quantum conditional relative entropy and quasi-factorization of the relative entropy. (arXiv:1804.09525v1 [quant-ph])

The existence of a positive log-Sobolev constant implies a bound on the mixing time of a quantum dissipative evolution under the Markov approximation. For classical spin systems, such constant was proven to exist, under the assumption of a mixing condition in the Gibbs measure associated to their dynamics, via a quasi-factorization of the entropy in terms of the conditional entropy in some sub-$\sigma$-algebras.

In this work we analyze analogous quasi-factorization results in the quantum case. For that, we define the quantum conditional relative entropy and prove several quasi-factorization results for it. As an illustration of their potential, we use one of them to obtain a positive log-Sobolev constant for the heat-bath dynamics with product fixed point.

Strong Baselines for Neural Semi-supervised Learning under Domain Shift. (arXiv:1804.09530v1 [cs.CL])

Novel neural models have been proposed in recent years for learning under domain shift. Most models, however, only evaluate on a single task, on proprietary datasets, or compare to weak baselines, which makes comparison of models difficult. In this paper, we re-evaluate classic general-purpose bootstrapping approaches in the context of neural networks under domain shifts vs. recent neural approaches and propose a novel multi-task tri-training method that reduces the time and space complexity of classic tri-training. Extensive experiments on two benchmarks are negative: while our novel method establishes a new state-of-the-art for sentiment analysis, it does not fare consistently the best. More importantly, we arrive at the somewhat surprising conclusion that classic tri-training, with some additions, outperforms the state of the art. We conclude that classic approaches constitute an important and strong baseline.

Hand Pose Estimation via Latent 2.5D Heatmap Regression. (arXiv:1804.09534v1 [cs.CV])

Estimating the 3D pose of a hand is an essential part of human-computer interaction. Estimating 3D pose using depth or multi-view sensors has become easier with recent advances in computer vision, however, regressing pose from a single RGB image is much less straightforward. The main difficulty arises from the fact that 3D pose requires some form of depth estimates, which are ambiguous given only an RGB image. In this paper we propose a new method for 3D hand pose estimation from a monocular image through a novel 2.5D pose representation. Our new representation estimates pose up to a scaling factor, which can be estimated additionally if a prior of the hand size is given. We implicitly learn depth maps and heatmap distributions with a novel CNN architecture. Our system achieves the state-of-the-art estimation of 2D and 3D hand pose on several challenging datasets in presence of severe occlusions.

Deep Convolutional AutoEncoder-based Lossy Image Compression. (arXiv:1804.09535v1 [cs.CV])

Image compression has been investigated as a fundamental research topic for many decades. Recently, deep learning has achieved great success in many computer vision tasks, and is gradually being used in image compression. In this paper, we present a lossy image compression architecture, which utilizes the advantages of convolutional autoencoder (CAE) to achieve a high coding efficiency. First, we design a novel CAE architecture to replace the conventional transforms and train this CAE using a rate-distortion loss function. Second, to generate a more energy-compact representation, we utilize the principal components analysis (PCA) to rotate the feature maps produced by the CAE, and then apply the quantization and entropy coder to generate the codes. Experimental results demonstrate that our method outperforms traditional image coding algorithms, by achieving a 13.7% BD-rate decrement on the Kodak database images compared to JPEG2000. Besides, our method maintains a moderate complexity similar to JPEG2000.

Fast parallel multidimensional FFT using advanced MPI. (arXiv:1804.09536v1 [cs.DC])
We present a new method for performing global redistributions of multidimensional arrays essential to parallel fast Fourier (or similar) transforms. Traditional methods use standard all-to-all collective communication of contiguous memory buffers, thus necessary requiring local data realignment steps intermixed in-between redistribution and transform steps. Instead, our method takes advantage of subarray datatypes and generalized all-to-all scatter/gather from the MPI-2 standard to communicate discontiguous memory buffers, effectively eliminating the need for local data realignments. Despite generalized all-to-all communication of discontiguous data being generally slower, our proposal economizes in local work. For a range of strong and weak scaling tests, we found the overall performance of our method to be on par and often better than well-established libraries like MPI-FFTW, P3DFFT, and 2DECOMP&FFT. We provide compact routines implemented at the highest possible level using the MPI bindings for the C programming language. These routines apply to any global redistribution, over any two directions of a multidimensional array, decomposed on arbitrary Cartesian processor grids (1D slabs, 2D pencils, or even higher-dimensional decompositions). The high level implementation makes the code easy to read, maintain, and eventually extend. Our approach enables for future sp[...]