Blogroll

Author2Vec: Learning Author Representations by Combining Content and Link Information

Microsoft Research Publications - Fri, 04/01/2016 - 09:00
In this paper, we consider the problem of learning representations for authors from bibliographic co-authorship networks. Existing methods for deep learning on graphs, such as DeepWalk, suffer from link sparsity problem as they focus on modeling the link information only. We hypothesize that capturing both the content and link information in a unified way will help mitigate the sparsity problem. To this end, we present a novel model `Author2Vec' 1, which learns lowdimensional author representations such that authors who write similar content and share similar network structure are closer in vector space. Such embeddings are useful in a variety of applications such as link prediction, node classification, recommendation and visualization. The author embeddings we learn are empirically shown to outperform DeepWalk by 2.35% and 0.83% for link prediction and clustering task respectively.
Categories: Microsoft

Predicting Post-Operative Visual Acuity for LASIK Surgeries

Microsoft Research Publications - Fri, 04/01/2016 - 09:00
LASIK (Laser-Assisted in SItu Keratomileusis) surgeries have been quite popular for treatment of myopia (nearsightedness), hyperopia (farsightedness) and astigmatism over the past two decades. In the past decade, over 10 million LASIK procedures had been performed in the United States alone with an average cost of approximately $2000 USD per surgery. While 99% of such surgeries are successful, the commonest side effect is a residual refractive error and poor uncorrected visual acuity (UCVA). In this work, we aim at predicting the UCVA post LASIK surgery. We model the task as a regression problem and use the patient demography and pre-operative examination details as features. To the best of our knowledge, this is the first work to systematically explore this critical problem using machine learning methods. Further, LASIK surgery settings are often determined by practitioners using manually designed rules. We explore the possibility of determining such settings automatically to optimize for the best post-operative UCVA by including such settings as features in our regression model. Our experiments on a dataset of 791 surgeries provides an RMSE (root mean square error) of 0.102, 0.094 and 0.074 for the predicted post-operative UCVA after one day, one week and one month of the surgery respectively.
Categories: Microsoft

High-Density Image Storage Using Approximate Memory Cells

Microsoft Research Publications - Fri, 04/01/2016 - 09:00
This paper proposes tailoring image encoding for an approximate storage substrate, developing an approximation-aware encoding algorithm. We develop a methodology to determine relative importance of encoded bits and store them in an approximate storage substrate that we tune to match their error tolerance. We present a case study with the progressive transform codec (PTC), a precursor to JPEG XR, and a phase-change memory (PCM) storage substrate that is optimized to minimize errors via biasing and tuned via selective error correction to different error rate levels. This enables effective use of approximate storage for images, resulting in over 2.7x increase in density of pixels per silicon volume that is additive to PTC storage savings.
Categories: Microsoft

A DNA-Based Archival Storage System

Microsoft Research Publications - Fri, 04/01/2016 - 09:00
Demand for data storage is growing exponentially, but the capacity of existing storage media is not keeping up. Using DNA to archive data is an attractive possibility because it is extremely dense, with a raw limit of 1 exabyte per cubic millimeter, and long-lasting, with observed half-life of over 500 years. This paper presents an architecture for a DNA-backed archival storage system. It is structured as a key-value store, and leverages common biochemical techniques to provide random access. We also propose a new encoding scheme that offers controllable redundancy, trading off reliability for density.
Categories: Microsoft

Where Can I Buy a Boulder? Searching for Offline Retail Locations

Microsoft Research Publications - Fri, 04/01/2016 - 09:00
People commonly need to purchase things in person, from large garden supplies to home decor. Although modern search systems are very effective at finding online products, little research attention has been paid to helping users find places that sell a specific product offline. For instance, users searching for an apron are not typically directed to a nearby kitchen store by a standard search engine. In this paper, we investigate "where can I buy"-style queries related to in-person purchases of products and services. Answering these queries is challenging since little is known about the range of products sold in many stores, especially those which are smaller in size. To better understand this class of queries, we first present an in-depth analysis of typical offline purchase needs as observed by a major search engine, producing an ontology of such needs. We then propose ranking features for this new problem, and learn a ranking function that returns stores most likely to sell a queried item or service, even if there is very little information available online about some of the stores. Our final contribution is a new evaluation framework that combines distance with store relevance in measuring the effectiveness of such a search system. We evaluate our method using this approach and show that it outperforms a modern web search engine.
Categories: Microsoft

Exploring limits to prediction in complex social systems: Predicting cascade size on Twitter

Microsoft Research Publications - Fri, 04/01/2016 - 09:00
How predictable is success in complex social systems? In spite of a recent profusion of prediction studies that exploit online social and information network data, this question remains unanswered, in part because it has not been adequately specified. In this paper we attempt to clarify the question by presenting a simple stylized model of success that attributes prediction error to one of two generic sources: insufficiency of available data and/or models on the one hand; and inherent unpredictability of complex social systems on the other. We then use this model to motivate an illustrative empirical study of information cascade size prediction on Twitter. Despite an unprecedented volume of information about users, content, and past performance, our best performing models can explain less than half of the variance in cascade sizes. In turn, this result suggests that even with unlimited data predictive performance would be bounded well below deterministic accuracy. Finally, we explore this potential bound theoretically using simulations of a diffusion process on a random scale free network similar to Twitter. We show that although higher predictive power is possible in theory, such performance requires a homogeneous system and perfect ex-ante knowledge of it: even a small degree of uncertainty in estimating product quality or slight variation in quality across products leads to substantially more restrictive bounds on predictability. We conclude that realistic bounds on predictive accuracy are not dissimilar from those we have obtained empirically, and that such bounds for other complex social systems for which data is more difficult to obtain are likely even lower.
Categories: Microsoft

Toward Full Elasticity in Distributed Static Analysis

Microsoft Research Publications - Wed, 03/23/2016 - 09:00
In this paper we present the design and implementation of an elastic static analysis framework that is designed to scale with the size of the input. Our approach is based on the actor programming model and is deployed in the cloud. This provides a degree of elasticity for CPU, memory and storage resources. To demonstrate the potential of our technique, we show how a typical call graph analysis can be implemented. We experimentally validate the analysis using a combination of both synthetic and real benchmarks. The results show that our analysis scales well in terms of memory pressure independently of the input size, as we add more machines. Despite using stock hardware and incurring a non-trivial communication overhead, our processing time for projects of close to 1M LOC is about 15 minutes. As the number of machines increases, we show that the analysis time does not suffer. Lastly, we demonstrate that querying the results can be performed with a median latency of well under 20 ms.
Categories: Microsoft

XFabric: A Reconfigurable In-Rack Network for Rack-Scale Computers

Microsoft Research Publications - Wed, 03/16/2016 - 09:00
Rack-scale computers are dense clusters with hundreds of micro-servers per rack. Designed for data center workloads, they can have significant power, cost and performance benefits over current racks. The rack network can be distributed, with small packet switches embedded on each processor as part of a system-on-chip (SoC) design. Ingress/egress traffic is forwarded by SoCs that have direct uplinks to the data center. Such fabrics are not fully provisioned and the chosen topology and uplink placement impacts performance for different workloads. XFabric is a rack-scale network that reconfigures the topology and uplink placement using a circuit-switched physical layer over which SoCs perform packet switching. To satisfy tight power and space requirements in the rack, XFabric does not use a single large circuit switch, instead relying on a set of independent smaller circuit switches. This introduces partial reconfigurability, as some ports in the rack cannot be connected by a circuit. XFabric optimizes the physical topology and manages uplinks, efficiently coping with partial reconfigurability. It significantly outperforms static topologies and has a performance similar to fully reconfigurable fabrics. We demonstrate the benefits of XFabric using flow-based simulations and a prototype built with electrical crosspoint switch ASICs.
Categories: Microsoft

Video Cube

Microsoft Research Downloads - Wed, 03/16/2016 - 09:00
VideoCube allows one to load an AVI movie file as a volume, and play back the movie sampling space and time in different ways. It also provides a single cutting plane for interactively viewing single spacetime slices of the video.
Categories: Microsoft

StereoMatcher

Microsoft Research Downloads - Wed, 03/16/2016 - 09:00
StereoMatcher is an implementation of some commonly used two-frame stereo matching algorithms. It also contains code to evaluate the quality of a computed depth map relative to a ground truth image.
Categories: Microsoft

Pan - Source

Microsoft Research Downloads - Wed, 03/16/2016 - 09:00
Pan is an experimental embedded language and compiler for image synthesis and manipulation, based on principles from functional programming.
Categories: Microsoft

Pan - Components

Microsoft Research Downloads - Wed, 03/16/2016 - 09:00
Pan is an experimental embedded language and compiler for image synthesis and manipulation, based on principles from functional programming.
Categories: Microsoft

Mppt

Microsoft Research Downloads - Wed, 03/16/2016 - 09:00
Mppt is an add-in for PowerPoint that allows a presenter to multicast PowerPoint slides, including animations and effects, to a group of viewers.
Categories: Microsoft

Mping

Microsoft Research Downloads - Wed, 03/16/2016 - 09:00
Mping is a simple command line application that sends and receives multicast packets.
Categories: Microsoft

JCLUSTER

Microsoft Research Downloads - Wed, 03/16/2016 - 09:00
JCLUSTER is a fast simple clustering program that produces hierarchical, binary branching, tree structured clusters.
Categories: Microsoft
Syndicate content

eXTReMe Tracker