WNCG - Wireless Networking and Communications Group - sketching
https://wncg.org/tags/sketching
enSuccinct Representations of Big Data: Binary Embeddings
https://wncg.org/research/briefs/succinct-representations-big-data-binary-embeddings
<div class="field field-name-field-publish-date field-type-datetime field-label-hidden"><div class="field-items"><div class="field-item even"><span class="date-display-single">Thursday, April 2, 2015</span></div></div></div><div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even"><p>Low distortion embeddings that transform high-dimensional points to low-dimensional space have played an important role in dealing with storage, information retrieval and machine learning problems for modern (large scale) datasets. Xinyang Yi and Profs. Constantine Caramanis and Eric Price develop novel algorithms with the best-known results for this important problem.</p>
<p>Indeed, perhaps one of the most famous results along these lines is the Johnson-Lindenstrauss (JL) lemma, which shows that N points can be embedded into an (approximately) log N-dimensional space while preserving pairwise Euclidean distance up to some small distortion. As the interest stems for massive scale problems, significant recent effort has focused on fast algorithms for computing these embeddings.</p>
<p>Binary embedding is a nonlinear dimension reduction methodology where high dimensional data are mapped to short strings of 0s and 1s -- this is called an embedding into the Hamming cube. The goal is to do this while preserving the structure of the original space. Embedding into the binary cube has two advantages in practice: (i) As each data point is represented by a binary code, the disk size for storing the entire dataset is reduced considerably. (ii) Distance in the binary cube is some function of the Hamming distance, which can be computed quickly using computationally efficient bit-wise operators. As a consequence, binary embedding can be applied to a large number of domains such as biology, finance and computer vision where the data are usually high dimensional. </p>
<p>While the problem of binary embedding is therefore important, it is also difficult. Existing binary embedding algorithms either lack theoretical guarantees or suffer from running time orders of magnitude too large. We make three contributions: (1) we establish a lower bound on the number of points required by any binary embedding; (2) we propose a novel fast binary embedding algorithm with provably optimal bit complexity, and near linear running time.</p>
<p>* Paper: <a href="http://arxiv.org/abs/1502.05746">http://arxiv.org/abs/1502.05746</a></p>
<p>This research was partially funded by the National Science Foundation (NSF) and the U.S. Department of Transportation through the Data-Supported Transportation Operations and Planning (D-STOP) Tier 1 University Transportation Center.</p>
</div></div></div><div class="field field-name-field-related-faculty field-type-node-reference field-label-inline clearfix"><div class="field-label">Related Faculty: </div><div class="field-items"><div class="field-item even"><a href="/people/faculty/constantine-caramanis">Constantine Caramanis</a></div></div></div><div class="field field-name-field-related-students field-type-node-reference field-label-inline clearfix"><div class="field-label">Related Researchers: </div><div class="field-items"><div class="field-item even"><a href="/people/students/xinyang-yi">Xinyang Yi</a></div></div></div><div class="field field-name-field-tags field-type-taxonomy-term-reference field-label-inline clearfix"><div class="field-label">Keywords: </div><div class="field-items"><div class="field-item even"><a href="/tags/sketching">sketching</a>, <a href="/tags/big-data">Big Data</a>, <a href="/tags/high-dimension-datasets">High Dimension Datasets</a></div></div></div>Fri, 03 Apr 2015 03:59:05 +0000cc333383703 at https://wncg.orghttps://wncg.org/research/briefs/succinct-representations-big-data-binary-embeddings#comments