WNCG - Wireless Networking and Communications Group - High Dimension Datasets
http://wncg.org/tags/high-dimension-datasets
Succinct Representations of Big Data: Binary Embeddings
http://wncg.org/research/briefs/succinct-representations-big-data-binary-embeddings
<div class="field field-name-field-publish-date field-type-datetime field-label-hidden"><div class="field-items"><div class="field-item even"><span class="date-display-single">Thursday, April 2, 2015</span></div></div></div><div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even"> <p>Low distortion embeddings that transform high-dimensional points to low-dimensional space have played an important role in dealing with storage, information retrieval and machine learning problems for modern (large scale) datasets. Xinyang Yi and Profs. Constantine Caramanis and Eric Price develop novel algorithms with the best-known results for this important problem.</p>
<p>Indeed, perhaps the most famous result along these lines is the Johnson-Lindenstrauss (JL) lemma, which shows that N points can be embedded into an (approximately) log N-dimensional space while preserving pairwise Euclidean distances up to some small distortion. Because the interest stems largely from massive-scale problems, significant recent effort has focused on fast algorithms for computing these embeddings.</p>
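As a quick illustration of the JL idea, a plain Gaussian random projection (a standard textbook construction, not the authors' fast algorithm; all dimensions below are hypothetical choices for the example) approximately preserves pairwise Euclidean distances:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100, 10_000, 200                 # n points, ambient dim d, target dim k

X = rng.normal(size=(n, d))                # synthetic high-dimensional points
A = rng.normal(size=(k, d)) / np.sqrt(k)   # scaled Gaussian projection matrix
Y = X @ A.T                                # embedded points in R^k

# Pairwise Euclidean distances are approximately preserved:
i, j = 0, 1
orig = np.linalg.norm(X[i] - X[j])
emb = np.linalg.norm(Y[i] - Y[j])
print(orig, emb)                           # the two values agree up to small distortion
```

With target dimension k, the typical relative distortion scales like 1/sqrt(k), which is why a few hundred dimensions already suffice here.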
<p>Binary embedding is a nonlinear dimension-reduction methodology in which high-dimensional data are mapped to short strings of 0s and 1s -- this is called an embedding into the Hamming cube. The goal is to do this while preserving the structure of the original space. Embedding into the binary cube has two practical advantages: (i) since each data point is represented by a binary code, the disk space needed to store the entire dataset is reduced considerably; (ii) distance in the binary cube is a function of the Hamming distance, which can be computed quickly using efficient bitwise operations. As a consequence, binary embedding applies to a large number of domains, such as biology, finance and computer vision, where the data are usually high dimensional.</p>
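The classic random-hyperplane scheme sketches this idea: take the sign of a Gaussian projection, and the normalized Hamming distance between two codes estimates the angle between the original points divided by pi. This is a generic illustration, not necessarily the authors' fast construction; all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 1_000, 4_000                  # ambient dimension, number of bits

A = rng.normal(size=(m, d))          # one random hyperplane per bit

def embed(x):
    """m-bit binary code: the sign pattern of a Gaussian projection."""
    return A @ x > 0

x = rng.normal(size=d)
y = rng.normal(size=d)

# Normalized Hamming distance between codes estimates angle(x, y) / pi:
bx, by = embed(x), embed(y)
hamming = np.mean(bx != by)          # cheap bitwise comparison
angle = np.arccos(x @ y / (np.linalg.norm(x) * np.linalg.norm(y))) / np.pi
print(hamming, angle)                # these agree up to small error
```

Each bit disagrees with probability angle/pi, so averaging m independent bits concentrates the estimate; the error shrinks like 1/sqrt(m).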
<p>While the problem of binary embedding is therefore important, it is also difficult. Existing binary embedding algorithms either lack theoretical guarantees or suffer from running times orders of magnitude too large. We make two contributions: (1) we establish a lower bound on the number of bits required by any binary embedding; (2) we propose a novel fast binary embedding algorithm with provably optimal bit complexity and near-linear running time.</p>
<p>* Paper: <a href="http://arxiv.org/abs/1502.05746">http://arxiv.org/abs/1502.05746</a></p>
<p>This research was partially funded by the National Science Foundation (NSF) and the U.S. Department of Transportation through the Data-Supported Transportation Operations and Planning (D-STOP) Tier 1 University Transportation Center.</p>
</div></div></div><div class="field field-name-field-related-faculty field-type-node-reference field-label-inline clearfix"><div class="field-label">Related Faculty: </div><div class="field-items"><div class="field-item even"><a href="/people/faculty/constantine-caramanis">Constantine Caramanis</a></div></div></div><div class="field field-name-field-related-students field-type-node-reference field-label-inline clearfix"><div class="field-label">Related Researchers: </div><div class="field-items"><div class="field-item even"><a href="/people/students/xinyang-yi">Xinyang Yi</a></div></div></div><div class="field field-name-field-tags field-type-taxonomy-term-reference field-label-inline clearfix"><div class="field-label">Keywords: </div><div class="field-items"><div class="field-item even"><a href="/tags/sketching">sketching</a>, <a href="/tags/big-data">Big Data</a>, <a href="/tags/high-dimension-datasets">High Dimension Datasets</a></div></div></div>
Bayesian Sparse Principal Component Analysis
http://wncg.org/research/briefs/bayesian-sparse-principal-component-analysis
<div class="field field-name-field-publish-date field-type-datetime field-label-hidden"><div class="field-items"><div class="field-item even"><span class="date-display-single">Tuesday, January 27, 2015</span></div></div></div><div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even"> <p>Many real-life high-dimensional datasets can be reasonably represented as linear combinations of a few sparse vectors, so a succinct representation of such data using a few selected variables is highly desirable. A Bayesian setup is useful here because well-designed, domain-specific priors can compensate for having only a limited number of high-dimensional data points. WNCG Prof. Joydeep Ghosh, his student Rajiv Khanna, and WNCG alumnus Oluwasanmi Koyejo, currently at Stanford, are developing scalable Bayesian PCA models that extract sparse components from large datasets using a novel constrained inference framework. Results obtained so far show clear improvements over a large set of standard baselines.</p>
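Since the preprint itself is affiliates-only, here is a generic, non-Bayesian illustration of extracting one sparse principal component: the truncated power method on synthetic data. This is only a standard baseline-style sketch, not the group's Bayesian constrained-inference model; all names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, k = 50, 500, 5                     # ambient dim, samples, target sparsity

# Synthetic data whose leading component uses only the first k variables.
v_true = np.zeros(d)
v_true[:k] = 1 / np.sqrt(k)
X = 3 * rng.normal(size=(n, 1)) @ v_true[None, :] + 0.5 * rng.normal(size=(n, d))
S = X.T @ X / n                          # sample covariance

def truncated_power(S, k, iters=100):
    """Power iteration that keeps only the k largest-magnitude entries each step."""
    v = np.ones(S.shape[0]) / np.sqrt(S.shape[0])
    for _ in range(iters):
        w = S @ v
        w[np.argsort(np.abs(w))[:-k]] = 0.0   # zero out all but the k largest
        v = w / np.linalg.norm(w)
    return v

v_hat = truncated_power(S, k)
support = set(np.flatnonzero(v_hat))
print(sorted(support))                   # should recover the planted support
```

The hard-thresholding step inside the power iteration is what enforces "a few selected variables": the returned component is exactly k-sparse by construction.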
<p>This work will be presented at AISTATS 2015.</p>
<p>The preprint is available for viewing by WNCG Industrial Affiliates only.</p>
</div></div></div><div class="field field-name-field-related-faculty field-type-node-reference field-label-inline clearfix"><div class="field-label">Related Faculty: </div><div class="field-items"><div class="field-item even"><a href="/people/faculty/joydeep-ghosh">Joydeep Ghosh</a></div></div></div><div class="field field-name-field-related-students field-type-node-reference field-label-inline clearfix"><div class="field-label">Related Researchers: </div><div class="field-items"><div class="field-item even"><a href="/people/students/rajiv-khanna">Rajiv Khanna </a></div></div></div><div class="field field-name-field-tags field-type-taxonomy-term-reference field-label-inline clearfix"><div class="field-label">Keywords: </div><div class="field-items"><div class="field-item even"><a href="/tags/bayesian-analysis">Bayesian analysis</a>, <a href="/tags/joydeep-ghosh">Joydeep Ghosh</a>, <a href="/tags/high-dimension-datasets">High Dimension Datasets</a>, <a href="/tags/aistats-2015">AISTATS 2015</a></div></div></div><div class="field-collection-container clearfix"><div class="field field-name-field-affiliates-only-files field-type-field-collection field-label-above"><div class="field-label">Affiliates Only Files: </div><div class="field-items"><div class="field-item even"><div class="field-collection-view clearfix view-mode-full field-collection-view-final"><div class="entity entity-field-collection-item field-collection-item-field-affiliates-only-files clearfix">
<div class="content">
<div class="field field-name-field-title-of-file field-type-text field-label-hidden affiliates_only_file_title"><div class="field-items"><div class="field-item even">Sparse Submodular Probabilistic PCA</div></div></div> </div>
</div>
</div></div></div></div></div>