<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Suresh Venkatasubramanian</title>
	<atom:link href="http://apollonius.cs.utah.edu/web/feed/" rel="self" type="application/rss+xml" />
	<link>http://apollonius.cs.utah.edu/web</link>
	<description></description>
	<lastBuildDate>Thu, 12 Nov 2009 08:09:44 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Matching Shapes Using the Current Distance</title>
		<link>http://apollonius.cs.utah.edu/web/2009/10/10/matching-shapes-using-the-current-distance/</link>
		<comments>http://apollonius.cs.utah.edu/web/2009/10/10/matching-shapes-using-the-current-distance/#comments</comments>
		<pubDate>Sat, 10 Oct 2009 07:20:27 +0000</pubDate>
		<dc:creator>suresh</dc:creator>
				<category><![CDATA[Papers]]></category>

		<guid isPermaLink="false">http://apollonius.cs.utah.edu/web/?p=73</guid>
		<description><![CDATA[19th Fall Workshop on Computational Geometry, 2009

Links: PDF
]]></description>
			<content:encoded><![CDATA[<p><span class=author>Sarang Joshi, Raj Varma Kommaraju, Jeff Phillips, and Suresh Venkatasubramanian</span><br />
<a href="http://www.cs.tufts.edu/research/geometry/FWCG09/">19th Fall Workshop on Computational Geometry, 2009</a><br />
<span id="more-73"></span><br />
Links: <a href='http://apollonius.cs.utah.edu/web/wp-content/uploads/2009/10/fwcg09.pdf'>PDF</a></p>
]]></content:encoded>
			<wfw:commentRss>http://apollonius.cs.utah.edu/web/2009/10/10/matching-shapes-using-the-current-distance/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Computing Hulls in Positive Definite Space</title>
		<link>http://apollonius.cs.utah.edu/web/2009/10/10/computing-hulls-in-positive-definite-space/</link>
		<comments>http://apollonius.cs.utah.edu/web/2009/10/10/computing-hulls-in-positive-definite-space/#comments</comments>
		<pubDate>Sat, 10 Oct 2009 07:16:01 +0000</pubDate>
		<dc:creator>suresh</dc:creator>
				<category><![CDATA[Papers]]></category>

		<guid isPermaLink="false">http://apollonius.cs.utah.edu/web/?p=69</guid>
		<description><![CDATA[19th Fall Workshop on Computational Geometry

Links: PDF

This material is based upon work supported by the National Science Foundation under Grant No. 0841185
]]></description>
			<content:encoded><![CDATA[<p><span class=author>P. Thomas Fletcher, John Moeller, Jeff Phillips and Suresh Venkatasubramanian</span><br />
<a href="http://www.cs.tufts.edu/research/geometry/FWCG09/">19th Fall Workshop on Computational Geometry</a><br />
<span id="more-69"></span><br />
Links: <a href="http://apollonius.cs.utah.edu/web/wp-content/uploads/2009/10/paper.pdf">PDF</a></p>
<hr />
<p>This material is based upon work supported by the National Science Foundation under Grant No. 0841185.</p>
]]></content:encoded>
			<wfw:commentRss>http://apollonius.cs.utah.edu/web/2009/10/10/computing-hulls-in-positive-definite-space/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Information Theory For Data Management (Tutorial)</title>
		<link>http://apollonius.cs.utah.edu/web/2009/09/07/information-theory-for-data-management-tutorial/</link>
		<comments>http://apollonius.cs.utah.edu/web/2009/09/07/information-theory-for-data-management-tutorial/#comments</comments>
		<pubDate>Tue, 08 Sep 2009 05:59:56 +0000</pubDate>
		<dc:creator>suresh</dc:creator>
				<category><![CDATA[Papers]]></category>

		<guid isPermaLink="false">http://apollonius.cs.utah.edu/web/?p=62</guid>
		<description><![CDATA[35th International Conference on Very Large Databases (VLDB)

We are awash in data. The explosion in computing power and computing infrastructure allows us to generate multitudes of data, in differing formats, at different scales, and in inter-related areas. Data management is fundamentally about the harnessing of this data to extract information, discovering good representations of the [...]]]></description>
			<content:encoded><![CDATA[<p><span class=author>Divesh Srivastava and Suresh Venkatasubramanian</span><br />
<a href="http://vldb2009.org">35th International Conference on Very Large Databases (VLDB)</a><br />
<span id="more-62"></span><br />
We are awash in data. The explosion in computing power and computing infrastructure allows us to generate multitudes of data, in differing formats, at different scales, and in inter-related areas. Data management is fundamentally about the harnessing of this data to extract information, discovering good representations of the information, and analyzing information sources to glean structure.  Data management generally presents us with cost-benefit tradeoffs. If we store more information, we get better answers to queries, but we pay the price in terms of increased storage. Conversely, reducing the amount of information we store improves performance at the cost of decreased accuracy for query results.  The ability to quantify information gain or loss can only improve our ability to design good representations, storage mechanisms, and analysis tools for data.</p>
<p>Information theory provides us with the tools to quantify information in this manner. It was originally designed as a theory of data communication over noisy channels. However, it has more recently been used as an abstract domain-independent technique for representing and analyzing data. For example, entropy measures the degree of disorder in data and mutual information captures the idea of noisy relationships among data. In general, viewing information theory as a tool to express and quantify notions of information content and information transfer has been very successful as a way of extracting structure from data.</p>
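<p>As a rough illustration of the two quantities mentioned above, the following sketch estimates entropy and mutual information from observed values. The function names <code>entropy</code> and <code>mutual_information</code> are hypothetical, and the plug-in (empirical frequency) estimator is an assumption made for brevity, not necessarily what the tutorial uses.</p>

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))
```

<p>Under these assumptions, a column that takes two values equally often has one bit of entropy, and two columns that vary independently have mutual information close to zero.</p>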
<p>In this tutorial, we will explore the use of information theory as part of a data representation and analysis toolkit. We will do this with illustrative examples that span a wide range of topics of interest to data management researchers and practitioners. We will also examine the computational challenges associated with information-theoretic primitives, indicating how they might be computed efficiently.</p>
<p>Links: <a href="http://apollonius.cs.utah.edu/suresh/papers/inftut/informationtheory.ppt">PPT</a> (Warning: 9 MB file!)</p>
]]></content:encoded>
			<wfw:commentRss>http://apollonius.cs.utah.edu/web/2009/09/07/information-theory-for-data-management-tutorial/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Streamed Learning: One-Pass SVMs</title>
		<link>http://apollonius.cs.utah.edu/web/2009/04/08/streamed-learning-one-pass-svms/</link>
		<comments>http://apollonius.cs.utah.edu/web/2009/04/08/streamed-learning-one-pass-svms/#comments</comments>
		<pubDate>Wed, 08 Apr 2009 07:29:21 +0000</pubDate>
		<dc:creator>suresh</dc:creator>
				<category><![CDATA[Papers]]></category>

		<guid isPermaLink="false">http://apollonius.cs.utah.edu/web/?p=53</guid>
		<description><![CDATA[Twenty-First International Joint Conference on Artificial Intelligence (IJCAI-09)

We present a streaming model for large scale classification (in the context of $\ell_2$-SVM) by leveraging connections between learning and computational geometry. The streaming model imposes the constraint that only a single pass over the data is allowed. The $\ell_2$-SVM is known to have an equivalent formulation in [...]]]></description>
			<content:encoded><![CDATA[<p><span class=author>Piyush Rai, Hal Daume III, and Suresh Venkatasubramanian</span><br />
<a href="http://ijcai-09.org/index.html">Twenty-First International Joint Conference on Artificial Intelligence (IJCAI-09)</a><br />
<span id="more-53"></span></p>
<p>We present a streaming model for large scale classification (in the context of $\ell_2$-SVM) by leveraging connections between learning and computational geometry. The streaming model imposes the constraint that only a single pass over the data is allowed. The $\ell_2$-SVM is known to have an equivalent formulation in terms of minimum enclosing balls (MEB), and an efficient algorithm based on the idea of core sets exists (CVM) (Tsang et al., 2005), which learns a (1+$\epsilon$)-approximate MEB for a set of points and yields an approximate solution to the corresponding SVM instance. However, CVM works in batch mode, requiring multiple passes over the data. We present a single-pass SVM based on the minimum enclosing ball of streaming data. We show that the MEB updates for the streaming case can be easily adapted to learn the SVM weight vector using simple Perceptron-like update equations. Our algorithm performs polylogarithmic computation at each example, requires very small and constant storage, and finds simpler solutions (measured in terms of the number of support vectors). Experimental results show that, even in such restrictive settings, we can learn efficiently in just one pass and get accuracies comparable to other state-of-the-art SVM solvers. We also discuss some open issues and possible extensions.</p>
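<p>To give a feel for one-pass enclosing-ball updates, here is a minimal sketch of a simple streaming rule: when a new point falls outside the current ball, the center moves toward it and the radius grows just enough to cover both the old ball and the point. The class name <code>StreamingBall</code> is hypothetical, and this is a generic geometric update, not the paper's SVM algorithm; the resulting ball encloses all points seen but is not guaranteed to be the minimum one.</p>

```python
import math


class StreamingBall:
    """Maintains a ball enclosing every point seen so far, in one pass."""

    def __init__(self):
        self.center = None
        self.radius = 0.0

    def update(self, point):
        if self.center is None:
            # The first point becomes a ball of radius zero.
            self.center = [float(x) for x in point]
            return
        dist = math.dist(self.center, point)
        if dist <= self.radius:
            return  # Already covered; nothing is stored for this point.
        # Move the center toward the point by half the uncovered gap and
        # grow the radius by the same amount, so the old ball stays inside.
        shift = (dist - self.radius) / 2.0
        self.center = [
            c + shift * (p - c) / dist for c, p in zip(self.center, point)
        ]
        self.radius += shift
```

<p>Only the center and radius are kept between updates, which is the kind of constant storage the streaming setting calls for.</p>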
]]></content:encoded>
			<wfw:commentRss>http://apollonius.cs.utah.edu/web/2009/04/08/streamed-learning-one-pass-svms/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Approximate Shape Matching And Symmetry Detection for 3D Shapes With Guaranteed Error Bounds</title>
		<link>http://apollonius.cs.utah.edu/web/2009/02/23/approximate-shape-matching-and-symmetry-detection-for-3d-shapes-with-guaranteed-error-bounds/</link>
		<comments>http://apollonius.cs.utah.edu/web/2009/02/23/approximate-shape-matching-and-symmetry-detection-for-3d-shapes-with-guaranteed-error-bounds/#comments</comments>
		<pubDate>Tue, 24 Feb 2009 05:57:48 +0000</pubDate>
		<dc:creator>suresh</dc:creator>
				<category><![CDATA[Papers]]></category>

		<guid isPermaLink="false">http://apollonius.cs.utah.edu/web/?p=42</guid>
		<description><![CDATA[SMI 2009: IEEE International Conference on Shape Modeling and Applications 

]]></description>
			<content:encoded><![CDATA[<p><span class=author>Shankar Krishnan and Suresh Venkatasubramanian</span><br />
<em><a href="http://cgcad.thss.tsinghua.edu.cn/SMI2009/home.htm">SMI 2009: IEEE International Conference on Shape Modeling and Applications</a> </em><br />
<span id="more-42"></span></p>
]]></content:encoded>
			<wfw:commentRss>http://apollonius.cs.utah.edu/web/2009/02/23/approximate-shape-matching-and-symmetry-detection-for-3d-shapes-with-guaranteed-error-bounds/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Streaming for large scale NLP: Language Modelling</title>
		<link>http://apollonius.cs.utah.edu/web/2009/01/19/streaming-for-large-scale-nlp-language-modelling/</link>
		<comments>http://apollonius.cs.utah.edu/web/2009/01/19/streaming-for-large-scale-nlp-language-modelling/#comments</comments>
		<pubDate>Tue, 20 Jan 2009 01:54:05 +0000</pubDate>
		<dc:creator>suresh</dc:creator>
				<category><![CDATA[Papers]]></category>

		<guid isPermaLink="false">http://apollonius.cs.utah.edu/web/?p=33</guid>
		<description><![CDATA[North American Chapter of the Association for Computational Linguistics &#8211; Human Language Technologies (NAACL HLT) 2009 (to appear)

In this paper, we explore a streaming algorithm paradigm to handle large amounts of data for NLP problems. We present an efficient low-memory method for constructing high-order approximate n-gram frequency counts. The method is based on a deterministic [...]]]></description>
			<content:encoded><![CDATA[<p><span class=author>Amit Goyal, Hal Daume and Suresh Venkatasubramanian</span><br />
<em><a href="http://www.naaclhlt2009.org/">North American Chapter of the Association for Computational Linguistics &#8211; Human Language Technologies (NAACL HLT) 2009</a></em> (to appear)<br />
<span id="more-33"></span><br />
In this paper, we explore a streaming algorithm paradigm to handle large amounts of data for NLP problems. We present an efficient low-memory method for constructing high-order approximate n-gram frequency counts. The method is based on a deterministic streaming algorithm which efficiently computes approximate frequency counts over a stream of data while employing a small memory footprint. We show that this method easily scales to billion-word monolingual corpora using a conventional (4 GB RAM) desktop machine. Statistical machine translation experimental results corroborate that the resulting high-n approximate small language model is as effective as models obtained from other count pruning methods.</p>
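<p>As an illustration of deterministic approximate counting over a stream, the sketch below applies the Misra-Gries summary to n-grams. Misra-Gries is one classic algorithm of this kind and is chosen here for brevity; it may differ from the algorithm used in the paper. The helper names <code>ngrams</code> and <code>misra_gries</code> and the parameter <code>k</code> are assumptions for this example.</p>

```python
def ngrams(tokens, n):
    """Yield each run of n consecutive tokens as a tuple."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def misra_gries(stream, k):
    """Approximate counts using at most k - 1 counters.

    Each reported count undercounts the true frequency by at most N / k,
    where N is the length of the stream.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement everything and drop zeros.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

<p>For example, <code>misra_gries(ngrams(tokens, 3), k=1000)</code> would keep fewer than 1000 trigram counters regardless of corpus size, at the price of bounded undercounting.</p>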
]]></content:encoded>
			<wfw:commentRss>http://apollonius.cs.utah.edu/web/2009/01/19/streaming-for-large-scale-nlp-language-modelling/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Type-Based Categorization of Relational Attributes</title>
		<link>http://apollonius.cs.utah.edu/web/2008/11/15/type-based-categorization-of-relational-attributes/</link>
		<comments>http://apollonius.cs.utah.edu/web/2008/11/15/type-based-categorization-of-relational-attributes/#comments</comments>
		<pubDate>Sat, 15 Nov 2008 22:17:13 +0000</pubDate>
		<dc:creator>suresh</dc:creator>
				<category><![CDATA[Papers]]></category>

		<guid isPermaLink="false">http://apollonius.cs.utah.edu/web/?p=28</guid>
		<description><![CDATA[12th International Conference on Extending Database Technology (EDBT 09) (to appear)

In this work we concentrate on categorization of relational attributes based on their data type. Assuming that attribute type/characteristics are unknown or unidentifiable, we analyze and compare a variety of type-based signatures for classifying the attributes based on the semantic type of the data contained [...]]]></description>
			<content:encoded><![CDATA[<p><span class=author>Babak Ahmadi, Marios Hadjieleftheriou, Thomas Seidl, Divesh Srivastava and Suresh Venkatasubramanian</span><br />
<em><a href="http://www.math.spbu.ru/edbticdt/">12th International Conference on Extending Database Technology (EDBT 09)</a></em> (to appear)<br />
<span id="more-28"></span><br />
In this work we concentrate on categorization of relational attributes based on their data type. Assuming that attribute type/characteristics are unknown or unidentifiable, we analyze and compare a variety of type-based signatures for classifying the attributes based on the semantic type of the data contained therein (e.g., router identifiers, social security numbers, email addresses). The signatures can subsequently be used for other applications as well, like clustering and indexing based on data types. This application is useful in cases where very large data collections that are generated in a distributed, ungoverned fashion end up having unknown, incomplete, inconsistent or very complex schemata and schema level meta-data. We concentrate on heuristically generating type-based attribute signatures based on both local and global computation approaches. We show experimentally that by decomposing data into q-grams and then considering signatures based on q-gram distributions, we achieve very good classification accuracy under the assumption that a large sample of the data is available for building the signatures. Then, we turn our attention to cases where a very small sample of the data is available, and hence accurately capturing the q-gram distribution of a given data type is almost impossible. We propose techniques based on dimensionality reduction and soft clustering that exploit correlations between attributes to improve classification accuracy.</p>
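<p>A minimal sketch of the q-gram idea: each attribute's values are broken into overlapping substrings of length q, and the normalized q-gram frequencies serve as the attribute's signature. The function names and the use of cosine similarity to compare signatures are assumptions made for illustration; the paper compares a variety of signatures and does not necessarily use this one.</p>

```python
import math
from collections import Counter


def qgram_signature(values, q=2):
    """Normalized frequency of every q-gram occurring in the given strings."""
    counts = Counter()
    for value in values:
        for i in range(len(value) - q + 1):
            counts[value[i:i + q]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: c / total for gram, c in counts.items()}


def cosine_similarity(sig_a, sig_b):
    """Cosine similarity between two sparse q-gram signatures."""
    dot = sum(p * sig_b.get(gram, 0.0) for gram, p in sig_a.items())
    norm_a = math.sqrt(sum(p * p for p in sig_a.values()))
    norm_b = math.sqrt(sum(p * p for p in sig_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

<p>Two columns holding the same kind of data (say, email addresses) would be expected to share many q-grams and score higher than a pair of unrelated columns.</p>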
<p>Links: <a href='http://apollonius.cs.utah.edu/web/wp-content/uploads/2008/11/paper.pdf'>PDF</a></p>
]]></content:encoded>
			<wfw:commentRss>http://apollonius.cs.utah.edu/web/2008/11/15/type-based-categorization-of-relational-attributes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Metric Functional Dependencies</title>
		<link>http://apollonius.cs.utah.edu/web/2008/06/28/metric-functional-dependencies/</link>
		<comments>http://apollonius.cs.utah.edu/web/2008/06/28/metric-functional-dependencies/#comments</comments>
		<pubDate>Sat, 28 Jun 2008 08:08:10 +0000</pubDate>
		<dc:creator>suresh</dc:creator>
				<category><![CDATA[Papers]]></category>

		<guid isPermaLink="false">http://apollonius.cs.utah.edu/web/?p=25</guid>
		<description><![CDATA[25th International Conference on Data Engineering, 2009 (to appear)

When merging data from various sources, it is often the case that small variations in data format and interpretation cause traditional functional dependencies (FDs) to be violated, without there being an intrinsic violation of semantics. Examples include differing address formats, or different reported latitude/longitudes for a given [...]]]></description>
			<content:encoded><![CDATA[<p><span class=author>Nick Koudas, Avishek Saha, Divesh Srivastava and Suresh Venkatasubramanian</span><br />
<em><a href="http://i.cs.hku.hk/icde2009/">25th International Conference on Data Engineering, 2009</a> (to appear)</em><br />
<span id="more-25"></span><br />
When merging data from various sources, it is often the case that small variations in data format and interpretation cause traditional functional dependencies (FDs) to be violated, without there being an intrinsic violation of semantics. Examples include differing address formats, or different reported latitude/longitudes for a given address. In such cases, we would like to specify a dependency structure on the merged data that is robust to such small differences.</p>
<p>In this paper, we define metric functional dependencies, which strictly generalize traditional FDs by allowing small differences (controlled by a metric) in values of the consequent attribute of an FD. We show that this notion satisfies many of the standard properties of functional dependencies, and we present efficient algorithms for the verification problem: determining whether a given metric FD (MFD) holds for a given relation.  We show that MFDs can be combined with approximate FDs, allowing tuples with identical antecedents to map to different consequents, some of which correspond to small (acceptable) variations, with others indicating more serious data quality issues. We experimentally demonstrate the validity and efficiency of our approach on various data sets that possess different underlying metrics, and lie in multidimensional spaces.</p>
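<p>The verification problem can be sketched as follows, assuming the natural formalization that an MFD holds when, within every group of tuples sharing the same antecedent value, all pairs of consequent values lie within distance <code>delta</code> of each other. The function name <code>holds_mfd</code> and its parameters are hypothetical, and this brute-force pairwise check only illustrates the definition; the paper presents more efficient algorithms.</p>

```python
from collections import defaultdict
from itertools import combinations


def holds_mfd(rows, antecedent, consequent, metric, delta):
    """Check whether antecedent -> consequent holds up to distance delta.

    `rows` is a list of dicts; `metric` maps two consequent values to a
    non-negative distance. With delta = 0 this is a traditional FD check.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[antecedent]].append(row[consequent])
    # Quadratic in the group size: compare every pair of consequents.
    return all(
        metric(a, b) <= delta
        for values in groups.values()
        for a, b in combinations(values, 2)
    )
```

<p>For instance, two records for the same address whose reported latitudes differ slightly would violate the traditional FD but satisfy the metric one for a suitably chosen <code>delta</code>.</p>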
]]></content:encoded>
			<wfw:commentRss>http://apollonius.cs.utah.edu/web/2008/06/28/metric-functional-dependencies/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>On measures of privacy</title>
		<link>http://apollonius.cs.utah.edu/web/2008/03/30/on-measures-of-privacy/</link>
		<comments>http://apollonius.cs.utah.edu/web/2008/03/30/on-measures-of-privacy/#comments</comments>
		<pubDate>Sun, 30 Mar 2008 07:07:25 +0000</pubDate>
		<dc:creator>suresh</dc:creator>
				<category><![CDATA[Papers]]></category>

		<guid isPermaLink="false">http://apollonius.cs.utah.edu/web/?p=18</guid>
		<description><![CDATA[In Privacy-Preserving Data Mining: Models and Algorithms (Springer). Ed. Charu Aggarwal, Philip S. Yu

An excerpt from the introduction:
In this chapter, we survey the various approaches that have been proposed to measure privacy (and the loss of privacy). Since most privacy concerns (especially those related to health-care information) are raised in the context of legal concerns, [...]]]></description>
			<content:encoded><![CDATA[<p><span class=author>Suresh Venkatasubramanian</span><br />
<em>In <a href="http://www.springer.com/computer/security+and+cryptology/book/978-0-387-70991-8">Privacy-Preserving Data Mining: Models and Algorithms</a> (Springer). Ed. Charu Aggarwal, Philip S. Yu</em><br />
<span id="more-18"></span><br />
An excerpt from the introduction:</p>
<blockquote><p>In this chapter, we survey the various approaches that have been proposed to measure privacy (and the loss of privacy). Since most privacy concerns (especially those related to health-care information) are raised in the context of legal concerns, it is instructive to view privacy from a legal perspective, rather than from purely technical considerations. </p>
<p>It is beyond the scope of this survey (&#8230;and the expertise of the author!) to review the legal interpretations of privacy. However, one essay on privacy that appears directly relevant (and has inspired at least one paper surveyed here) is the view of privacy in terms of access that others have to us and our information, presented by Ruth Gavison. In her view, a general definition of privacy must be one that is measurable, of value, and actionable. The first property needs no explanation; the second means that the entity being considered private must be valuable, and the third property argues that from a legal perspective, only those losses of privacy are interesting that can be prosecuted. </p>
<p>This survey, and much of the research on privacy, concerns itself with the measuring of privacy. </p></blockquote>
<p>Links: <a href="http://www.cs.utah.edu/~suresh/papers/privacy/chap.pdf">PDF</a></p>
]]></content:encoded>
			<wfw:commentRss>http://apollonius.cs.utah.edu/web/2008/03/30/on-measures-of-privacy/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Clustering on streams</title>
		<link>http://apollonius.cs.utah.edu/web/2008/03/29/clustering-on-streams/</link>
		<comments>http://apollonius.cs.utah.edu/web/2008/03/29/clustering-on-streams/#comments</comments>
		<pubDate>Sat, 29 Mar 2008 08:20:38 +0000</pubDate>
		<dc:creator>suresh</dc:creator>
				<category><![CDATA[Papers]]></category>

		<guid isPermaLink="false">http://apollonius.cs.utah.edu/web/2008/03/29/clustering-on-streams/</guid>
		<description><![CDATA[Suresh Venkatasubramanian
Springer Encyclopedia on Databases, to appear.

An instance of a clustering problem (see clustering) consists of a collection of points in a distance space, a measure of the cost of a clustering, and a measure of the size of a clustering. The goal is to compute a partitioning of the points into clusters such that [...]]]></description>
			<content:encoded><![CDATA[<p><span class="author">Suresh Venkatasubramanian</span><br />
<em>Springer Encyclopedia on Databases</em>, to appear.<br />
<span id="more-16"></span><br />
An instance of a clustering problem (see clustering) consists of a collection of points in a distance space, a measure of the cost of a clustering, and a measure of the size of a clustering. The goal is to compute a partitioning of the points into clusters such that the cost of this clustering is minimized, while the size is kept under some predefined threshold. Less commonly, a threshold for the cost is specified, while the goal is to minimize the size of the clustering.</p>
<p>A data stream (see data streams) is a sequence of data presented to an algorithm one item at a time. A stream algorithm, upon reading an item, must perform some action based on this item and the contents of its working space, which is sublinear in the size of the data sequence. After this action is performed (which might include copying the item to its working space), the item is discarded. Clustering on streams refers to the problem of clustering a data set presented as a data stream.</p>
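<p>To make the streaming model concrete, here is a minimal sketch of a one-pass greedy heuristic for the variant in which a cost threshold is given: each arriving point is either covered by an existing center or becomes a new center, and is then discarded. The function name <code>stream_cluster</code> and the <code>radius</code> parameter are assumptions for illustration; this heuristic is not claimed to minimize the number of clusters, and the number of stored centers depends on the data.</p>

```python
import math


def stream_cluster(stream, radius):
    """One pass over `stream`; every point ends up within `radius` of a center."""
    centers = []
    for point in stream:
        # The working space holds only the centers chosen so far.
        if all(math.dist(point, c) > radius for c in centers):
            centers.append(tuple(point))
        # Otherwise the point is covered by some center and is discarded.
    return centers
```

<p>Because <code>stream</code> can be any iterable, the points never need to be held in memory at once; only the list of centers persists between items.</p>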
]]></content:encoded>
			<wfw:commentRss>http://apollonius.cs.utah.edu/web/2008/03/29/clustering-on-streams/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
