<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: The Oracle of Unstructured Information: A Three-Horse Race</title>
	<atom:link href="http://kellblog.com/2010/01/21/the-oracle-of-unstructured-information-a-three-horse-race/feed/" rel="self" type="application/rss+xml" />
	<link>http://kellblog.com/2010/01/21/the-oracle-of-unstructured-information-a-three-horse-race/</link>
	<description>The official blog of Dave Kellogg</description>
	<lastBuildDate>Thu, 09 Feb 2012 19:36:00 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: Dave Kellogg</title>
		<link>http://kellblog.com/2010/01/21/the-oracle-of-unstructured-information-a-three-horse-race/#comment-3036</link>
		<dc:creator><![CDATA[Dave Kellogg]]></dc:creator>
		<pubDate>Thu, 19 Aug 2010 20:56:37 +0000</pubDate>
		<guid isPermaLink="false">http://test.kellblog.com/2010/01/21/the-oracle-of-unstructured-information-a-three-horse-race/#comment-3036</guid>
		<description><![CDATA[Sameer,

Thanks for sharing.  Very interesting.]]></description>
		<content:encoded><![CDATA[<p>Sameer,</p>
<p>Thanks for sharing.  Very interesting.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sameer Nori</title>
		<link>http://kellblog.com/2010/01/21/the-oracle-of-unstructured-information-a-three-horse-race/#comment-3035</link>
		<dc:creator><![CDATA[Sameer Nori]]></dc:creator>
		<pubDate>Thu, 19 Aug 2010 19:54:42 +0000</pubDate>
		<guid isPermaLink="false">http://test.kellblog.com/2010/01/21/the-oracle-of-unstructured-information-a-three-horse-race/#comment-3035</guid>
		<description><![CDATA[Dave

Interestingly, it seems that Endeca is now organized by three markets (Public Sector, Enterprise &amp; Ebusiness), as evidenced by their new management structure.

http://www.endeca.com/about-us-leadership-team.htm]]></description>
		<content:encoded><![CDATA[<p>Dave</p>
<p>Interestingly, it seems that Endeca is now organized by three markets (Public Sector, Enterprise &amp; Ebusiness), as evidenced by their new management structure.</p>
<p><a href="http://www.endeca.com/about-us-leadership-team.htm" rel="nofollow">http://www.endeca.com/about-us-leadership-team.htm</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dave Kellogg</title>
		<link>http://kellblog.com/2010/01/21/the-oracle-of-unstructured-information-a-three-horse-race/#comment-3034</link>
		<dc:creator><![CDATA[Dave Kellogg]]></dc:creator>
		<pubDate>Wed, 03 Feb 2010 18:19:13 +0000</pubDate>
		<guid isPermaLink="false">http://test.kellblog.com/2010/01/21/the-oracle-of-unstructured-information-a-three-horse-race/#comment-3034</guid>
		<description><![CDATA[Felciano -- here is an answer provided by Jason Hunter of the Mark Logic team:

Taxonomies and ontologies are (in our view) an application-level feature.  MarkLogic supports taxonomies and ontologies by providing robust building-blocks that make it really easy to build them, in the various shapes and sizes that our customers require.  At K&amp;L, for example, there are a few ways to handle the hierarchical geographic representation.

* Have each document know its country, sub-region, and appellation.

&lt;geography&gt;
  &lt;country&gt;United States&lt;/country&gt;
  &lt;subregion&gt;California&lt;/subregion&gt;
  &lt;appellation&gt;Napa Valley&lt;/appellation&gt;
&lt;/geography&gt;

The top listing would show country values (and counts) matching the other constraints.  The sub-region listing would show subregion values and counts matching the other constraints plus the country = &quot;X&quot; constraint.  We can very quickly do that.  And the appellation listing would show appellation values and counts matching the other constraints plus the established country and subregion constraints.  This works fine for a taxonomy that isn&#039;t under a lot of flux.  This also lets you do what K&amp;L does and show both Country and Sub-Region as top-level listings.

* Use geospatial indexes.  Normally this isn&#039;t an option, but it would be here for a geographic constraint.  Assign a geo point to every winery.  Define countries, subregions, and appellations by different polygon bounding boxes.

* Use an external value-centric definition.  This is somewhat like the geospatial indexes approach in that you give each item its final value (but textual instead of geographic) and use an external definition to define the higher-up groupings (California consists of Napa Valley, Sonoma County, and ...).  Use OR queries to group the different text values.  This works well when the groupings change more often.

* Use scalar index bucketing.  For anything with a number value (e.g. prices, sizes, ratings) there&#039;s built-in support for bucketing.  Each item gets a price and you can bucket to arbitrary groupings.  You can shrink the bucket sizes on each refinement.  Bucket sizes can be declared externally or calculated programmatically based on peeking at the search results.

This is just a sampling of ways you can use our built-in query and analytic tools to build a hierarchical or otherwise nested taxonomy or ontology.  Looking at the K&amp;L site it&#039;d be pretty easy to do all the features I see.  And with MarkLogic it&#039;s possible to do it against millions of items before even having to cluster, and billions of items in a cluster.]]></description>
		<content:encoded><![CDATA[<p>Felciano &#8212; here is an answer provided by Jason Hunter of the Mark Logic team:</p>
<p>Taxonomies and ontologies are (in our view) an application-level feature.  MarkLogic supports taxonomies and ontologies by providing robust building-blocks that make it really easy to build them, in the various shapes and sizes that our customers require.  At K&amp;L, for example, there are a few ways to handle the hierarchical geographic representation.</p>
<p>* Have each document know its country, sub-region, and appellation.</p>
<p>&lt;geography&gt;  &lt;country&gt;United States&lt;/country&gt;  &lt;subregion&gt;California&lt;/subregion&gt;  &lt;appellation&gt;Napa Valley&lt;/appellation&gt;&lt;/geography&gt;</p>
<p>The top listing would show country values (and counts) matching the other constraints.  The sub-region listing would show subregion values and counts matching the other constraints plus the country = &quot;X&quot; constraint.  We can very quickly do that.  And the appellation listing would show appellation values and counts matching the other constraints plus the established country and subregion constraints.  This works fine for a taxonomy that isn&#039;t under a lot of flux.  This also lets you do what K&amp;L does and show both Country and Sub-Region as top-level listings.</p>
<p>* Use geospatial indexes.  Normally this isn&#039;t an option, but it would be here for a geographic constraint.  Assign a geo point to every winery.  Define countries, subregions, and appellations by different polygon bounding boxes.</p>
<p>* Use an external value-centric definition.  This is somewhat like the geospatial indexes approach in that you give each item its final value (but textual instead of geographic) and use an external definition to define the higher-up groupings (California consists of Napa Valley, Sonoma County, and &#8230;).  Use OR queries to group the different text values.  This works well when the groupings change more often.</p>
<p>* Use scalar index bucketing.  For anything with a number value (e.g. prices, sizes, ratings) there&#039;s built-in support for bucketing.  Each item gets a price and you can bucket to arbitrary groupings.  You can shrink the bucket sizes on each refinement.  Bucket sizes can be declared externally or calculated programmatically based on peeking at the search results.</p>
<p>This is just a sampling of ways you can use our built-in query and analytic tools to build a hierarchical or otherwise nested taxonomy or ontology.  Looking at the K&amp;L site it&#039;d be pretty easy to do all the features I see.  And with MarkLogic it&#039;s possible to do it against millions of items before even having to cluster, and billions of items in a cluster.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: felciano</title>
		<link>http://kellblog.com/2010/01/21/the-oracle-of-unstructured-information-a-three-horse-race/#comment-3033</link>
		<dc:creator><![CDATA[felciano]]></dc:creator>
		<pubDate>Thu, 28 Jan 2010 22:35:51 +0000</pubDate>
		<guid isPermaLink="false">http://test.kellblog.com/2010/01/21/the-oracle-of-unstructured-information-a-three-horse-race/#comment-3033</guid>
		<description><![CDATA[Out of curiosity Dave, does MarkLogic support the sorts of taxonomy-based searches shown on the K&amp;L Wine site? It seems like publishers -- especially Science and Technology -- would benefit from being able to leverage taxonomies and ontologies defined for markets and fields of study that have unusually large and complicated domain vocabularies.

For example, on the K&amp;L site, if you narrow the wines down by region to California, you will then be able to further filter by sub-region specific to California (Napa Valley, Sonoma County, etc). This level of structure probably isn&#039;t in the document itself, but is in the definition of the facet space.

I&#039;m asking because it seems like this type of structure -- the knowledge that Sonoma County and Napa Valley are both in California -- is what a structured DB can really help with, but I&#039;m struck that there is very little mention of taxonomies, ontologies, etc on the MarkLogic site.]]></description>
		<content:encoded><![CDATA[<p>Out of curiosity Dave, does MarkLogic support the sorts of taxonomy-based searches shown on the K&amp;L Wine site? It seems like publishers &#8212; especially Science and Technology &#8212; would benefit from being able to leverage taxonomies and ontologies defined for markets and fields of study that have unusually large and complicated domain vocabularies.</p>
<p>For example, on the K&amp;L site, if you narrow the wines down by region to California, you will then be able to further filter by sub-region specific to California (Napa Valley, Sonoma County, etc). This level of structure probably isn&#039;t in the document itself, but is in the definition of the facet space.</p>
<p>I&#039;m asking because it seems like this type of structure &#8212; the knowledge that Sonoma County and Napa Valley are both in California &#8212; is what a structured DB can really help with, but I&#039;m struck that there is very little mention of taxonomies, ontologies, etc on the MarkLogic site.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Stephen</title>
		<link>http://kellblog.com/2010/01/21/the-oracle-of-unstructured-information-a-three-horse-race/#comment-3032</link>
		<dc:creator><![CDATA[Stephen]]></dc:creator>
		<pubDate>Fri, 22 Jan 2010 21:03:16 +0000</pubDate>
		<guid isPermaLink="false">http://test.kellblog.com/2010/01/21/the-oracle-of-unstructured-information-a-three-horse-race/#comment-3032</guid>
		<description><![CDATA[Dave - I&#039;m going to quote this article to you when we&#039;re both old and grey!]]></description>
		<content:encoded><![CDATA[<p>Dave &#8211; I&#039;m going to quote this article to you when we&#039;re both old and grey!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: sunny</title>
		<link>http://kellblog.com/2010/01/21/the-oracle-of-unstructured-information-a-three-horse-race/#comment-3031</link>
		<dc:creator><![CDATA[sunny]]></dc:creator>
		<pubDate>Fri, 22 Jan 2010 12:03:30 +0000</pubDate>
		<guid isPermaLink="false">http://test.kellblog.com/2010/01/21/the-oracle-of-unstructured-information-a-three-horse-race/#comment-3031</guid>
		<description><![CDATA[Dave, it&#039;s quite a long and interesting post. Kudos to your patience. It helped me a lot to understand the ins and outs of MarkLogic&#039;s competitors&#039; focus and strategies. Will be eagerly waiting for your thoughts on &quot;Why Oracle is not in the 3-horse race and cannot become Oracle of unstructured data&quot;.]]></description>
		<content:encoded><![CDATA[<p>Dave, it&#039;s quite a long and interesting post. Kudos to your patience. It helped me a lot to understand the ins and outs of MarkLogic&#039;s competitors&#039; focus and strategies. Will be eagerly waiting for your thoughts on &quot;Why Oracle is not in the 3-horse race and cannot become Oracle of unstructured data&quot;.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

