<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: The Perils of Text-Only Search</title>
	<atom:link href="http://kellblog.com/2009/09/01/the-perils-of-text-only-search/feed/" rel="self" type="application/rss+xml" />
	<link>http://kellblog.com/2009/09/01/the-perils-of-text-only-search/</link>
	<description>The official blog of Dave Kellogg</description>
	<lastBuildDate>Thu, 09 Feb 2012 19:36:00 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: Dave Kellogg</title>
		<link>http://kellblog.com/2009/09/01/the-perils-of-text-only-search/#comment-2938</link>
		<dc:creator><![CDATA[Dave Kellogg]]></dc:creator>
		<pubDate>Wed, 02 Sep 2009 23:45:57 +0000</pubDate>
		<guid isPermaLink="false">http://test.kellblog.com/2009/09/01/the-perils-of-text-only-search/#comment-2938</guid>
		<description><![CDATA[Hi Seth, thanks for reading. On your first point, I agree: the pages are in one sense new, because they have probably been updated, but in another sense they are old. For example, RSS seems to do a great job (and I&#039;m not sure how) of showing only one post -- the most recent -- in my feed, and of knowing that it&#039;s the same post, just modified. Google seems to lose track of whether a page is the same page, just updated, or a new page. And I&#039;ve embarrassed myself about 10 times at Mark Logic by sending mail to the whole company saying &quot;holy cow, breaking news, did you know that&quot; [insert something that happened 2 years ago]. As for search syntax in Google, I&#039;m not sure there&#039;s an easy answer for my trivial problem with two adjacent words spanning paragraphs. And, as I&#039;m sure you know, my real point is that this is a *trivial* example of a much broader point about XML-awareness in content. I checked here for a solution on Google and, at a quick glance, didn&#039;t see one: http://www.google.com/support/websearch/bin/answer.py?answer=136861]]></description>
		<content:encoded><![CDATA[<p>Hi Seth,</p>
<p>Thanks for reading. On your first point, I agree: the pages are in one sense new, because they have probably been updated, but in another sense they are old. For example, RSS seems to do a great job (and I&#039;m not sure how) of showing only one post &#8212; the most recent &#8212; in my feed, and of knowing that it&#039;s the same post, just modified. Google seems to lose track of whether a page is the same page, just updated, or a new page. And I&#039;ve embarrassed myself about 10 times at Mark Logic by sending mail to the whole company saying &quot;holy cow, breaking news, did you know that&quot; [insert something that happened 2 years ago].</p>
<p>As for search syntax in Google, I&#039;m not sure there&#039;s an easy answer for my trivial problem with two adjacent words spanning paragraphs.</p>
<p>And, as I&#039;m sure you know, my real point is that this is a *trivial* example of a much broader point about XML-awareness in content.</p>
<p>I checked here for a solution on Google and, at a quick glance, didn&#039;t see one: <a href="http://www.google.com/support/websearch/bin/answer.py?answer=136861" rel="nofollow">http://www.google.com/support/websearch/bin/answer.py?answer=136861</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Seth Grimes</title>
		<link>http://kellblog.com/2009/09/01/the-perils-of-text-only-search/#comment-2937</link>
		<dc:creator><![CDATA[Seth Grimes]]></dc:creator>
		<pubDate>Wed, 02 Sep 2009 15:08:40 +0000</pubDate>
		<guid isPermaLink="false">http://test.kellblog.com/2009/09/01/the-perils-of-text-only-search/#comment-2937</guid>
		<description><![CDATA[Dave, of course -- 1) I&#039;m guessing that the old pages for which you receive alerts *are* new: they&#039;re new to Google&#039;s index. I&#039;d say that returning alerts for newly indexed content is valid, although letting you specify that you want only newly published content would be nice. 2) I guess Google is doing greedy pattern matching, casting a wide net. I bet they have some level of regex support, however. Have you tried being more precise? Seth]]></description>
		<content:encoded><![CDATA[<p>Dave, of course &#8211;</p>
<p>1) I&#039;m guessing that the old pages for which you receive alerts *are* new: they&#039;re new to Google&#039;s index. I&#039;d say that returning alerts for newly indexed content is valid, although letting you specify that you want only newly published content would be nice.</p>
<p>2) I guess Google is doing greedy pattern matching, casting a wide net. I bet they have some level of regex support, however. Have you tried being more precise?</p>
<p>Seth</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dave Kellogg</title>
		<link>http://kellblog.com/2009/09/01/the-perils-of-text-only-search/#comment-2936</link>
		<dc:creator><![CDATA[Dave Kellogg]]></dc:creator>
		<pubDate>Wed, 02 Sep 2009 14:58:15 +0000</pubDate>
		<guid isPermaLink="false">http://test.kellblog.com/2009/09/01/the-perils-of-text-only-search/#comment-2936</guid>
		<description><![CDATA[Yes and no. The server can be configured to ignore “superfluous tags” for phrases. In fact, by default we ignore many common HTML ones, like the bold tag. So for a document with: &lt;b&gt;hello&lt;/b&gt;world you can search for “hello world” and MarkLogic would find the document. We call this feature phrase-through. However, the server treats these tags as word boundaries. Hence, a tag inside a word actually breaks the word in two: &lt;b&gt;super&lt;/b&gt;bad. A search for “superbad” would miss. A search for “super bad” would hit. I’d call this capability word-through, though we don&#039;t currently support it.]]></description>
		<content:encoded><![CDATA[<p>Yes and no.</p>
<p>The server can be configured to ignore “superfluous tags” for phrases. In fact, by default we ignore many common HTML ones, like the bold tag.</p>
<p>So for a document with:</p>
<p>&lt;b&gt;hello&lt;/b&gt;world</p>
<p>you can search for “hello world” and MarkLogic would find the document. We call this feature phrase-through.</p>
<p>However, the server treats these tags as word boundaries. Hence, a tag inside a word actually breaks the word in two:</p>
<p>&lt;b&gt;super&lt;/b&gt;bad</p>
<p>A search for “superbad” would miss. A search for “super bad” would hit.</p>
<p>I’d call this capability word-through, though we don&#039;t currently support it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Anonymous</title>
		<link>http://kellblog.com/2009/09/01/the-perils-of-text-only-search/#comment-2935</link>
		<dc:creator><![CDATA[Anonymous]]></dc:creator>
		<pubDate>Wed, 02 Sep 2009 12:12:28 +0000</pubDate>
		<guid isPermaLink="false">http://test.kellblog.com/2009/09/01/the-perils-of-text-only-search/#comment-2935</guid>
		<description><![CDATA[Interesting! I think you covered most cases except one: where you want the system to ignore the structural tag. Can you set Mark Logic to ignore presentation-level tags, like em, inside words? Will Mark Logic find a hit for a search on &quot;superbad&quot; if the source text breaks up the word as follows: [b]super[/b]bad (&lt;b&gt;super&lt;/b&gt;bad)? (I&#039;m using square instead of angle brackets.) Can you set Mark Logic to ignore such &quot;superfluous&quot; tags in full-text search?]]></description>
		<content:encoded><![CDATA[<p>Interesting! I think you covered most cases except one: where you want the system to ignore the structural tag. Can you set Mark Logic to ignore presentation-level tags, like em, inside words? Will Mark Logic find a hit for a search on &quot;superbad&quot; if the source text breaks up the word as follows: [b]super[/b]bad (<b>super</b>bad)? (I&#039;m using square instead of angle brackets.)</p>
<p>Can you set Mark Logic to ignore such &quot;superfluous&quot; tags in full-text search?</p>
]]></content:encoded>
	</item>
</channel>
</rss>

