Friday, October 1, 2010

Remove old data from the index for Sitecore 6.2 (Update-4) and 6.3

Previously I published an article explaining how to fix a glitch that makes the Sitecore index keep old data after an item is updated. This article updates that code, since some of the issues were fixed in Sitecore 6.2 (Update-4) and 6.3.

Starting from 6.2 (Update-4) and 6.3, the _template field in the Lucene index stores an item ID in ShortID format. It is now even easier to customize the DatabaseCrawler, because the following methods were made virtual: AddItem, IndexVersion, DeleteItem and DeleteVersion. The fields _hasIncludes, _hasExcludes and _templateFilter were also made protected, which shrinks the fix even more.

Here is the code that fixes the outlined problem for these releases:

    // Using directives assumed from the types referenced below.
    using System.Collections.Generic;
    using Lucene.Net.Index;
    using Lucene.Net.Search;
    using Sitecore.Data;
    using Sitecore.Data.Items;
    using Sitecore.Diagnostics;
    using Sitecore.Search;

    namespace Lucene.Search.Crawlers
    {
        public class DatabaseCrawler : Sitecore.Search.Crawlers.DatabaseCrawler
        {
            // Restricts the query to this crawler's database, root and template filter.
            protected override void AddMatchCriteria(BooleanQuery query)
            {
                query.Add(new TermQuery(new Term(BuiltinFields.Database, RootItem.Database.Name)), BooleanClause.Occur.MUST);
                // The _path term must be lower case to match what is stored in the index.
                query.Add(new TermQuery(new Term(BuiltinFields.Path, ShortID.Encode(RootItem.ID).ToLowerInvariant())), BooleanClause.Occur.MUST);
                if (this._hasIncludes || this._hasExcludes)
                {
                    foreach (KeyValuePair<string, bool> pair in this._templateFilter)
                    {
                        // Included templates are SHOULD clauses, excluded ones MUST_NOT.
                        query.Add(new TermQuery(new Term(BuiltinFields.Template, ShortID.Encode(pair.Key).ToLowerInvariant())), pair.Value ? BooleanClause.Occur.SHOULD : BooleanClause.Occur.MUST_NOT);
                    }
                }
            }

            // Resolves the crawler's root item with security checks disabled.
            protected Item RootItem
            {
                get
                {
                    return Sitecore.Data.Managers.ItemManager.GetItem(Root, Sitecore.Globalization.Language.Invariant,
                                                                      Sitecore.Data.Version.Latest,
                                                                      Sitecore.Data.Database.GetDatabase(Database),
                                                                      Sitecore.SecurityModel.SecurityCheck.Disable);
                }
            }

            // Builds the query that finds one specific item version in the index.
            protected override Query GetVersionQuery(ID id, string language, string version)
            {
                Assert.ArgumentNotNull(id, "id");
                Assert.ArgumentNotNullOrEmpty(language, "language");
                Assert.ArgumentNotNullOrEmpty(version, "version");
                BooleanQuery query = new BooleanQuery();
                // The result of GetItemID must be lower-cased before it is used as a term.
                query.Add(new TermQuery(new Term(BuiltinFields.ID, GetItemID(id, language, version).ToLowerInvariant())), BooleanClause.Occur.MUST);
                this.AddMatchCriteria(query);
                return query;
            }
        }
    }

Basically, you need to make sure that the result of the GetItemID method is converted to lower case, and that the terms for the _path and _template fields in the AddMatchCriteria method are built from lower-case values as well.
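
To illustrate why the casing matters, here is a minimal, self-contained sketch that does not use Sitecore or Lucene at all. It assumes that the index holds its ID terms in lower case and that a term query compares terms exactly, which is what the fix above relies on. EncodeShort is a hypothetical stand-in for ShortID.Encode (assumed here to return 32 upper-case hex digits), and the HashSet is only a stand-in for the index.

```csharp
using System;
using System.Collections.Generic;

public static class TermCaseDemo
{
    // Hypothetical stand-in for ShortID.Encode: 32 hex digits,
    // no braces or dashes, assumed to be upper case.
    public static string EncodeShort(Guid id)
    {
        return id.ToString("N").ToUpperInvariant();
    }

    public static void Main()
    {
        Guid id = new Guid("110d559f-dea5-42ea-9c1c-8a5df7e70ef9");

        // Stand-in for the index: terms are stored in lower case
        // and compared exactly (ordinal, case-sensitive).
        HashSet<string> indexedTerms = new HashSet<string>(StringComparer.Ordinal);
        indexedTerms.Add(EncodeShort(id).ToLowerInvariant());

        // An upper-case term does not match the stored lower-case term.
        Console.WriteLine(indexedTerms.Contains(EncodeShort(id)));                    // False

        // Lower-casing the term first makes the lookup succeed.
        Console.WriteLine(indexedTerms.Contains(EncodeShort(id).ToLowerInvariant())); // True
    }
}
```

If the delete query is built from the upper-case value, it matches nothing, so the old document would stay in the index alongside the newly added one.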
