Wednesday, November 25, 2009

Working with Lucene Search Index in Sitecore 6. Part III - Code examples

If you have read my previous posts about the Lucene search index, you already know how to configure it and how it works in a Sitecore application.
In this part we will look at the API to see what can be achieved with the search index.

To search an existing index, we first need to get an index instance. I'm not going to show the code you would write with the old search index. If you're interested in it, check out the article Lucene Search Engine.
The additional API layer for the new search index resides under the Sitecore.Search namespace. To get a search index object, use the members of the SearchManager class.
To get an index by name, use this line of code:
Index indx = SearchManager.GetIndex(index_name);
If you want to use the default system index, you can simply use the SystemIndex property of the SearchManager class. To use the Sitecore API to look up information in the index, you need to create a search context.
The easiest way is to call the CreateSearchContext method of the index object we got previously. It's also possible to create a search context with one of the constructors of the IndexSearchContext class, passing the search index instance as a parameter.
To search information, we need to create a query and run it in the search context. Sitecore API has a few classes that you can use to build search queries. Let's take a look at each of them:
FullTextQuery - this type of query searches the index by the "_content" field. All information from text fields (such as "Single-Line Text", "Rich Text", "Multi-Line Text", "text", "rich text", "html", "memo") is stored there. Data in this field is indexed and tokenized, which means that search operations on it are very efficient.
FieldQuery - this type of query allows you to search any field that was added to the index. By default, the database crawler adds all item fields to the index.
CombinedQuery - this type of query lets you create complex queries with additional conditions, for instance to find items that have a specific word in the title and belong to some category. When you add search queries to this type of query, you need to supply a QueryOccurance parameter. It's an enum with the following members:
- Must - the logical AND operator in a Lucene boolean query.
- MustNot - the logical NOT operator in a Lucene boolean query.
- Should - the logical OR operator in a Lucene boolean query.
You can read more about these operators in the Query Parser Syntax article.

All of these query types are derived from the QueryBase class.
There is one thing left before we jump from theory to code samples. To run the queries you define, use one of the Search methods of the IndexSearchContext object.
Now let's create a couple of samples to see the real code behind the theory.

Sample 1: Searching text fields.
// The following samples omit the lines that get the index instance and create the search context.
// Get an index object
Index indx = SearchManager.GetIndex("my_index");
// Create a search context
using (IndexSearchContext searchContext = indx.CreateSearchContext())
{
    // In the following examples I will use the QueryBase class to hold search queries.
    FullTextQuery ftQuery = new FullTextQuery("welcome");
    SearchHits results = searchContext.Search(ftQuery);
}

Sample 2: Searching item fields
Let's say we want to find items classified by some category. Searching by GUIDs is tricky (see the "Searching GUIDs" section), so let's say our category is just a string name.
.....
// The FieldQuery constructor accepts two parameters: the field name and the value we're looking for.
QueryBase query = new FieldQuery("category", "slr");
SearchHits results = searchContext.Search(query);
.....

Sample 3: Searching by multiple conditions
It turns out that the category parameter is not enough to get the required results. Your client is screaming that there are too many items and business users cannot find the ones they are looking for (is there anything they can find?).
Obviously you have some additional fields that can help to narrow the results. Let's say there is a rating field with values from 1 to 5.
That's where CombinedQuery comes into play.
.....
// CombinedQuery has an Add method that is used to add search queries to it. That's why we cannot use a base class variable here.
CombinedQuery query = new CombinedQuery();
QueryBase catQuery = new FieldQuery("category", "slr");
QueryBase ratQuery = new FieldQuery("rating", "5");
query.Add(catQuery, QueryOccurance.Must);
query.Add(ratQuery, QueryOccurance.Must);
var hits = searchContext.Search(query);
.....

All results are returned as a SearchHits object. Use one of the following methods of the SearchHits object to get the results as Sitecore items:
- FetchResults(int, int) - returns search results as a SearchResultCollection. The first parameter is the position to start fetching results from. The second is the number of items to fetch. Calling it as shown below returns all results at once:
var results = hits.FetchResults(0, hits.Length);

- Slice(int) - returns all results as an IEnumerable collection.

- Slice(int, int) - this method has a similar signature to FetchResults but returns the results as an IEnumerable collection.

Here are a couple of examples of how you can transform a SearchHits object into Sitecore items.
Sample 4: using FetchResults
.....
SearchResultCollection results = hits.FetchResults(0, hits.Length);
IEnumerable<Item> searchItems = from hit in results
                                select hit.GetObject<Item>();
.....

Sample 5: using Slice
.....
IList<Item> searchItems = new List<Item>();
foreach (var hit in hits.Slice(0))
{
    ItemUri itemUri = new ItemUri(hit.Url);
    if (itemUri != null)
    {
        Item item = ItemManager.GetItem(itemUri.ItemID, itemUri.Language, itemUri.Version, Factory.GetDatabase(itemUri.DatabaseName));
        if (item != null)
        {
            searchItems.Add(item);
        }
    }
}
.....

It's worth mentioning that some overloads of the Search method of the IndexSearchContext class accept a Lucene.Net.Search.Query as the search query parameter. This becomes very useful when you need to create a complex query that cannot be built with the Sitecore query types.

Searching GUIDs.

The new search index has lots of useful built-in fields that help to build precise queries.
Besides the standard fields, it has the following fields that contain GUIDs in ShortID format:
- _links - contains all references to the current item

- _path - contains ShortIDs for every parent item in the path to the current item

- _template - contains the GUID of the item's template.
NOTE: this field is supposed to hold a ShortID value instead of a GUID. It should not be used in combined queries prior to the Sitecore 6.2 release.
If you decide to add custom fields to your search index and they hold GUID values, you need to store them as ShortIDs in lower case. Otherwise the search will not find any results. The reason is that Lucene recognizes GUIDs and applies special parsing to them. That works fine if the search query looks into only one field. In a combined/complex query, it fails to find anything even if the query is correct.
So remember: if you need to filter search results by template, you will have to customize DatabaseCrawler to add another field (e.g. _shorttemplateid) that stores the item's template ID in ShortID format.
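To make the expected value format concrete, here is a minimal standalone sketch that uses only the .NET Guid type. It assumes that the lowercased ShortID of an item is simply its GUID written as 32 hexadecimal digits without braces or dashes. ShortIdFormat and ToIndexValue are hypothetical names used for illustration; they are not part of the Sitecore API, and in Sitecore code you would keep calling ShortID.Encode(...).ToLowerInvariant() as in the samples.

```csharp
using System;

public static class ShortIdFormat
{
    // Hypothetical helper (not a Sitecore API): returns the GUID as 32 lowercase
    // hex digits with no braces or dashes, the assumed lowercased ShortID form.
    public static string ToIndexValue(Guid id)
    {
        return id.ToString("N").ToLowerInvariant();
    }

    public static void Main()
    {
        Guid id = new Guid("{B0D17C71-8FAA-4D5D-B87E-C7BBD93AEF69}");
        Console.WriteLine(ToIndexValue(id)); // b0d17c718faa4d5db87ec7bbd93aef69
    }
}
```

A value in this form contains no braces or dashes, which is what the index fields described above are expected to store.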

Sample 1: Find all item references

.....
QueryBase query = new FieldQuery("_links", ShortID.Encode(item.ID).ToLowerInvariant());
SearchHits results = searchContext.Search(query);
.....

Sample 6: Find all items based on specified template

.....
// Prior to the Sitecore 6.2 release, you will need to add and use the _shorttemplateid field instead
QueryBase query = new FieldQuery("_template", ShortID.Encode(item.ID).ToLowerInvariant());
SearchHits results = searchContext.Search(query);
.....

Sample 7: Find items that are descendants of a specified one
.....
QueryBase query = new FieldQuery("_path", ShortID.Encode(item.ID).ToLowerInvariant());
SearchHits results = searchContext.Search(query);
.....

Sample 8: Find items under a parent that are based on a specific template
.....
CombinedQuery query = new CombinedQuery();
query.Add(new FieldQuery("_shorttemplateid", ShortID.Encode(templateId).ToLowerInvariant()), QueryOccurance.Must);
query.Add(new FieldQuery("_path", ShortID.Encode(parent.ID).ToLowerInvariant()), QueryOccurance.Must);
.....

That's all I wanted to say about the Lucene search index in Sitecore 6. I hope it helps Lucene beginners to better understand the concept and get up to speed with what the search index can do.
Enjoy!

32 comments:

Seenivasan said...

How do you do a range query?
Sitecore.Search does not have anything for range queries.

Thanks!

Ivan said...

To run a range query, use the IndexSearchContext.Search(Lucene.Net.Search.Query) method.
Here is an example:

BooleanQuery query = new BooleanQuery();
query.Add(new TermQuery(new Term("_shorttemplateid", templateId)), BooleanClause.Occur.MUST);
query.Add(new RangeQuery(new Term(dateFieldName, date.ToString(DATE_TIME_FORMAT)), null, true), BooleanClause.Occur.MUST);

var results = indexSearchContext.Search(query);

NOTE: in order to run search queries on date fields, you need to convert date field value to "yyyyMMddHHmmss" format during indexing. You can do it by overriding AddAllFields method of DatabaseCrawler class.
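
As a rough illustration of why this format is used, the sketch below formats two dates with the "yyyyMMddHHmmss" pattern and compares the resulting strings. It assumes that DATE_TIME_FORMAT in the snippet above holds that same pattern (its definition is not shown). IndexDateFormat and ToIndexValue are hypothetical names, not Sitecore or Lucene APIs.

```csharp
using System;
using System.Globalization;

public static class IndexDateFormat
{
    // Assumed to match the DATE_TIME_FORMAT constant used in the range query above.
    public const string Format = "yyyyMMddHHmmss";

    // Hypothetical helper: formats a date so that plain string order
    // is the same as chronological order.
    public static string ToIndexValue(DateTime value)
    {
        return value.ToString(Format, CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        string earlier = ToIndexValue(new DateTime(2009, 11, 25, 9, 30, 0));
        string later = ToIndexValue(new DateTime(2009, 12, 1, 0, 0, 0));
        Console.WriteLine(earlier);                                   // 20091125093000
        Console.WriteLine(string.CompareOrdinal(earlier, later) < 0); // True
    }
}
```

Because every value has the same fixed width and runs from the largest unit (year) to the smallest (second), a term range over these strings behaves like a date range.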

vishu said...

What if I want search features like the ones Funnelback or Ankiro offer, in Sitecore, without using them?
Features like spell correction, suggested search keywords, thesaurus, etc.
If anyone has an answer or has implemented that, he is a real champ in Sitecore. In that case, catch me at vgupta@ebizneeds.com

Anonymous said...

Hi Ivan,
Can you please post sample code for searching by language. So I want to do a full text query but only return items for the current culture.
Thanks.

Ivan said...

You need to build a CombinedQuery to add language filtering criteria.
Here is an example:

CombinedQuery query = new CombinedQuery();
// Add full text search query
query.Add(new FullTextQuery(search_text_here), QueryOccurance.Should);
// Add language filtering query
query.Add(new FieldQuery(BuiltinFields.Language, Sitecore.Context.Language.Name), QueryOccurance.Must);

This query should return only results in context language.

Ivan said...

vishu,

To implement spell correction, search suggestions, etc., you can try Solr: http://lucene.apache.org/solr/

Asif Maniar said...

I was able to get this to work:
QueryBase langQuery = new FieldQuery("_language", Sitecore.Context.Site.Language);

Let me know if there is a better approach.

Asif Maniar said...

Thanks Ivan!

Asif Maniar said...

When I use Language.Name and the current context is a specific culture like en-US, the search returns nothing. It seems that the language is indexed by the two-letter language code and not by the culture. Is that right?

I was able to get this to return results for each culture:
QueryBase langQuery = new FieldQuery(BuiltinFields.Language, Sitecore.Context.Language.CultureInfo.TwoLetterISOLanguageName);

The only problem is that this will return all items in the current language not just the current culture. Any work around for this?

Ivan said...

Language.Name is basically the name of the language item in the content tree.
By default, English is named "en", not "en-US".
If you want to search by ISO code, you need to add a custom field that stores the ISO code of the language.
Then you can search by culture.

Asif Maniar said...

Thanks for that.
One more question regarding search items linked to the current one.

It seems that the _links field only provides items that have a link to the current item using a droplink for example.

We have items "tagged" with other items using a multilist. When I search content tagged with a particular item I want to get all items that have the current item in the "Tag" field which is a multilist.

Tried the following 2 without any results but when I do the same in the Index Viewer using a Term Query it returns the correct result. Any thoughts?

QueryBase tagQuery = new FieldQuery("tag", "{4C174B16-7193-49A7-88C6-7F696003F8BF}");
QueryBase tagQuery = new FieldQuery("tag", ShortID.Encode(new Guid("4C174B16-7193-49A7-88C6-7F696003F8BF")).ToLowerInvariant());

Ivan said...

You should use the _links field (BuiltinFields.Links) for that. It holds all the references of an item. References are stored in ShortID format.
Values of standard Sitecore fields that contain references to other items get indexed into the _links field.
If it cannot find the value for some reason, try rebuilding the Link Database and then rebuilding your search indexes.
Another possible reason is that you use a custom field and forgot to add it to the /App_Config/FieldTypes.config file.

Here is a query example:
new FieldQuery(BuiltinFields.Links, ShortID.Encode("4C174B16-7193-49A7-88C6-7F696003F8BF").ToLowerInvariant());

Tom said...

Hi Ivan,

Thanks for your great articles on the new Lucene search in sitecore, they seem to be the only documentation at the moment :)

I think i have the same issue as Asif.
When I search with the Shared Source component “Index Viewer” I can find records for a search on id. When I do the same with the Sitecore Search API I can’t find these records. Also when I try to give a PrefixQuery (from Lucene API) to the search context this does not work. In my example Residence Type is a treelist field.

//Old way does return results
Term term = new Term("residence type", "{B0D17C71-8FAA-4D5D-B87E-C7BBD93AEF69}");//b0d17c718faa4d5db87ec7bbd93aef69
PrefixQuery rd = new PrefixQuery(term);
Hits hits = context.Searcher.Search(rd);

//No results with the new way
SearchHits results = context.Search(rd);

Another one is related to the Link Database. This should be filling the Lucene field “_links” indeed. But I’ve seen that when you’re using a source like this on your sitecore field:
“DataSource={0AFABE4C-7C24-4468-8250-12CD7BE8B405}&IncludeTemplatesForSelection=ListValue” this isn’t updating the link database. Adding DatabaseName mostly does the trick to get this solved, like so:
“DataSource={0AFABE4C-7C24-4468-8250-12CD7BE8B405}&IncludeTemplatesForSelection=ListValue&DatabaseName=master”. Feature request: IncludeTemplatesForSelection is still not able to work with id’s! :)

Tom

Anonymous said...

Hi Ivan
This might be more of a Lucene question than Sitecore.

Am searching for items using _links as you suggested and it works great when finding items linking to one particular item.

For that I create a field query and add it to my base query:

QueryBase tagQuery = new FieldQuery (BuiltinFields.Links, ShortID.Encode(categoryId).ToLowerInvariant());
query.Add(tagQuery, QueryOccurance.Must);

Am having trouble when trying to find items that might be linked to any of the items given a list of items. So basically I want to do an OR query.

If I use the above code and keep adding queries with Query Occurence of MUST then the results are only items linking to all the categories.

I tried SHOULD but it seems to return wrong results. Is there a way to do an OR? Or should I search multiple times and then consolidate the results?

Thanks.

Ivan said...

Try to use your main query as MUST one and then add a BooleanQuery with SHOULD option that would include other SHOULD queries.
Something like this:
BooleanQuery mainQuery = new BooleanQuery();
mainQuery.Add(new TermQuery (new Term (*field*, *value*)), BooleanClause.Occur.MUST);
BooleanQuery subQuery = new BooleanQuery();
subQuery.Add(new TermQuery (new Term (*field*, *value*)), BooleanClause.Occur.SHOULD);
subQuery.Add(new TermQuery (new Term (*field*, *value*)), BooleanClause.Occur.SHOULD);

// Add subQuery to the main one.
mainQuery.Add(subQuery, BooleanClause.Occur.SHOULD);

Anonymous said...

in the old method of lucene index you needed to add the index in the node of the database. Do you still have to do this:

Ivan said...

Could you rephrase the question? I'm not quite sure what you mean by: "...you needed to add the index in the node of the database."

Chris Bernard said...

Is there anything special you have to do to get the language versions indexed? I have both English and Spanish versions of content, but when I run:

CombinedQuery cq = new CombinedQuery();
cq.Add(new FieldQuery(BuiltinFields.Language, Sitecore.Context.Language.Name), QueryOccurance.Must);

SearchHits hits = context.Search(cq);

I get results for English, but none for Spanish. Looking at the index with the IndexViewer shared source module, it doesn't look like the Spanish versions are actually being indexed.

Ivan said...

Larry,

When you rebuild indexes, all item versions in all languages get indexed. When you save or remove a specific item version, only that version gets indexed.
Try using Luke to see all indexed languages in your index. I'm sure your Spanish language is in there.
One thing I can think of is a language region that is appended to the language name with a dash, e.g. "es-MX". The indexed value of this language will be "es-mx". That dash could cause a problem for StandardAnalyzer, which replaces dashes with a space character, so you cannot match the value when you search by the language name.
If you use the AdvancedDatabaseCrawler module, you can solve this issue by making the Lucene "_language" field TOKENIZED. Otherwise you will have to customize DatabaseCrawler to reformat the field value.
Another possible solution could be to escape the dash character by prefixing it with '\'. The search value should look like this: "es\-mx". It should work if StandardAnalyzer lets it through. I haven't tested it though.

Azzy said...

Hi,

I am trying to find references to particular item using _links, but it does not return any data.

We have items(products) related to an item(market segment) through TreeListEx.

I am trying to find all products related to market segment using _links as you suggested.


BooleanQuery mainquery = new BooleanQuery();
mainquery.Add(new TermQuery(new Term("_content", searchString)), BooleanClause.Occur.MUST);

BooleanQuery marketquery = new BooleanQuery();
marketquery.Add(new TermQuery(new Term(BuiltinFields.Links, ShortID.Encode(marketid).ToLowerInvariant())), BooleanClause.Occur.SHOULD);
mainquery.Add(marketquery, BooleanClause.Occur.MUST);
SearchHits hits = context.Search(mainquery);

But I don't get any results.
I also rebuilt the Link Database and then rebuilt the search indexes.

any idea?

Thanks,
Azzy

Ivan said...

Azzy,
First make sure that TreelistEx is listed in the /App_Config/FieldTypes.config file.
If it's there, check if there are entries for the item in Link database. The "_links" field gets populated out of LinkDatabase API. If there are no entries in Links table for that item, they won't appear in index "_links" field.

Azzy said...

Hi Ivan,
Yes there is entry for TreeListEx in fields.config file
and Links database returns few records when i execute query.

Select * from Links where SourceItemID='ITEMIDHERE'

My Link database points to Core database and there are records in Links table in core database.

am i missing any other configuration?

Regards,
Azzy

Ivan said...

Can you check whether the SourceFieldID of those records in the Links table matches your TreelistEx field? If it does, take another look at your Lucene query. Use Luke to construct and execute the query and see if it returns any hits. Try searching just the "_links" field for the target marketid.
When you rebuild the link database, check the log file to see if any related errors are logged.
You can also use Luke to see if there are any values in the _links field for the item.

Azzy said...

Hi Evan,

Thanks for all your help.

Using Luke I found that _links was not returning any hits, which led me in the right direction, and finally I got the search returning proper data.

Once again thanks for your help :)

Regards,
Azzy

Maulik said...

Hi Ivan,
This is really a great Article.

Sitecore Support suggested this article to me. I have implemented Lucene Index on CE using this article. Its working great.
And it gets the results quickly too.

I have a query regarding Range.
Can you help me in my query.

I want to find items below a specific path (I can achieve that using sample 7) and, from that list, find the items that were updated in the last month.

A range query does not work with a combined query, so I am stuck.

Can you help me out?

Ivan said...

Hi Maulik,

To combine a range query with other types of queries, you have to use the Lucene BooleanQuery. There you can combine different types of queries. To search for items under a specific branch of the Sitecore content tree, I would suggest using the BuiltinFields.Path field.
Also consider using the AdvancedDatabaseCrawler component as your main Lucene integration. It extends the default Sitecore DatabaseCrawler with very useful features. Here is the link: http://trac.sitecore.net/AdvancedDatabaseCrawler

Anonymous said...

Having trouble still querying on a droplink field. I expect the value to be a GUID, so I'm doing a fieldquery on the field and have tried multiple combinations of values: shortID, shortID ToLowerInvariant, ToLowerInvariant, etc. I still get no matches. (Yes, I've re-indexed.) What am I doing wrong?

Ivan said...

Try using the Luke tool to see if there is a value for your droplink in the index. If it's not there, check whether the field type is registered in the /App_Config/FieldTypes.config file. If not, add it.
It's possible that a raw ID gets written into the index. In that case you can either search by the raw ID, which has a lot of pitfalls, or use the Advanced Database Crawler to index it in an appropriate format.

Unknown said...

Hi,

I am using the following piece of code to search for a user typed text:

BooleanQuery query = new BooleanQuery();

query.Add(new TermQuery(new Term(BuiltinFields.Content, searchString)), BooleanClause.Occur.MUST);

but somehow I get more results than the Index Viewer search option returns for my index. It looks like it also returns items whose names contain the searched word. Any idea why this could be happening?

Thank you.

Aura P.

Ivan said...

Yes, you're correct. Sitecore adds a prefix query on the item name every time you search the BuiltinFields.Content field with a TermQuery through the IndexSearchContext.Search() method.
If you want to avoid this, call the native Lucene search methods instead.

DQ said...

So when I try to use FullTextQuery, it seems the query is executed with an OR operator. How can I force an AND operator? E.g. if I search for "Dora the Explorer", it brings back results that contain any of those words, and some of them are extremely irrelevant.

The original Lucene QueryParser class has a function for this SetDefaultOperator(QueryParser.Operator.AND)but I don't know how can we do this using Sitecore API


Ivan said...

Currently this is a limitation of the FullTextQuery type. It does not let you require that all words of the searched phrase match.
I'd recommend going with the standard Lucene methods for now to solve this.
Thank you for bringing this up! I'll pass this to Sitecore devs as an enhancement for Lucene search integration.