Monday, July 30, 2012

Sync up Sitecore Search Index Update and HTML cache clear


So you have a presentation control (let’s say a sublayout) that relies on Sitecore.Search APIs to get the data to render (news articles for example). Second ingredient in this recipe is HTML cache that you have enabled for this control. The third ingredient is a dedicated Content Delivery server.

You’ve got yourself a race condition between IndexingProvider and HTML Cache!

Now here is what’s going to happen in production:

1. An author creates a new news article. (00:00)

2. Editors publish. (00:01)

3. All needed caches are cleared on remote CD server, including HTML cache. (00:03)

4. A visitor requests /news on your CD server. (00:04)
=> This request tells Sitecore to cache your control now. Obviously, with stale data.

5. IndexingProvider gets triggered on Indexing.UpdateInterval which is 5 minutes by default (00:08)
=> Search index is updated with the new article created on step 1.

6. Another visitor requests /news on your CD server. (00:10)
=> Stale html output will be served from cache.

As you can see, such a sequence of events is quite realistic, and would happen quite frequently if you have all the ingredients in the recipe.

The HTML cache clear happens on the remote servers once the remote publish:end event is replayed, a couple of seconds after publishing finishes on the CM server. The index update, on the other hand, is governed by a disconnected, interval-based process that lags behind most of the time. What you really need is a way to connect the two processes.

The good news is that Sitecore has all the proper tools for the job!

First off, you can leverage the following event, which is triggered at the DataProvider level after the IndexingProvider is done updating the index:

<!-- database:propertychanged(string parameterName)
     Raised when database property was changed. -->
<event name="database:propertychanged" />

Secondly, Sitecore already has a handler for HTML cache clearing.

So simply registering the HtmlCacheClearer on database:propertychanged gets the job done:

<event name="database:propertychanged">
   <handler type="Sitecore.Publishing.HtmlCacheClearer, Sitecore.Kernel" method="ClearCache">
      <sites hint="list">
         <site>website</site>
      </sites>
   </handler>
</event>

While this approach works, unfortunately it is not efficient, as the HtmlCacheClearer would be triggered every time an entry is written to the Properties table, and Sitecore does that a lot (EventQueue is one example).

So, you can modify the source of the HtmlCacheClearer to read the name of the property and compare it with the IndexingManager’s LastUpdatePropertyKey, so that the cache is cleared only when the index has actually been updated:

var propertyName = Event.ExtractParameter(args, 0) as string;

if (propertyName == null) return;

if (propertyName.Equals(IndexingManager.LastUpdatePropertyKey))
{
    // do the html cache clear magic
}
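
For reference, here is a minimal sketch of what the complete handler could look like. It is an assumption-laden illustration rather than the actual modified source: the class name IndexAwareHtmlCacheClearer and the MySite.Events namespace are hypothetical, the namespace that IndexingManager lives in is assumed and may differ in your Sitecore version, and the clearing loop is only modelled on what the stock HtmlCacheClearer is described as doing.

```csharp
using System;
using System.Collections;
using Sitecore.Caching;
using Sitecore.Configuration;
using Sitecore.Data.Managers;   // assumption: namespace of IndexingManager in your version
using Sitecore.Diagnostics;
using Sitecore.Events;
using Sitecore.Sites;

namespace MySite.Events // hypothetical namespace
{
    // Hypothetical handler: clears the HTML cache only after an index update.
    public class IndexAwareHtmlCacheClearer
    {
        private readonly ArrayList sites = new ArrayList();

        // Populated by the configuration factory from <sites hint="list">.
        public ArrayList Sites
        {
            get { return sites; }
        }

        public void ClearCache(object sender, EventArgs args)
        {
            // The first event parameter is the name of the changed property.
            var propertyName = Event.ExtractParameter(args, 0) as string;
            if (propertyName == null) return;

            // Ignore every property write except the index "last update" marker.
            if (!propertyName.Equals(IndexingManager.LastUpdatePropertyKey)) return;

            foreach (string siteName in Sites)
            {
                SiteContext site = Factory.GetSite(siteName);
                if (site == null) continue;

                HtmlCache htmlCache = CacheManager.GetHtmlCache(site);
                if (htmlCache == null) continue;

                htmlCache.Clear();
                Log.Info("HTML cache cleared after index update for site: " + siteName, this);
            }
        }
    }
}
```

To use something like this, you would point the type attribute of the handler registered on database:propertychanged at your own class and assembly instead of Sitecore.Publishing.HtmlCacheClearer. The Log.Info call is there so you can verify in the log that the handler is actually hit.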

Important note: in my local tests, I saw the database:propertychanged event fire twice, thus triggering two HTML cache clearings per index update. I don’t see this as a major problem, and it is not a trivial one to solve.

Hope this solution is going to save you some time.

12 comments:

Unknown said...

Alex - This solution did not work for us. We are still experiencing the same outcome as before even after the solution was added to the web.config. Any ideas?

Unknown said...

Hi Todd,

Could you please add some logging via Sitecore.Diagnostics.Log.Info to your custom HtmlCacheClearer to see whether the code is called and which lines are hit?

This should help.

Unknown said...

Alex,

This is very helpful, thanks! This works for us, except that every time the index updates, the entire HTML cache is cleared for the site. Is there a way to identify the item that was updated and only clear the caches for that item? At least that would allow you to use that item to make some decisions about more granular cache clearing.

Thanks,
Dan

Unknown said...

Hi Dan,

Since the HTML cache is not connected with content items, it's not easy to have partial HTML cache clearing done. In Sitecore 7 we have a more intelligent process, where each rendering can be configured to be cleared on index update.

What you can try is modifying the HtmlCacheClearer so that it calls HtmlCache.RemoveKeysContaining() instead of HtmlCache.Clear(), passing in the keys generated by WebControl.GetCacheKey(). This should clear the cache only for a subset of the HTML controls, while keeping other, non-index-dependent controls intact.

Disclaimer: I have not actually tried it, but it might work.

Let me know if it works.

Unknown said...

Alex,

Thanks for getting back to me. That seems like it could work for our application, but I'm not sure how to know which control's CacheKey to clear. In that event (database:propertychanged), I can't seem to get any information other than the property that was updated. In other words, I can't figure out the item that was just indexed.

Any ideas?

Oh, and yeah, I can't wait for Sitecore 7. Your recent post about the SearchLog and CrawlerLog had me practically salivating.

Unknown said...

Thanks, Dan :)

I was thinking that, since you already know which controls are index dependent, you can programmatically construct a cache key for those and call that method instead of having the HtmlCacheClearer clear everything. This will at least preserve other controls in cache.

Unknown said...

Ahh, that makes sense. So we'd just create a list (basically) of the controls that rely on index data and use their keys to selectively clear their caches. Neat. Thanks for the ideas!

Unknown said...

Shoot. I found another snag.

The controls we want to refresh in cache are all UserControls, not WebControls. Is there a way to find the CacheKey that gets used to cache a UserControl?

Unknown said...

Hi Dan,

If it's a sublayout, you should be able to access the cache key using (Sublayout)this.Parent.

Also, you can leverage Sitecore Rocks to get the list of cache keys for your renderings (right click on the instance in Sitecore Explorer -> Manage -> Caches -> website[html] -> double click):
https://www.dropbox.com/s/2kowrr1iahfx8rm/cache_keys.png

This would work if you connect to a working instance, where each control is loaded in memory.

Afterwards, you can pass the list of keys in a way similar to how the list of sites is passed to the HtmlCacheClearer:


website


Hope this helps.

Unknown said...

The XML snippet got messed up in the previous comment:

<sites hint="list">
   <site>website</site>
</sites>

So:
<cachekeys hint="list">
   <cachekey>/controls/title.ascx_#data</cachekey>
</cachekeys>
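
A handler that consumes such a list might look roughly like the sketch below. It is untested and rests on assumptions: the class name SelectiveHtmlCacheClearer and its namespace are hypothetical, the namespace of IndexingManager is assumed, and the CacheKeys property is assumed to be filled from the cachekeys list the same way Sites is filled from the sites list.

```csharp
using System;
using System.Collections;
using Sitecore.Caching;
using Sitecore.Configuration;
using Sitecore.Data.Managers;   // assumption: namespace of IndexingManager in your version
using Sitecore.Events;
using Sitecore.Sites;

namespace MySite.Events // hypothetical namespace
{
    // Hypothetical handler: removes only the configured cache keys after an index update.
    public class SelectiveHtmlCacheClearer
    {
        private readonly ArrayList sites = new ArrayList();
        private readonly ArrayList cacheKeys = new ArrayList();

        // Assumed to be populated from <sites hint="list">.
        public ArrayList Sites { get { return sites; } }

        // Assumed to be populated from <cachekeys hint="list">.
        public ArrayList CacheKeys { get { return cacheKeys; } }

        public void ClearCache(object sender, EventArgs args)
        {
            var propertyName = Event.ExtractParameter(args, 0) as string;
            if (propertyName == null) return;
            if (!propertyName.Equals(IndexingManager.LastUpdatePropertyKey)) return;

            foreach (string siteName in Sites)
            {
                SiteContext site = Factory.GetSite(siteName);
                if (site == null) continue;

                HtmlCache htmlCache = CacheManager.GetHtmlCache(site);
                if (htmlCache == null) continue;

                // Drop only entries whose key contains one of the configured fragments;
                // every other cached rendering stays in place.
                foreach (string key in CacheKeys)
                {
                    htmlCache.RemoveKeysContaining(key);
                }
            }
        }
    }
}
```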

Unknown said...

Thanks again, Alex; this worked great. Here's what I ended up with:

1. Create new event handler (HtmlCacheClearerAfterIndexUpdate) that has a method to ClearCache and two ArrayList properties: Sites and CacheKeys
2. Add the XML you suggested to the database:propertychanged event.
3. In the ClearCache method, verify the property name is the IndexingManager.LastUpdatePropertyKey, and then, for each site in the Sites list, call RemoveKeysContaining for each cache key from the CacheKeys list.

The only addition I made to your suggestion was to add an extension method to HtmlCache to be able to remove keys matching a regex, because we use nested sublayouts.
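
The regex extension itself is not shown in the thread. As a rough, Sitecore-independent illustration of the matching part only, the sketch below selects the keys that match any of a set of patterns. The CacheKeyMatcher name and the sample keys (other than the title.ascx key quoted earlier) are made up, and how you enumerate and remove keys from the real HtmlCache is left out because it depends on your Sitecore version.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: picks the cache keys that match at least one regex pattern.
public static class CacheKeyMatcher
{
    public static List<string> SelectMatching(IEnumerable<string> keys, IEnumerable<string> patterns)
    {
        // Compile each pattern once, then test every key against all of them.
        var regexes = patterns.Select(p => new Regex(p, RegexOptions.IgnoreCase)).ToList();
        return keys.Where(k => regexes.Any(r => r.IsMatch(k))).ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        // Sample keys for illustration only; real keys depend on your renderings.
        var keys = new[]
        {
            "/controls/title.ascx_#data",
            "/controls/news/list.ascx_#lang:EN",
            "/controls/footer.ascx"
        };

        // One pattern covering every sublayout under a (hypothetical) news folder.
        var stale = CacheKeyMatcher.SelectMatching(keys, new[] { @"/controls/news/.*\.ascx" });

        foreach (var key in stale)
        {
            Console.WriteLine(key); // only the /controls/news/ key matches here
        }
    }
}
```

A pattern-based match like this is handy with nested sublayouts, where one pattern can cover a whole family of keys instead of listing each fragment separately.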