Monday, December 14, 2009

Get all published items in Sitecore 6.1/6.2


You may need to get all items that were processed by a publishing operation for various reasons: for example, to pull certain information for each item and notify a CDN of the change so that its cache can be purged. AKAMAI can be configured to behave that way.
There are currently two articles on the subject: one describes how to use the PublishEngine, and the other suggests an approach based on a custom PublishLog. Neither is compatible with 6.x, so I looked into the options for the latest Sitecore together with tech support. One finding was a new PublishProcessor that examines the Queue property of PublishContext, but it did not work properly for smart publishing, and I wanted an approach that is simple, bulletproof, and works with every publishing mode.
Then I remembered that we already do something similar for indexing: the update index job processes only changed items, and it gets that information from the History storage.

So I went ahead and enabled History storage for the web database (the Engines.HistoryEngine.* elements are the addition):
      <!-- web -->
      <database id="web" singleInstance="true" type="Sitecore.Data.Database, Sitecore.Kernel">
        <param desc="name">$(id)</param>
        <icon>Network/16x16/earth.png</icon>
        <securityEnabled>true</securityEnabled>
        <dataProviders hint="list:AddDataProvider">
          ...
        </dataProviders>
        <proxiesEnabled>false</proxiesEnabled>
        <proxyDataProvider ref="proxyDataProviders/main" param1="$(id)" />
        <archives hint="raw:AddArchive">
          ...
        </archives>

        <Engines.HistoryEngine.Storage>
          <obj type="Sitecore.Data.$(database).$(database)HistoryStorage, Sitecore.Kernel">
            <param connectionStringName="$(id)" />
            <EntryLifeTime>30.00:00:00</EntryLifeTime>
          </obj>
        </Engines.HistoryEngine.Storage>
        <Engines.HistoryEngine.SaveDotNetCallStack>false</Engines.HistoryEngine.SaveDotNetCallStack>

        <cacheSizes hint="setting">
         ...
        </cacheSizes>
      </database>
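The EntryLifeTime value uses the .NET TimeSpan format d.hh:mm:ss, so 30.00:00:00 keeps entries for 30 days. Assuming Sitecore purges history entries older than that, a processor that reads the History storage from its last run onward could silently miss changes if no publish happens within that window. The sketch below is plain .NET with no Sitecore dependency. It shows a hypothetical HistoryWindow.IsStale check that could be used to log a warning, or to fall back to a full purge, when the gap is too large. The class and method names are illustrative only.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of Sitecore. It only mirrors the
// EntryLifeTime setting from the <Engines.HistoryEngine.Storage> block.
public static class HistoryWindow
{
    // Parses a value such as "30.00:00:00" (days.hours:minutes:seconds).
    public static TimeSpan ParseLifetime(string entryLifeTime)
    {
        return TimeSpan.Parse(entryLifeTime, CultureInfo.InvariantCulture);
    }

    // True when the last processed timestamp is older than the retention
    // window, i.e. some history entries may already have been purged.
    // On the first run (DateTime.MinValue) this is always true.
    public static bool IsStale(DateTime lastRunUtc, DateTime nowUtc, TimeSpan lifetime)
    {
        return nowUtc - lastRunUtc > lifetime;
    }
}
```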

Then I added my custom processor at the end of the publish pipeline, after ProcessQueue, so that all changes have already been written to the target database when it runs:
<publish help="Processors should derive from Sitecore.Publishing.Pipelines.Publish.PublishProcessor">
 <processor type="Sitecore.Publishing.Pipelines.Publish.AddLanguagesToQueue, Sitecore.Kernel" />
 <processor type="Sitecore.Publishing.Pipelines.Publish.AddItemsToQueue, Sitecore.Kernel" />
 <processor type="Sitecore.Publishing.Pipelines.Publish.ProcessQueue, Sitecore.Kernel" />
 <processor type="SCUSAINC.Pipelines.Publish.CacheClearer, SCUSAINC.Web" />
</publish>
Below is the code that reads the History storage to retrieve the list of changed items:
using System;
using System.Collections.Generic;
using Sitecore;
using Sitecore.Data;
using Sitecore.Data.Engines;
using Sitecore.Data.Managers;
using Sitecore.Diagnostics;
using Sitecore.Publishing.Pipelines.Publish;

namespace SCUSAINC.Pipelines.Publish
{
    public class CacheClearer : PublishProcessor
    {
        // Key under which the time of the last processed run is stored
        // in the target database's Properties table.
        private const string LastUpdate = "AKAMAICacheClearer_LastUpdateTime";

        public override void Process(PublishContext context)
        {
            Assert.ArgumentNotNull(context, "context");

            ProcessPublishedItems(context);
        }

        protected virtual void ProcessPublishedItems(PublishContext context)
        {
            // The queue is local: pipeline processor instances may be reused
            // across publish jobs, so shared instance state is not safe.
            var cacheQueue = GetChangedItems(context.PublishOptions.TargetDatabase);

            foreach (var id in cacheQueue)
            {
                Log.Info("*** Processing cache clear for item: " + id, this);
            }

            Log.Info("*** Total processed: " + cacheQueue.Count, this);
        }

        private List<ID> GetChangedItems(Database database)
        {
            var changedItems = new List<ID>();
            var seen = new HashSet<ID>();

            var utcNow = DateTime.UtcNow;

            // date of the last processed run (DateTime.MinValue on the first run)
            var from = LastUpdateTime(database);

            // history entries recorded in the target database since then
            var entries = HistoryManager.GetHistory(database, from, utcNow);
            foreach (var entry in entries)
            {
                // keep item-related entries only, and each item ID only once
                if (entry.Category == HistoryCategory.Item && seen.Add(entry.ItemId))
                {
                    changedItems.Add(entry.ItemId);
                }
            }

            // store the upper bound of this run so the next publish starts from here
            database.Properties[LastUpdate] = DateUtil.ToIsoDate(utcNow, true);

            return changedItems;
        }

        protected DateTime LastUpdateTime(Database database)
        {
            var lastUpdate = database.Properties[LastUpdate];

            if (!string.IsNullOrEmpty(lastUpdate))
            {
                return DateUtil.ParseDateTime(lastUpdate, DateTime.MinValue);
            }

            return DateTime.MinValue;
        }
    }
}
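The heart of the processor is a de-duplication pass over the history entries. An item that is saved or published several times produces several entries, but the CDN only needs to hear about it once. Below is a minimal sketch of that pass. It uses hypothetical HistoryEntryStub and EntryCategory types in place of Sitecore's HistoryEntry and HistoryCategory so that it compiles without Sitecore.Kernel.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for Sitecore's history types, so the logic can be
// compiled and tested without Sitecore.Kernel.
public enum EntryCategory { Item, Other }

public sealed class HistoryEntryStub
{
    public Guid ItemId { get; set; }
    public EntryCategory Category { get; set; }
    public DateTime Created { get; set; }
}

public static class ChangedItems
{
    // Returns each item-related ID once, in the order it first appears.
    // HashSet.Add returns false for a duplicate, giving an O(1) check.
    public static List<Guid> Collect(IEnumerable<HistoryEntryStub> entries)
    {
        var seen = new HashSet<Guid>();
        var result = new List<Guid>();
        foreach (var entry in entries)
        {
            if (entry.Category == EntryCategory.Item && seen.Add(entry.ItemId))
            {
                result.Add(entry.ItemId);
            }
        }
        return result;
    }
}
```

In the processor, the same pass runs over the result of HistoryManager.GetHistory, with Sitecore's ID in place of Guid.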

2 comments:

Max said...

Works like a charm, Alex!

Question: I see that you are writing messages to the Log object. Is it possible to show a confirmation-type message in the publish wizard's "click here to see more info" dialog?
It's the one that has this info:

Job started: Publish to 'web'
Items created: 0
Items deleted: 0
Items updated: 1
Items skipped: 5471
Job ended: Publish to 'web' (units processed: 5286)

Allan Koch said...

I couldn't get this to work with Sitecore 6.5.

What does work, though, is adding a processor at the end of the publishItem pipeline, implemented as a class that inherits from Sitecore.Publishing.Pipelines.PublishItem.PublishItemProcessor.

<publishItem help="Processors should derive from Sitecore.Publishing.Pipelines.PublishItem.PublishItemProcessor">
  ...
  <!-- Your processor -->
  <processor type="Website.Core.Pipelines.PublishItemHandler, Website.Core" />
</publishItem>

From what I can tell, it is triggered for every item being processed, whether by a direct item publish or an incremental publish.


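namespace Website.Core.Pipelines
{
    public class PublishItemHandler : Sitecore.Publishing.Pipelines.PublishItem.PublishItemProcessor
    {
        public override void Process(Sitecore.Publishing.Pipelines.PublishItem.PublishItemContext context)
        {
            // ID of the item currently being published
            var id = context.ItemId;

            // look the item up in the target database; GetItem returns null
            // if the item is not there (for example, after a deletion)
            var item = Sitecore.Configuration.Factory.GetDatabase("web").GetItem(id);
            ...
        }
    }
}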


Thanks for the post. It sent me in the right direction :) I was struggling with the publish:end event.
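A publishItem processor like Allan's is called once per published item, but a CDN purge may be cheaper to send as a single batch. One way to combine his approach with the publish:end event he mentions is to collect IDs per item and send them all at the end of the job. Below is a minimal sketch of such a collector, assuming .NET 4 or later for ConcurrentDictionary. PublishedItemCollector is a hypothetical name, not a Sitecore type, and the sketch leaves out the wiring into the pipeline and the event handler.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical helper, not a Sitecore API: a per-item publishItem processor
// would call Add(); a publish:end handler would call Drain() once and send
// a single batch to the CDN.
public sealed class PublishedItemCollector
{
    private readonly ConcurrentDictionary<Guid, byte> ids = new ConcurrentDictionary<Guid, byte>();

    // Safe to call from several threads; duplicate IDs are ignored.
    public void Add(Guid itemId)
    {
        ids.TryAdd(itemId, 0);
    }

    // Returns the collected IDs (in no particular order) and empties the
    // collector for the next job.
    public IList<Guid> Drain()
    {
        var result = new List<Guid>();
        foreach (var id in ids.Keys)
        {
            byte ignored;
            if (ids.TryRemove(id, out ignored))
            {
                result.Add(id);
            }
        }
        return result;
    }
}
```

A single shared instance would mix items from publish jobs that overlap in time, so keying the collection by job is worth considering if that can happen.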