Tuesday, June 28, 2011

Ensure valid Sitecore Internal Links in Page Editor

(Updated on Dec 9, 2011)
Refer to this post for updated solution.

(Updated on July 27, 2011)

A workaround developed for one of our customers prompted me to blog about it, as I can see how many others may be affected by this issue.

The issue was that whenever an editor opens the RTE control in Page Editor (to get access to all of its functions) and saves the changes, all internal links (those that start with “~/link.aspx”) get prefixed with the host name that is used in the browser to access the Page Editor. I could reproduce this behavior only in IE, and only when it runs in compatibility mode. I got the same results in both IE8 and IE9 running in compatibility mode. IE8 in normal mode does not cause the issue.

To address this issue, I hooked my own processor into the <saveUI> pipeline. It corrects internal links in Rich Text fields using a configurable regex that matches the unwanted absolute prefix of the link. Here is the code I came up with:

Code Snippet
using System;
using System.Text.RegularExpressions;
using Sitecore.Configuration;
using Sitecore.Data.Fields;
using Sitecore.Data.Items;
using Sitecore.Pipelines.Save;

namespace Sitecore.Support.Pipelines.Save
{
   public class EnsureRichTextRelativeLinks
   {
      public void Process(SaveArgs args)
      {
         if (args.HasSheerUI)
         {
            if ((args.Result == "no") || (args.Result == "undefined"))
            {
               args.AbortPipeline();
            }
            else
            {
               for (int i = 0; i < args.Items.Length; i++)
               {
                  SaveArgs.SaveItem item = args.Items[i];
                  Item contentItem = Context.ContentDatabase.Items[item.ID, item.Language, item.Version];
                  if (contentItem != null)
                  {
                     foreach (SaveArgs.SaveField field in item.Fields)
                     {
                        // Only Rich Text fields are affected by the issue.
                        Field fld = contentItem.Fields[field.ID];
                        if (fld != null && fld.Type.Equals("rich text", StringComparison.InvariantCultureIgnoreCase))
                        {
                           if (!string.IsNullOrEmpty(field.Value))
                           {
                              field.Value = EnsureRelativeLinks(field.Value);
                           }
                        }
                     }
                  }
               }
            }
         }
      }

      protected virtual string EnsureRelativeLinks(string fieldValue)
      {
         // ".*?" is lazy, so each match stops at the nearest "~/link.aspx" (or "~/media")
         // instead of swallowing everything between the first and the last link in the field.
         string internalLinkPattern = Settings.GetSetting("PageEditor.InternalLinkReplacePattern",
                                                          "((http)|(https)):((//)|(\\\\))({0}).*?(~/link.aspx)");
         string internalLinkReplacementValue = Settings.GetSetting("PageEditor.InternalLinkReplacementValue",
                                                                   "~/link.aspx");
         string mediaLinkPattern = Settings.GetSetting("PageEditor.MediaLinkReplacePattern",
                                                       "((http)|(https)):((//)|(\\\\))({0}).*?(~/media)");
         string mediaLinkReplacementValue = Settings.GetSetting("PageEditor.MediaLinkReplacementValue", "~/media");
         // Escape the host name so that its dots are matched literally.
         string hostName = Regex.Escape(Sitecore.Web.WebUtil.GetHostName());
         Regex linkPattern = new Regex(string.Format(internalLinkPattern, hostName), RegexOptions.IgnoreCase);
         Regex mediaPattern = new Regex(string.Format(mediaLinkPattern, hostName), RegexOptions.IgnoreCase);
         string value = linkPattern.Replace(fieldValue, internalLinkReplacementValue);
         value = mediaPattern.Replace(value, mediaLinkReplacementValue);

         return value;
      }
   }
}
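To see what the replacement does outside of Sitecore, here is a minimal standalone sketch. The host name cms.example.com, the sample markup and the "_id=A" / "_id=B" query strings are made up for illustration, and the Normalize helper is hypothetical (it is not part of the processor). It builds the same kind of pattern and compares a greedy ".*" wildcard with a lazy ".*?" one on a field that contains two links:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LinkPatternDemo
{
    // Hypothetical helper: builds the internal link pattern for a host with the
    // given wildcard (".*" or ".*?") and replaces every match with "~/link.aspx".
    public static string Normalize(string html, string host, string wildcard)
    {
        string pattern = "((http)|(https)):((//)|(\\\\))(" + Regex.Escape(host) + ")"
                         + wildcard + "(~/link.aspx)";
        return Regex.Replace(html, pattern, "~/link.aspx", RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        // Assumed shape of the markup after a save in IE compatibility mode.
        string html =
            "<a href=\"http://cms.example.com/~/link.aspx?_id=A\">one</a> and " +
            "<a href=\"http://cms.example.com/~/link.aspx?_id=B\">two</a>";

        // Greedy: one match runs from the first "http://" to the LAST "~/link.aspx",
        // so the first link is lost.
        // prints: <a href="~/link.aspx?_id=B">two</a>
        Console.WriteLine(Normalize(html, "cms.example.com", ".*"));

        // Lazy: each match stops at the nearest "~/link.aspx", so both links survive.
        // prints: <a href="~/link.aspx?_id=A">one</a> and <a href="~/link.aspx?_id=B">two</a>
        Console.WriteLine(Normalize(html, "cms.example.com", ".*?"));
    }
}
```

With a single link in the field both wildcards give the same result. The difference only shows up once a field contains two or more affected links.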

I put the regex patterns as well as the replacement strings into an include config file to make adjustments easier if necessary. Here is what the config file looks like:

Code Snippet
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <processors>
      <saveUI>
        <!--
        Fix to address an issue of internal links being converted to absolute links
        after saving content of RTE field in IE browser running in compatibility mode.
        The fix is developed by Sitecore support and should be removed after the problem is fixed in the core product.
        -->
        <processor mode="on" type="Sitecore.Support.Pipelines.Save.EnsureRichTextRelativeLinks, Sitecore.Support.341310" patch:after="processor[@type='Sitecore.Pipelines.Save.ConvertLayoutField, Sitecore.Kernel']" />
      </saveUI>
    </processors>
    <settings>
      <!-- RegEx pattern to ensure valid internal links during the save event in Page Editor.
           The issue occurs in IE browser running in compatibility mode.
           Each match is replaced with the value defined in PageEditor.InternalLinkReplacementValue.
           The {0} parameter is used to insert the host name used in the browser to access Page Editor.
           The lazy ".*?" keeps a match from spanning more than one link.
      -->
      <setting name="PageEditor.InternalLinkReplacePattern" value="((http)|(https)):((//)|(\\\\))({0}).*?(~/link.aspx)" />
      <!-- Replacement string for the regex pattern defined in the PageEditor.InternalLinkReplacePattern setting.
      -->
      <setting name="PageEditor.InternalLinkReplacementValue" value="~/link.aspx" />
      <!-- RegEx pattern to ensure valid media links during the save event in Page Editor.
           The issue occurs in IE browser running in compatibility mode.
           Each match is replaced with the value defined in PageEditor.MediaLinkReplacementValue.
           The {0} parameter is used to insert the host name used in the browser to access Page Editor.
      -->
      <setting name="PageEditor.MediaLinkReplacePattern" value="((http)|(https)):((//)|(\\\\))({0}).*?(~/media)" />
      <!-- Replacement string for the regex pattern defined in the PageEditor.MediaLinkReplacePattern setting.
      -->
      <setting name="PageEditor.MediaLinkReplacementValue" value="~/media" />
    </settings>
  </sitecore>
</configuration>
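One detail to keep in mind when copying a pattern between the C# defaults and the config file: the "(\\\\)" part is typed the same way in both places, but it does not mean the same thing. In a regular C# string literal the compiler halves the backslashes, so the regex engine matches one literal backslash. An XML attribute is not unescaped, so the regex engine matches two. The sketch below only illustrates that difference. The Matches helper is hypothetical, and in practice the "//" branch is the one that real URLs hit, so this is unlikely to matter for the fix itself.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BackslashEscapingDemo
{
    // Hypothetical helper: true when the whole input matches the separator pattern.
    public static bool Matches(string separatorPattern, string input)
    {
        return Regex.IsMatch(input, "^(" + separatorPattern + ")$");
    }

    public static void Main()
    {
        // Typed as in the C# default: the regex engine receives (//)|(\\)
        string fromCSharpLiteral = "(//)|(\\\\)";

        // Typed as in the config file (a verbatim string mimics what the XML
        // attribute delivers): the regex engine receives (//)|(\\\\)
        string fromConfig = @"(//)|(\\\\)";

        string oneBackslash = @"\";
        string twoBackslashes = @"\\";

        Console.WriteLine(Matches(fromCSharpLiteral, oneBackslash));   // True
        Console.WriteLine(Matches(fromCSharpLiteral, twoBackslashes)); // False
        Console.WriteLine(Matches(fromConfig, oneBackslash));          // False
        Console.WriteLine(Matches(fromConfig, twoBackslashes));        // True
        Console.WriteLine(Matches(fromConfig, "//"));                  // True
    }
}
```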

Oh yeah, I wasn’t sure if this could happen to media links (couldn’t reproduce it locally) but decided to add the same replacement functionality for “~/media” links as well. If you find it useless, feel free to remove that part :).

This code was developed and tested in Sitecore 6.4.1 rev.110324. It is expected to work in any 6.4 version. I can’t see any problems with 6.5, but I haven’t tested it there.

Since all the code fits into the snippet boxes above, I’m not providing a separate source download. Here is the link to the Sitecore package that installs the fix, though.

Hope it saves you some time!

6 comments:

Brendan and Dan said...

Ivan,

Your blog has helped me a few times in the past. I'm reaching out because I'm not sure if there is an alternative way to fix my problem.

When I search for a company and then add the word "sitecore" afterwards, you see links from the website, but they include the word "sitecore" in the URL. I tried using a robots.txt rule to get web crawlers to avoid "sitecore", but it's been a few weeks and the links are still up there. Below is a URL which shows an example of what I'm talking about.

http://tinyurl.com/3lold8p

I truly appreciate any help you can provide.

Thanks,
Dan

Ivan said...

Dan,

If you have "sitecore" in the URL (I guess it starts with "/sitecore/content"), there is a misconfiguration in your Sitecore solution.
A typical cause is a cross-site reference, or a reference to an item outside of your site structure. If you have cross-site references, make sure that the "Rendering.SiteResolving" setting is set to "true".
If you're referencing resources in some global repository which is outside of your home item, create logic that trims the unnecessary parts from the URL.
Another way to solve this issue is to use the items from the global repository only as a data source, and have an item with presentation components under the site structure.

Best regards,
--Ivan

Anonymous said...

Great post, Ivan. I did find your regex to be a bit greedy, though: it was matching all content between the first and last links if there was more than one. I had to update it to:
((http)|(https)):((//)|(\\\\))({0}).*?(~/link.aspx)

After that it worked a treat.

Ivan said...

Great catch! I'll update the post with your improved regex.
Thanks for pointing it out!

Jan Hebnes said...

For general information, this bug is still present in Sitecore 6.5 rev 111123.

Jan Hebnes said...

This bug is now listed as a known issue by Sitecore: http://sdn.sitecore.net/Products/Sitecore%20V5/Sitecore%20CMS%206/ReleaseNotes/KnownIssues%20Recommended/Internal%20links%20from%20Rich%20Text%20fields%20making%20published%20URLs%20absolute%20instead%20of%20relative.aspx