Saturday, June 4, 2011

Removing HTML tags using XSLT.

When rendering content with XSLT, we often need to show only the text of a field and none of its HTML markup. Here is a solution: call the "remove-html" template, passing the content in the "text" parameter, and it outputs that content with the tags stripped.




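<!-- Call the template from inside the template that renders your item.
     HtmlBody stands for the node that holds the HTML string; replace it
     with the XPath of your own field (for example an attribute such as @Description). -->
<xsl:call-template name="remove-html">
  <xsl:with-param name="text" select="HtmlBody"/>
</xsl:call-template>

<!-- Top-level template: outputs the text before each tag, then recurses on what follows the tag -->
<xsl:template name="remove-html">
  <xsl:param name="text"/>
  <xsl:choose>
    <xsl:when test="contains($text, '&lt;')">
      <!-- Output the text that precedes the next tag -->
      <xsl:value-of select="substring-before($text, '&lt;')"/>
      <!-- Skip to the '>' that closes that tag and process the remainder -->
      <xsl:call-template name="remove-html">
        <xsl:with-param name="text" select="substring-after(substring-after($text, '&lt;'), '&gt;')"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <!-- No tags left: output the remaining text as-is -->
      <xsl:value-of select="$text"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>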
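
To show where the two pieces go, here is a minimal sketch of a complete stylesheet. The input document, the item element and the match="/item" template are assumptions made up for illustration; only the "remove-html" name and its "text" parameter come from the post. The recursive step here looks for the closing '>' only after the opening '<', a small hardening so that a stray '>' in the text is not mistaken for the end of a tag.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical input:
     <item><HtmlBody>&lt;p&gt;Hello &lt;b&gt;world&lt;/b&gt;&lt;/p&gt;</HtmlBody></item>
     Expected text output: Hello world -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- The call goes inside the template that renders the item -->
  <xsl:template match="/item">
    <xsl:call-template name="remove-html">
      <xsl:with-param name="text" select="HtmlBody"/>
    </xsl:call-template>
  </xsl:template>

  <!-- The named template sits at the top level, as a sibling of the other templates -->
  <xsl:template name="remove-html">
    <xsl:param name="text"/>
    <xsl:choose>
      <xsl:when test="contains($text, '&lt;')">
        <!-- Output the text before the next tag -->
        <xsl:value-of select="substring-before($text, '&lt;')"/>
        <!-- Recurse on whatever follows the end of that tag -->
        <xsl:call-template name="remove-html">
          <xsl:with-param name="text"
                          select="substring-after(substring-after($text, '&lt;'), '&gt;')"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Note that the template works on HTML stored as a string (escaped markup inside a text node or attribute). If the HTML is present as real child elements, string(HtmlBody) already yields the text without tags.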

13 comments:

  1. Thanks. It helped me.

  2. Thank you. This is really useful.

    I'm still having difficulty removing HTML elements with attributes. Do you have any hints for this?

    e.g.
    foobar

  3. I have a @Description item, and I want to remove all the tags before I display it. I am finding it difficult to apply the remove-html template. Any help? Mohan

  4. So where does the top part go? In the ContentQueryMain.xsl or somewhere else? The bottom section obviously goes in the ItemStyle.xsl. Thanks.

  5. Thank you for the post! It seems this will help as I am trying to rollup the latest discussion board entries from a variety of project sites using a CQWP; however, I am a self-admitted neophyte to XML development/maintenance and would appreciate a little help.

    I have the CQWP Group Style set to 'Default' and the Item Style set to 'Title and Description'. In SP Designer, I opened the ItemStyle.xsl and pasted the call code posted here immediately below the "div class="description"" start tag nested in the Default template. I then pasted the code starting with "xsl:template name="remove-html"" below the Default template's end tag (not nested). I save and refresh the CQWP page and the HTML markup is still displayed. (Note the tag delimiters have been replaced with double quotes).

    Is there something obvious that I'm missing? Any guidance is greatly appreciated!

    Replies
    1. Here's how I did it:

      After adding the template as you did, I found the section for the Item Style I am using, the one named "No Image" (I selected 'Title and Description' in the CQWP presentation properties). In that template, I copied the DisplayTitle variable section, named the new variable DisplayDescription, changed the call-template to "remove-html", and replaced the two parameter lines with one named "text" that is passed "@Description". Finally, in the description div at the bottom of the NoImage template, I changed the xsl:value-of select to $DisplayDescription (our new variable).

      Hope that helps!

      Jordan
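
      The steps described in this reply can be sketched roughly as follows. This is an illustration under assumptions, not the actual SharePoint ItemStyle.xsl: the match="Row" template is a hypothetical stand-in for the "No Image" item style, which contains more markup in practice, and the sample input is made up. Only DisplayDescription, "remove-html", "text", @Description and the description div come from the thread.

      ```xml
      <?xml version="1.0" encoding="UTF-8"?>
      <!-- Hypothetical input:
           <Row Description="&lt;div&gt;Latest &lt;b&gt;news&lt;/b&gt;&lt;/div&gt;"/>
           The description div then contains the plain text: Latest news -->
      <xsl:stylesheet version="1.0"
                      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:output method="html"/>

        <!-- Stand-in for the item style template (assumption) -->
        <xsl:template match="Row">
          <!-- New variable, modelled on the DisplayTitle variable -->
          <xsl:variable name="DisplayDescription">
            <xsl:call-template name="remove-html">
              <xsl:with-param name="text" select="@Description"/>
            </xsl:call-template>
          </xsl:variable>
          <div class="description">
            <!-- Was select="@Description"; now uses the stripped text -->
            <xsl:value-of select="$DisplayDescription"/>
          </div>
        </xsl:template>

        <!-- Tag-stripping template, placed at the top level -->
        <xsl:template name="remove-html">
          <xsl:param name="text"/>
          <xsl:choose>
            <xsl:when test="contains($text, '&lt;')">
              <xsl:value-of select="substring-before($text, '&lt;')"/>
              <xsl:call-template name="remove-html">
                <xsl:with-param name="text"
                                select="substring-after(substring-after($text, '&lt;'), '&gt;')"/>
              </xsl:call-template>
            </xsl:when>
            <xsl:otherwise>
              <xsl:value-of select="$text"/>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:template>
      </xsl:stylesheet>
      ```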

  6. Thank you for the post.

    The template is useful, but the HTML formatting gets lost.

    What if I want to keep the formatting and transform it?

    For example: <b>this text should be bold</b>

    Replies
    1. The goal of the post was just to remove the HTML from the input, but you can add the strong tag back afterwards by concatenating it around the result.

  7. Is there a way to strip an HTML tag, its sub tags, and all text between them?




    Remove the header tag and everything in between

    Replies
    1. Paul,

      You can do anything using XSL. This was just an example for starters.

      Thanks,
      Maulik Dhorajia

  8. I tried to extend this template to keep superscript and subscript, i.e. to remove all markup except sup and sub. But when I tried, I ran into a StackOverflowException. Can you please suggest some functions or a technique to achieve this?
