Recursively Copy Elements Using XSLT

Recently I bought a book entitled AJAX, published by O'Reilly. I bought it for its small section on XML and XSLT, which is what I was truly after; the bookstore had no books covering that specific topic. I also figured I might learn something about the Ajax process and design from it, even though I already know enough to at least make something work. Perhaps now I can make that stuff work better and faster.

Back to the point: I wanted this XML+XSLT section because ever since I saw Blizzard Entertainment's World of Warcraft Armory I've been fascinated by what they were able to do with it, and I wanted to bring this type of power to my site and my knowledge base (aka my brain). Update 2015-3-1: Blizzard no longer uses this technique on their new armory pages. I did use this technique for a while, and while nifty, it did not work well across browsers and came with all sorts of additional challenges.

I decided to try converting all the pages on my site to XML documents and then presenting them using XSLT. I figure if I can do this, it will also help me create offline versions of my articles, which is something I've been wanting to do with my set of SpiderMonkey articles. If my thought process is correct, I should be able to just apply a new XSLT stylesheet and end up with an offline version of these pages once I am finished.

Of course, trying to do this with something completely new, no less, resulted in some problems. First, I decided the best approach would be to keep using XHTML tags to some extent and just copy them over using <xsl:copy-of>, but that did not turn out as well as I planned.

In order to reuse the XHTML tags, I created my initial XML like so:

<?xml version="1.0" encoding="ascii"?>
<?xml-stylesheet type="text/xsl" href="./index.xsl"?>
<page title="Grand Overview" xmlns:xhtml="http://www.w3.org/1999/xhtml">
    <content>
        <entry>
            <xhtml:p>Taking some time to learn XML+XSLT for web page generation.  I've always wanted to learn this ever since I saw what
            <xhtml:a href="http://blizzard.com">Blizzard Entertainment</xhtml:a> was able to do with it on their
            <xhtml:a href="http://wowarmory.com/">World of Warcraft Armory</xhtml:a> site.  It's been a rough start but I think I am starting to
            understand this mess of tags.</xhtml:p>
            <xhtml:p>The first problem I had involved a IE/Firefox compatibility issue, which I believe may have been from IE's lacking the ability
            to understand namespaces.  Due to the way I wanted to setup the pages, I needed the ability to simply copy elements from the base
            xml document to the output document, but IE did not handle this well, so I had to create
            <xhtml:a href="/personal/2008/09/07/xslt_recursive_copy.php">my own XML+XSLT function which will do a recursive copy</xhtml:a>
            of the dom tree.  What fun that was.</xhtml:p>
            <xhtml:p>So far all I have converted is my index page.  I haven't actually replaced my old one yet though.  You can view the new
            (still incomplete, style wise) index page at: <xhtml:a href="/index.xml">http://www.aoeex.com/index.xml</xhtml:a></xhtml:p>
        </entry>
    </content>
</page>

As you can see, I just declared the XHTML namespace in the XML and prefixed each XHTML tag with it. This results, however, in the XSLT copying each tag over with the namespace prefix intact, so instead of ending up with a <p> tag in the output document, I get an <xhtml:p>. In Firefox this works fine, so I figured "Eh, no big deal. Firefox seems to know what to do with it because of the namespace." When it came time to load the page up in IE, however, AHHHH! One giant paragraph of text and no links. IE apparently does not know what to do with these tags, so it just ignores them like it does other unknown tags.
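
For illustration, a first attempt along these lines might look roughly like the stylesheet below. This is only a sketch: the page layout and the "entry" class name are assumptions made up for the example, not the actual index.xsl. The point is the <xsl:copy-of> call, which duplicates each node exactly as it appears in the source, prefix and namespace included.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="html" />

    <xsl:template match="/page">
        <html>
            <head><title><xsl:value-of select="@title" /></title></head>
            <body>
                <xsl:for-each select="content/entry">
                    <!-- Hypothetical wrapper element, used only for this example. -->
                    <div class="entry">
                        <!-- copy-of clones every child verbatim, so an
                             xhtml:p in the source is still an xhtml:p here. -->
                        <xsl:copy-of select="child::node()" />
                    </div>
                </xsl:for-each>
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>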

In order to fix this, I figured I'd have to strip the namespace off the tags, as that was the only way I could see to get it working after a few experiments. To do that, I had to write my own replacement for <xsl:copy-of> as a template. This is where the nightmare started.

My lack of understanding of XPath of course made this difficult. For instance, I knew you got child elements using the child::* path. What I did not realize right away is that this includes only element nodes, not text nodes. I also found out that getting the text of an element using <xsl:value-of> returns the text of all the text nodes inside it combined, not just the first one. All in all, this resulted in the final document coming out all wrong after my initial trial run at a recursive copy function.

After some searching, I found I could index nodes by position; for example, child::node()[1] gets the first child node, be it text or an element. With this new knowledge I tried getting the first text node, then recursively copying the next child element, then getting the second text node. This seemed to work well until I found a few entries which either had an element node as the first node, or had multiple child elements with text nodes scattered between them (e.g. a paragraph with multiple links).

Finally, I realized I could use child::node() to get a list of all child nodes, regardless of their type (text, element, comment, etc.). With this knowledge, the entire process became MUCH easier. I now had a list I could simply <xsl:for-each> through. The next task was telling a text node apart from an element node. I'm not worried about any other type of node.
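
To make the difference between these paths concrete, here is a small throwaway stylesheet that counts what each one selects for every paragraph in the XML document above. It is a sketch for experimenting, not part of the site; it assumes the same page/content/entry layout and needs the xhtml prefix declared so it can match the paragraphs.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml">
    <xsl:output method="text" />

    <xsl:template match="/">
        <xsl:for-each select="page/content/entry/xhtml:p">
            <!-- child::* selects element children only. -->
            <xsl:text>elements: </xsl:text>
            <xsl:value-of select="count(child::*)" />
            <!-- child::text() selects text children only. -->
            <xsl:text>, text nodes: </xsl:text>
            <xsl:value-of select="count(child::text())" />
            <!-- child::node() selects every child, whatever its type. -->
            <xsl:text>, all nodes: </xsl:text>
            <xsl:value-of select="count(child::node())" />
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>

For the first paragraph of the sample entry (text, link, text, link, text) this should report 2 elements, 3 text nodes and 5 nodes in total, which is why looping over child::* alone loses the text between the links.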

Determining a text node ultimately turned out to be really simple. With the new ability to use for-each, the current node changes with each iteration to the next child, so I no longer had to mess with variables. Thus, I was able to use the simple test expression self::text() to see if the node was a text node. If it was, I just output it using <xsl:value-of>. If not, I would output an element of the proper name using the function local-name(), copy its attribute nodes, then recursively call the template with the current element to make a copy of any children of the current element.

The final template looks like this:

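<xsl:template name="copyChildren">
    <xsl:param name="element" />
    <!-- Visit every child node (text, element, comment, ...) in document order. -->
    <xsl:for-each select="$element/child::node()">
        <xsl:choose>
            <!-- Text nodes are written out as-is. -->
            <xsl:when test="self::text()">
                <xsl:value-of select="self::node()"/>
            </xsl:when>
            <!-- Elements are rebuilt under their local name, which drops the namespace. -->
            <xsl:when test="self::*">
                <xsl:element name="{local-name()}">
                    <xsl:copy-of select="attribute::*" />
                    <xsl:call-template name="copyChildren">
                        <xsl:with-param name="element" select="self::node()" />
                    </xsl:call-template>
                </xsl:element>
            </xsl:when>
            <!-- Anything else (comments, processing instructions) is skipped;
                 those nodes have an empty local-name() and cannot become elements. -->
        </xsl:choose>
    </xsl:for-each>
</xsl:template>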
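
As a rough usage sketch, the template could be called once per entry like this. The surrounding page layout and the "entry" class are assumptions for the example, and copyChildren.xsl is a hypothetical file name: it stands for a stylesheet containing the copyChildren template above (pasting the template directly into this stylesheet works just as well).

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="html" />

    <!-- Hypothetical file holding the copyChildren template. -->
    <xsl:include href="copyChildren.xsl" />

    <xsl:template match="/page">
        <html>
            <head><title><xsl:value-of select="@title" /></title></head>
            <body>
                <xsl:for-each select="content/entry">
                    <div class="entry">
                        <!-- Copy the entry's children with the namespace stripped,
                             so xhtml:p comes out as a plain p. -->
                        <xsl:call-template name="copyChildren">
                            <xsl:with-param name="element" select="." />
                        </xsl:call-template>
                    </div>
                </xsl:for-each>
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>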

With that done, I can now start working on the rest of the site.
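
For comparison, the same namespace-stripping copy can also be sketched with template rules and <xsl:apply-templates> instead of a named template and <xsl:for-each>. This is a different technique from the one described above, not what the site uses, and the mode name "strip" is made up for the example.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="html" />

    <!-- Elements: rebuild under the local name, copy attributes, recurse into children. -->
    <xsl:template match="*" mode="strip">
        <xsl:element name="{local-name()}">
            <xsl:copy-of select="@*" />
            <xsl:apply-templates select="node()" mode="strip" />
        </xsl:element>
    </xsl:template>

    <!-- Text nodes: passed through unchanged. -->
    <xsl:template match="text()" mode="strip">
        <xsl:value-of select="." />
    </xsl:template>

    <!-- Comments and processing instructions: dropped. -->
    <xsl:template match="comment() | processing-instruction()" mode="strip" />

    <xsl:template match="/page">
        <html>
            <head><title><xsl:value-of select="@title" /></title></head>
            <body>
                <xsl:apply-templates select="content/entry/node()" mode="strip" />
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>

Here the processor does the node-type dispatch that the <xsl:choose> handles in the named template, so each kind of node gets its own small rule.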

Happy coding!