XML/XSL/CSS architecture
Nun.Me.Shu - hosting environment

Information about the configuration of Tomcat and Cocoon2 which are used to host the Nun.Me.Shu architecture.



Components in development environment
The Nun.Me.Shu framework is a completely standard implementation of the basic XML and XSL specifications. No hosting-specific extensions are used. It can therefore be deployed in any environment that can render XML with XSL.
The Nun.Me.Shu web site is hosted in a standard Java environment with the Cocoon2 XSLT servlet and the Tomcat servlet container. Both are part of the Apache community of open-source software projects, and can be downloaded for free from the Apache web site.

The hosting components in my development environment are:
  • Java JVM 1.2.2
    A Java Virtual Machine is needed for Tomcat and Cocoon.
  • Cocoon 2.0.3
    Cocoon is an XML publishing framework, designed for performance and scalability around pipelined SAX processing, with a sophisticated caching system. Cocoon is a Java servlet and thus depends on a servlet engine, like Tomcat.
  • Tomcat 4.0.4
    Tomcat is a servlet engine, which can also be used as a stand-alone web server. In a development environment, using Tomcat as a web server is convenient and just fine.
  • Xalan-J 2.2.0
    Xalan is the XSLT processor used by Cocoon. My download of Cocoon came with a buggy version of Xalan (see the history page), forcing me to separately download and install an older stable version of Xalan.
  • Apache web server
    Apache is a stable, robust web server. I don't have it installed in my development environment since Tomcat doubles as a web server there. In a production environment however, Tomcat should be used just as a servlet engine, and Apache should be used as the web server.

Since the use of XML/XSL in the development of web sites is still rare, I'll describe the configuration of my development environment in some detail here. Tomcat is the web server and servlet engine; Cocoon2 is the XSLT processor.
Documentation for the individual components is extensive and comes with their downloads, but I ran into some relatively simple problems that took a while to solve because I couldn't find an overview of the whole system as described here.
Note that if you use this description as a guideline for your own installation, and you download a version of Cocoon2 that is newer than described here, you might not have to separately download and install Xalan. I only had to do that because the version of Xalan that came with my version of Cocoon2 was buggy.



Installation of Tomcat and Cocoon2
My development environment is a Sony Notebook with Windows 2000. Configuration details below are based on the Windows operating system. The live web site is hosted on a Linux operating system. Apart from a slightly different directory structure (and the additional installation of the Apache web server), the configuration on Linux is pretty much the same as outlined here.

Download
Downloading the Java JVM and the binary distributions of Tomcat, Cocoon2 and Xalan-J, and installing them on your local machine, is simple and straightforward: unzip the downloaded files into their default directories and create a few required environment variables (the installation documentation provides the details).

Install Cocoon under Tomcat
Cocoon's cocoon.war file then has to be copied into Tomcat's webapps directory, and the main part of the installation is done. When you start Tomcat, it deploys the cocoon.war file and creates a directory named cocoon in its webapps directory. You should then be able to use Tomcat as a web server and access Cocoon's test and documentation pages through your browser.

Create web site and copy basic configuration files
Next, create a directory for your web site. Copy the file sitemap.xmap and the directory WEB-INF from Tomcat's webapps\cocoon directory to your web site directory. These contain the basic configuration of Cocoon for your web site, and have to be edited later (see below). Now copy all Xalan jar files from Xalan's bin directory to your web site's WEB-INF/lib directory, and delete the equivalent versions that were already there (the goal is to replace the buggy version of Xalan that came with the Cocoon distribution with the older stable version of Xalan that was downloaded separately).
In the Linux environment for my live web site, I didn't copy any jar files, but instead created symbolic links from the web site's WEB-INF/lib directory to the original location of the files.
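
The symbolic-link variant can be sketched in Python as follows. This is only an illustration, not the commands actually used on the live site: the function name link_jars and the directory arguments are hypothetical, and replacing an existing jar of the same name is an assumption carried over from the copy step described above.

```python
import os

def link_jars(source_dir, lib_dir):
    """Create a symbolic link in lib_dir for every .jar file in source_dir.

    An existing file or link with the same name in lib_dir is replaced
    (assumption: this mirrors the 'delete the old version' step above).
    Returns the sorted list of jar names that were linked.
    """
    linked = []
    for name in sorted(os.listdir(source_dir)):
        if not name.endswith(".jar"):
            continue  # ignore anything that is not a jar file
        target = os.path.join(lib_dir, name)
        if os.path.lexists(target):
            os.remove(target)  # remove the old jar (or a stale link)
        original = os.path.join(os.path.abspath(source_dir), name)
        os.symlink(original, target)
        linked.append(name)
    return linked
```

A call such as link_jars("/path/to/xalan/bin", "/path/to/site/WEB-INF/lib") (both paths are placeholders) would leave the jar files in one place while making them visible to the web site.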

Directory structure
The relevant part (from a configuration point of view) of the directory structure on my Notebook looks like:
[-] Local Disk (D:)
    [-] usr     (directory containing all applications)
        [-] cocoon-2.0.3
        [-] jakarta-tomcat-4.0.4
            [-] bin
                catalina.bat     (start - stop script for Tomcat)
            [-] conf
                server.xml     (server configuration file)
                web.xml     (default webapp configuration file - untouched)
            [-] temp
            [-] webapps
                cocoon.war
                [-] cocoon
                    sitemap.xmap
                    [-] WEB-INF
        [-] jdk1.2.2
        [-] xalan-j_2_2_0
            [-] bin
    [-] web     (directory containing all web sites)
        [-] nunmeshu
            sitemap.xmap     (Cocoon's pipeline processing configuration file)
            [-] WEB-INF
                cocoon.xconf     (Cocoon's main configuration file)
                logkit.xconf     (Cocoon's log configuration file)
                web.xml     (local version of Tomcat's webapp configuration file, with Cocoon servlet instructions)
                [-] lib
                    cocoon-2.0.3.jar
                    xalan.jar
                    (..and-many-more..).jar


Configuration of Tomcat
After the basic installation of Tomcat and Cocoon is done (and tested, by starting the server and accessing Tomcat's and Cocoon's documentation through your browser) you'll have to configure both to be able to render XML pages from your web site.

Edit virtual host configuration
By default, Tomcat expects applications and web sites to reside in its webapps directory. But I wanted to be independent of whatever web server or servlet engine I was using, and thus I kept my web pages in their own directory structure (outside the Tomcat directory). Tomcat supports this, but it requires editing its server.xml file (in Tomcat's conf directory). Within the default virtual host element <Host name="localhost" ...> of server.xml I added/edited the following lines:
<!-- nunmeshu site: at Tomcat Root Context (#EDITED#) -->
<Context path="" docBase="D:/web/nunmeshu" debug="0" reloadable="true" />

<!-- redefine the path for the TOMCAT documentation
    (which by default is at the root) (#ADDED#)-->
<!-- !!the path "/tomcat" is also added to the Tomcat Contexts below!! -->
<Context path="/tomcat" docBase="ROOT" debug="0" reloadable="true" />

<!-- Tomcat Manager Context (#EDITED#) -->
<Context path="/tomcat/manager" docBase="manager" debug="0" privileged="true"/>

<!-- Tomcat Examples Context (#EDITED#) -->
<Context path="/tomcat/examples" docBase="examples" debug="0" reloadable="true" crossContext="true">
Note that by default, Tomcat's documentation is at the root of the localhost, but I wanted the nunmeshu web site there. The second Context element in the code snippet above therefore redefines the path for the Tomcat documentation.

With this change I can now access the following local sites through my browser:
http://localhost:8080/            :     nunmeshu web site
http://localhost:8080/tomcat/     :     Tomcat documentation
http://localhost:8080/cocoon/     :     Cocoon2 documentation
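
How a request ends up in one of these contexts can be modelled as a longest-prefix match on the context path. The Python sketch below is only an illustration of that rule, not Tomcat code: the names CONTEXTS and resolve_context are hypothetical, and the table assumes the Context elements shown above plus the cocoon context that Tomcat deploys from cocoon.war.

```python
# Context path -> docBase, taken from the server.xml fragment above.
# The "/cocoon" entry is an assumption: it stands for the context that
# Tomcat creates automatically when it deploys cocoon.war.
CONTEXTS = {
    "": "D:/web/nunmeshu",
    "/tomcat": "ROOT",
    "/tomcat/manager": "manager",
    "/tomcat/examples": "examples",
    "/cocoon": "cocoon",
}

def resolve_context(request_path):
    """Return (context_path, docBase) for the longest matching context path."""
    best = ""  # the root context matches every request
    for path in CONTEXTS:
        matches = request_path == path or request_path.startswith(path + "/")
        if matches and len(path) > len(best):
            best = path
    return best, CONTEXTS[best]
```

With this model, /index.xml falls through to the root context (the nunmeshu site), while /tomcat/manager/list is handled by the manager context rather than by the documentation context at /tomcat.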

Edit servlet deployment descriptor for Cocoon
After the basic installation, outlined above, your web site's WEB-INF directory contains a web.xml file (copied from Cocoon). This is the servlet deployment descriptor for Tomcat, which specifies parameters for the Cocoon servlet, specific to your web site. It is used by Tomcat in addition to its default web.xml file (in Tomcat's conf directory). I left the default parameters in the <servlet> element untouched but changed the part below that. I defined servlet mappings based on file extensions, instead of on a directory path, since I want Cocoon to handle only XML-related files, and leave Tomcat to serve images, CSS files and such. I also added a mime-mapping for the extension xml-source, which is used in the Nun.Me.Shu architecture next to the regular xml extension. Here are my edits:
<!-- EM - Instead of directory mapping, use file type mapping. -->
<!-- This way Cocoon will not serve static content. -->
<!--
<servlet-mapping>
   <servlet-name>Cocoon2</servlet-name>
   <url-pattern>/</url-pattern>
</servlet-mapping>
-->
<servlet-mapping>
   <servlet-name>Cocoon2</servlet-name>
   <url-pattern>*.xml</url-pattern>
</servlet-mapping>
<servlet-mapping>
   <servlet-name>Cocoon2</servlet-name>
   <url-pattern>*.xml-source</url-pattern>
</servlet-mapping>
<servlet-mapping>
   <servlet-name>Cocoon2</servlet-name>
   <url-pattern>*.xsp</url-pattern>
</servlet-mapping>
<servlet-mapping>
   <servlet-name>Cocoon2</servlet-name>
   <url-pattern>*.svg</url-pattern>
</servlet-mapping>

<mime-mapping>
   <extension>xml</extension>
   <mime-type>text/xml</mime-type>
</mime-mapping>

<mime-mapping>
   <extension>xml-source</extension>
   <mime-type>text/xml</mime-type>
</mime-mapping>

<welcome-file-list>
   <welcome-file>index.xml</welcome-file>
   <welcome-file>index.html</welcome-file>
   <welcome-file>index.htm</welcome-file>
   <welcome-file>index.jsp</welcome-file>
</welcome-file-list>
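
The effect of these extension mappings can be illustrated with a small Python model. It is a simplified sketch, assuming that only the four url-patterns above are mapped to the Cocoon2 servlet and that everything else is left to Tomcat's default handling; the names COCOON_EXTENSIONS and served_by_cocoon are hypothetical.

```python
# Extensions taken from the <servlet-mapping> elements above.
COCOON_EXTENSIONS = {"xml", "xml-source", "xsp", "svg"}

def served_by_cocoon(request_path):
    """Return True if the path matches one of the '*.ext' mappings.

    An extension mapping looks at the part after the last dot of the
    last path segment; the query string is assumed to be stripped already.
    """
    last_segment = request_path.rsplit("/", 1)[-1]
    if "." not in last_segment:
        return False  # no extension at all: not handled by Cocoon
    extension = last_segment.rsplit(".", 1)[-1]
    return extension in COCOON_EXTENSIONS
```

For example, /index.xml and /docs/page.xml-source would go to Cocoon, whereas /images/logo.gif and /css/main.css would be served as static files.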

Delete temporary files
Tomcat has the habit of creating a lot of temporary files (28MB) each time it is started, without deleting them when the server is stopped. This can quickly fill your hard drive if you are not aware of it. I therefore appended some code to Tomcat's start-stop script, catalina.bat (in Tomcat's bin directory), that deletes these temporary files and in addition clears the Cocoon cache:
rem <EM-empty-tomcat's-temp-directory-and-remove-cached-cocoon-files>
if not "%ACTION%"=="stop" goto realend

echo on
cd /D %CATALINA_TMPDIR%
dir
echo ### delete *.tmp files? (Ctrl-C to abort) ###
pause
del *.tmp

cd /D %CATALINA_HOME%\work\Standalone\localhost\cocoon\cocoon-files\cache-dir
dir
echo ### delete cached PCK*.* files? (Ctrl-C to abort) ###
pause
del PCK*.*

cd /D D:\web\nunmeshu\WEB-INF\work\cache-dir
dir
echo ### delete cached PCK*.* files? (Ctrl-C to abort) ###
pause
del PCK*.*

:realend
rem </EM-empty-tomcat's-temp-directory-and-remove-cached-cocoon-files>
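
For environments without a batch interpreter, the same cleanup idea can be sketched in Python. This is not the script used on this site: the function name delete_matching and its dry_run parameter are hypothetical, and the sketch replaces the interactive pause with a dry run that only reports what would be deleted.

```python
import glob
import os

def delete_matching(directory, pattern, dry_run=True):
    """Delete the regular files in directory whose names match pattern.

    With dry_run=True (the default) nothing is deleted; the function
    only returns the sorted names of the files that would be removed.
    """
    removed = []
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        if not os.path.isfile(path):
            continue  # leave subdirectories alone
        if not dry_run:
            os.remove(path)
        removed.append(os.path.basename(path))
    return removed
```

A cleanup run would then call delete_matching once with "*.tmp" for the temp directory and once with "PCK*" for each cache directory, first as a dry run and then with dry_run=False.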


Configuration of Cocoon2
After the basic installation, outlined above, your web site's base directory contains a sitemap.xmap file, and your web site's WEB-INF directory contains a cocoon.xconf and logkit.xconf file (all copied from Cocoon). These are the main configuration files for Cocoon specific to your web site.

Edit sitemap file
The sitemap.xmap file in your web site's base directory is sometimes referred to as the "heart of Cocoon". It contains instructions that tell Cocoon how to process XML files. The sitemap file is rather big and consists of the following sections: components, views, resources, action-sets and pipelines. I left all but the pipelines section untouched. The pipelines section contains the specific instructions for processing XML content, and thus is important and specific to your web site. I deleted the existing default pipelines section and replaced it with the following:
<map:pipelines>

   <!-- ***** NUN.ME.SHU pipelines ***** -->

   <!-- filter out featured sites, and use local sitemap -->
   <map:pipeline>
      <map:match pattern="s/ldde/**.xml">
         <map:mount check-reload="yes" src="s/ldde/sitemap.xmap"
              uri-prefix="s/ldde"/>
      </map:match>
   </map:pipeline>

   <map:pipeline>
      <map:match pattern="s/framework/**.xml">
         <map:mount check-reload="yes" src="s/framework/sitemap.xmap"
              uri-prefix="s/framework"/>
      </map:match>
   </map:pipeline>

   <!-- process xml files -->
   <map:pipeline>

      <map:match pattern="**.xml">
         <map:generate src="{1}.xml"/>
         <map:transform src="xsl/main.xsl">
            <map:parameter name="use-request-parameters" value="true"/>
         </map:transform>
         <map:serialize/>
      </map:match>

      <!-- append "-source" to any xml file, and you will be able
           to view the xml source -->
      <map:match pattern="**.xml-source">
         <map:generate src="{1}.xml"/>
         <map:transform src="xsl/cp.xsl"/>
         <map:serialize type="xml"/>
      </map:match>

   </map:pipeline>

</map:pipelines>
Note that the first two pipelines mount another sitemap. Both featured sites on the Nun.Me.Shu web site have their own sitemap. This was not necessary, but I was curious how it worked, and it allowed me to keep those featured sites completely independent of the main site.
The third and last pipeline instructs Cocoon how to process XML files on the main site. It has two pattern matches: requests for files with names ending in "xml" take that XML file and process it with the main stylesheet; requests for files with names ending in "xml-source" also take the original XML file but now process it with a special stylesheet (cp.xsl) that simply copies the XML source to the output.
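
The matching order in this sitemap can be illustrated with a small Python model. It is a simplified sketch and not Cocoon's matcher implementation: it assumes that "**" matches any characters including "/", that "*" matches any characters except "/", that patterns are tried in document order, and that {1} refers to the first wildcard. The names SITEMAP, compile_pattern and route are hypothetical.

```python
import re

def compile_pattern(pattern):
    """Translate a sitemap wildcard pattern into a regular expression."""
    regex = ""
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            regex += "(.*)"      # '**' may span directory separators
            i += 2
        elif pattern[i] == "*":
            regex += "([^/]*)"   # '*' stays within one path segment
            i += 1
        else:
            regex += re.escape(pattern[i])
            i += 1
    return re.compile(regex)

# Patterns in the order in which they appear in the pipelines section.
SITEMAP = [
    ("s/ldde/**.xml", "mount s/ldde/sitemap.xmap"),
    ("s/framework/**.xml", "mount s/framework/sitemap.xmap"),
    ("**.xml", "transform with xsl/main.xsl"),
    ("**.xml-source", "transform with xsl/cp.xsl"),
]

def route(uri):
    """Return (action, value of {1}) for the first matching pattern, or None."""
    for pattern, action in SITEMAP:
        match = compile_pattern(pattern).fullmatch(uri)
        if match:
            return action, match.group(1)
    return None
```

In this model a request for docs/intro.xml-source does not match "**.xml" (the URI does not end in ".xml"), so it reaches the last matcher, where {1} is docs/intro and the generator reads docs/intro.xml.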

Edit configuration file
The cocoon.xconf file in your web site's WEB-INF directory is the main configuration file for Cocoon. I left it untouched, except for commenting out the <hsqldb-server> section, which was causing "port in use" errors when Tomcat was started:
<!-- EM - start comment out : causes "port in use" error when Tomcat is started.
<hsqldb-server class="org.apache.cocoon.components.hsqldb.ServerImpl"
        logger="core.hsqldb-server" pool-max="1" pool-min="1">
   <parameter name="port" value="9002"/>
   <parameter name="silent" value="true"/>
   <parameter name="trace" value="false"/>
</hsqldb-server>
EM - end comment out -->

Edit logkit file
The logkit.xconf file in your web site's WEB-INF directory contains the location of the various log files for Cocoon, and allows for different levels of verbosity of the log messages. I left it untouched, but you might want to turn all logging off if performance is an issue in your environment.


Configuration of Apache web server
When you have installed Apache as the web server (instead of Tomcat), it has to be configured so that it forwards requests for XML files to Tomcat/Cocoon. This is specified in Apache's httpd.conf file in an IfModule directive within your site's VirtualHost configuration:
<VirtualHost 64.46.100.75>
   ServerAlias www.nunmeshu.com nunmeshu.com
   DocumentRoot /home/nunmeshu/public_html
   BytesLog domlogs/nunmeshu.com-bytes_log
   ServerName www.nunmeshu.com
   ScriptAlias /cgi-bin/ /home/nunmeshu/public_html/cgi-bin/
   CustomLog domlogs/nunmeshu.com combined
   <IfModule mod_jk.c>
      JkMount /servlet/* ajp13
      JkMount /*.jsp ajp13
      JkMount /*.xml ajp13
      JkMount /*.xml-source ajp13
   </IfModule>
</VirtualHost>

Because the site was producing some 505 errors from time to time, and because the Tomcat servlet engine was sometimes down for a longer period of time (both related to the fact that the hosting environment is more or less experimental), I decided to pre-generate the static pages of the site and to render only the web services pages in real time. For this I defined a conditional redirect in the .htaccess file in the web site's root directory:
RewriteEngine on
RewriteBase /
RewriteCond %{REQUEST_FILENAME} ^.*\.html$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.+)\.html $1.xml

Basically what happens is as follows:
   - The first RewriteCond tests whether the requested URL has a .html extension.
   - If the first test is successful, the second RewriteCond tests whether the requested file does not exist.
   - If the second test is also successful (and thus there is no .html file), the RewriteRule is executed: it replaces the .html extension with a .xml extension, so that the XML file is requested and rendered in real time.
This allows all web pages to be referenced as .html pages. All static pages of the site are pre-generated as HTML files and thus will be served by Apache. Only the dynamic pages (the web services pages) have no corresponding HTML file, and thus they will be rendered in real time. Note that the browser's address bar will still show a filename with a .html extension, but the file that is actually accessed is the file with a .xml extension.
In short, if a filename with .html extension is present it will be served, otherwise the corresponding file with .xml extension will be rendered:
   - suppose there is a request for foo.html.
   - if file foo.html exists, then Apache will serve the file.
   - if file foo.html does not exist, then Apache will redirect to foo.xml which subsequently will be served by Cocoon.
So, simply by pre-generating or deleting html files one can control if a page is rendered or served from file cache.
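
The decision made by the two RewriteCond lines and the RewriteRule can be modelled in a few lines of Python. This is only a sketch of the logic, not of mod_rewrite itself; the function name resolve_request and the labels "apache", "cocoon" and "unchanged" are hypothetical.

```python
import os

def resolve_request(doc_root, request_path):
    """Model the .htaccess rules: serve foo.html if it exists, else foo.xml.

    Returns a (handler, path) pair, where handler is a label chosen for
    this sketch: 'apache' for a pre-generated file, 'cocoon' for a
    request rewritten to .xml, and 'unchanged' when the rules do not apply.
    """
    if not request_path.endswith(".html"):
        return ("unchanged", request_path)   # first RewriteCond fails
    candidate = os.path.join(doc_root, request_path.lstrip("/"))
    if os.path.isfile(candidate):
        return ("apache", request_path)      # second RewriteCond fails
    # Both conditions hold: swap the .html extension for .xml.
    return ("cocoon", request_path[: -len(".html")] + ".xml")
```

With a document root that contains foo.html but no bar.html, a request for /foo.html is served from file, while /bar.html is rewritten to /bar.xml and passed on for rendering.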
Note that the .htaccess file has to be in the web site's root directory, otherwise the RewriteRule will fail (because the filename there is resolved relative to the directory the .htaccess file is in).
For general information about URL rewriting, see Apache's page about the mod_rewrite module. Note that the code fragment could also have been placed in Apache's httpd.conf file within your site's VirtualHost configuration section. That's actually better for performance reasons, but for me it is easier to experiment with the .htaccess file since that is in my local directory (and I don't have write access to the httpd.conf file).