Wednesday, April 14, 2010

JODConverter 3.0 + OpenOffice + CF8

I stumbled across a new version of the JODConverter a few weeks ago, and finally got around to testing it. Version 3.0 has some interesting new features over the previous version. For one thing, it no longer requires running OpenOffice as a service. The JODConverter can start an instance of OpenOffice on demand, as CF9 does. Both socket and named pipe connections are supported on Windows. There are also new features like pooling and automatic restart, geared towards "improving the reliability and scalability of working with an external OOo [OpenOffice] process". So I thought it was definitely worth a look (for Word to HTML conversions).


Basic Conversion Example
I decided to test it under CF8 first. Since I was not running OpenOffice as a service, the first thing I needed to do was fire up a new instance of OpenOffice. To start or stop OpenOffice, you need an OfficeManager object. The simplest way to create one is with the DefaultOfficeManagerConfiguration class. Just create a new configuration object, then call the buildOfficeManager() method. This creates and returns a new OfficeManager object pre-configured with the default settings, which should work in most environments. Then call OfficeManager.start() to kick off the OpenOffice process.


<cfscript>
    Config  = createObject("java", "org.artofsolving.jodconverter.office.DefaultOfficeManagerConfiguration").init();
    Config.setOfficeHome("C:\Program Files\OpenOffice.org 3\");
    Manager = Config.buildOfficeManager();
    Manager.start();
</cfscript>

To convert a document, you first need to create an OfficeDocumentConverter object by passing in your OfficeManager. Then simply call the convert() method with your two files (ie input and output). Finally, use the OfficeManager to stop the OpenOffice instance. That is it.

<cfscript>
    inPath = "c:\docs\myTestDocument.docx";
    outPath = "c:\docs\myTestDocument_Converted.html";

    OfficeDocumentConverter = createObject("java", "org.artofsolving.jodconverter.OfficeDocumentConverter");
    converter = OfficeDocumentConverter.init( Manager );
    input = createObject("java", "java.io.File").init( inPath );
    output = createObject("java", "java.io.File").init( outPath );
    converter.convert(input, output);
    WriteOutput("Output file created: "& output);

    Manager.stop();
</cfscript> 

Obviously it would be silly to start and stop OpenOffice every time you needed to do a conversion. So a better alternative might be to initialize the OfficeManager once in your Application.cfc, and reuse it. Perform the initialization code in onApplicationStart and add your OfficeManager to the application scope. Then do the cleanup code (ie stopping OpenOffice) in onApplicationEnd. Other than grabbing the OfficeManager from the application scope first, the conversion code is exactly the same.

Application.cfc (Not including error handling, logging, etcetera)
<cfcomponent>
 <cfset this.name = "jodconverterSample" />
 <cfset this.applicationTimeOut = createTimeSpan(0, 1, 0, 0) />

 <cffunction name="onApplicationStart" returnType="void">
  <cfset var Manager = "" />
  <cfset var Config  = "" />

  <!--- start an instance with the default settings --->
  <cfset Config  = createObject("java", "org.artofsolving.jodconverter.office.DefaultOfficeManagerConfiguration").init() />
  <cfset Config.setOfficeHome( "C:\Program Files\OpenOffice.org 3\" ) />
  <cfset Manager = Config.buildOfficeManager() />
  <cfset Manager.start() />

  <cfset application.OfficeManager = Manager />

 </cffunction>

 <cffunction name="onApplicationEnd" returnType="void">
     <cfargument name="appScope" type="any" required="true" />
  <!--- stop the instance --->
  <cfset appScope.OfficeManager.stop() />
 </cffunction>
 
</cfcomponent>
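
With the manager stored in the application scope, a conversion page might look something like the sketch below. It reuses the placeholder paths from the basic example and assumes onApplicationStart has already run successfully, so application.OfficeManager exists and is started.

```cfml
<cfscript>
    // placeholder paths, same as the basic example
    inPath  = "c:\docs\myTestDocument.docx";
    outPath = "c:\docs\myTestDocument_Converted.html";

    // reuse the shared OfficeManager rather than starting a new OpenOffice instance
    converter = createObject("java", "org.artofsolving.jodconverter.OfficeDocumentConverter").init( application.OfficeManager );
    input  = createObject("java", "java.io.File").init( inPath );
    output = createObject("java", "java.io.File").init( outPath );
    converter.convert( input, output );
    WriteOutput("Output file created: "& outPath);

    // no Manager.stop() here; that happens in onApplicationEnd
</cfscript>
```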

Settings
Earlier, I mentioned there are different types of connections and managers. The DefaultOfficeManagerConfiguration creates a socket connection on port 2002 by default. But you can use the available methods to change the port number, connection type, and a bunch of other settings. For example, you could create a named pipe connection instead. Just set the appropriate configuration properties before creating the OfficeManager.

<cfscript>
    Config  = createObject("java", "org.artofsolving.jodconverter.office.DefaultOfficeManagerConfiguration").init();
    Config.setOfficeHome( "C:\Program Files\OpenOffice.org 3\" );
    // use named pipe connection 
    Protocol = createObject("java", "org.artofsolving.jodconverter.office.OfficeConnectionProtocol");
    Config.setConnectionProtocol( Protocol.PIPE );
    Config.setPipeName( "myApp_jod_pipe"  );
    // kill any task that takes longer than 2 minutes 
    Config.setTaskExecutionTimeout((2 * 60 * 1000));
    Manager = Config.buildOfficeManager();
    // ....
</cfscript>

You could also connect to an external instance that is already running using the ExternalOfficeManagerConfiguration class instead. Though when working with an external process (ie controlled elsewhere ), obviously you do not start() or stop() it. Just connect to it.
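
As a rough sketch of what that might look like: the example below assumes OpenOffice was already started elsewhere and is accepting socket connections on port 8100. The port number and file paths are only examples, so adjust them to match your own setup. I have not done much with external connections, so treat it as a starting point rather than a tested recipe.

```cfml
<cfscript>
    // Sketch only: assumes an external OpenOffice process is already listening on port 8100
    Protocol = createObject("java", "org.artofsolving.jodconverter.office.OfficeConnectionProtocol");
    Config   = createObject("java", "org.artofsolving.jodconverter.office.ExternalOfficeManagerConfiguration").init();
    Config.setConnectionProtocol( Protocol.SOCKET );
    Config.setPortNumber( javaCast("int", 8100) );
    Manager = Config.buildOfficeManager();

    // the process is controlled elsewhere, so there is no start() or stop() here
    converter = createObject("java", "org.artofsolving.jodconverter.OfficeDocumentConverter").init( Manager );
    input  = createObject("java", "java.io.File").init( "c:\docs\myTestDocument.docx" );
    output = createObject("java", "java.io.File").init( "c:\docs\myTestDocument_Converted.html" );
    converter.convert( input, output );
</cfscript>
```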

Web Application / Servlet
Though JODConverter can be used strictly as a Java library, there is also a sample web application/servlet available. It is not currently part of the distribution jar, but you can find it under the project source tab. Just be aware the web application does not fully handle conversions to HTML out-of-the-box. The servlet will produce a single HTML file, minus any images. As mentioned in the FAQs, that is by design. The best way to handle images really "..depends on your particular requirements". So the implementation of image handling is deliberately left up to you. For more details see the FAQs.

OpenOffice Quirks
Obviously, OpenOffice has some quirks of its own. Though it does a pretty good job with most documents, it is not perfect. It almost certainly will not be able to convert everything you throw at it. So any conversion code should definitely incorporate some solid error handling.
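
As a bare-bones illustration, the conversion call could be wrapped in a try/catch like the sketch below. It assumes the OfficeManager was stored in the application scope as described earlier. The paths are placeholders, and what you actually do in the catch block (logging, notifying the user, retrying) is up to you.

```cfml
<cfscript>
    // Sketch only: placeholder paths, manager taken from the application scope
    inPath  = "c:\docs\myTestDocument.docx";
    outPath = "c:\docs\myTestDocument_Converted.html";

    try {
        converter = createObject("java", "org.artofsolving.jodconverter.OfficeDocumentConverter").init( application.OfficeManager );
        input  = createObject("java", "java.io.File").init( inPath );
        output = createObject("java", "java.io.File").init( outPath );
        converter.convert( input, output );
        WriteOutput("Output file created: "& outPath);
    }
    catch (Any e) {
        // the conversion failed (unsupported document, OpenOffice crash, timeout, ...)
        WriteOutput("Unable to convert "& inPath &": "& e.message);
    }
</cfscript>
```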

CF9 Quirks
I was very curious to see how well all of the pieces worked together under CF9. As I expected, there were a few quirks.

For whatever reason, things only worked smoothly when the JODConverter's instance of OpenOffice was started after CF9 started its own instance. In other words, I had to run a small Word to PDF conversion first, to force CF to start its OpenOffice process, and only afterward start the JODConverter's instance. When they were not started in that order, all sorts of errors ensued, both from CF and from the JODConverter.
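
Roughly, the start-up order that worked for me looks like the sketch below. The file names are placeholders (any small Word document should do), and the cfdocument call is there purely to make CF9 launch its own OpenOffice process first.

```cfml
<!--- Sketch only: CF9, placeholder paths --->
<!--- 1. force CF9 to start its own OpenOffice process with a small Word to PDF conversion --->
<cfdocument format="pdf" srcfile="c:\docs\small.doc" filename="c:\docs\small.pdf" overwrite="true"></cfdocument>

<!--- 2. only then start the JODConverter's instance --->
<cfscript>
    Config  = createObject("java", "org.artofsolving.jodconverter.office.DefaultOfficeManagerConfiguration").init();
    Config.setOfficeHome("C:\Program Files\OpenOffice.org 3\");
    Manager = Config.buildOfficeManager();
    Manager.start();
</cfscript>
```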

While I had success with socket and external connections, I had zero luck getting a separate named pipe connection to work alongside CF9's instance. Initially I thought it should be possible. But I am not very well versed in UNO or named pipes. So that could just be ignorance on my part. If anyone does know the answer, one way or the other, let me know.

Conclusions
All in all, I was pleased with my initial tests. Though I am still not completely comfortable with the automatic restart feature. I can definitely see its value. OpenOffice can, and on occasion does, crash. But anything that automatically revives itself after death tends to make me think of zombies ;) So I think I will need to study it (and how to best manage OOo instances) further.

8 comments:

Tony Nelson August 27, 2010 at 10:19 AM  

Do you have any tips or example code showing how to connect to an ExternalOfficeManagerConfiguration? I have OpenOffice running as a windows service on port 8100, but I keep getting a "could not connect to external office process" error. Any help would be appreciated.

Tony Nelson August 27, 2010 at 11:51 AM  

Update: I was able to connect using the following parameters: -headless -accept="socket,port=8100;urp;" -nofirststartwizard

However, now I'm receiving the following error when trying to convert files: "URL seems to be an unsupported one". Have you come across this error before? I've searched online for a bit but haven't found anything too promising.

cfSearching August 27, 2010 at 12:02 PM  

@Tony,

I have not done much with external connections. But if OpenOffice is already running as a service, you could try tweaking the named pipe example above. But instead of supplying a "path" and "pipe name", just connect via SOCKET and supply the PORT number.

This will probably wrap horribly but ...

Protocol = createObject("java", "org.artofsolving.jodconverter.office.OfficeConnectionProtocol");
Config = createObject("java", "org.artofsolving.jodconverter.office.DefaultOfficeManagerConfiguration").init();
Config.setConnectionProtocol( Protocol.SOCKET );
Config.setPortNumber( 8100 );
...

-Leigh

Tony Nelson August 27, 2010 at 12:21 PM  

My code is pretty similar, but I can't figure out what's going wrong. This will probably format terribly, but here's my code...

Everything looks OK to me, but I don't know what's going wrong.

cfSearching August 27, 2010 at 12:23 PM  

@Tony,

The blogger loves to remove code that is not html escaped. If you prefer, you can email it to me at cfsearching / yahoo

-Leigh

Tony Nelson August 27, 2010 at 12:34 PM  

Hah. Yeah I figured that might happen. Oh well.

In a move of desperation, I decided to move the files around into different directories. Sure enough, it started working. I'm guessing there might've been a permission issue or a locked directory or something? I don't know. Either way, it (sort of) consistently works now. Thanks for your help.

PS - If you want to see my code, just let me know and I'll email it to you.

cfSearching August 27, 2010 at 12:40 PM  

Anything is possible I guess ;) Feel free to email it, as I am a bit curious about the problem.

-Leigh

Eric Belair July 5, 2013 at 12:28 PM  

This setup works wonderfully on my development PC, but as soon as I move it to one of my servers, it doesn't work. Any thoughts?
