unganisha.org

Querying the Google API from Domino

If you have reached this page directly, it might be a good idea to read this first.

To access the Google API we need a SOAP client, and soapware.org has a list of various SOAP implementations. Unfortunately, most of these implementations require JDK 1.3 or 1.4 and a newer version of the Xerces XML parser (originally derived from the XML4J parser that ships with Domino). Using them would require major brain surgery on Domino 5's JVM installation, so I gave up the idea of using any of the existing implementations.

SOAP interop cannot easily be done in LotusScript, as it doesn't directly support remote HTTP connections.

I finally ended up writing the SOAP interop myself. This is kind of like reinventing the wheel, but there is no better way to understand how SOAP works than by writing the interop yourself. What you will see here is a crude SOAP client implemented as a Java agent, which uses the Domino 5 XML4J parser to parse the SOAP messages. Writing SOAP interop is not very difficult: it basically involves sending an XML document in a particular format as an HTTP POST to the web service.

Accessing the Google APIs from Domino involves the following steps:

  1. Building a SOAP request for the API you want to access
  2. Creating an HTTP connection to the Google server
  3. POSTing the SOAP request to the Google server
  4. Receiving the response returned by the Google server
  5. Parsing the returned SOAP response, using an XML parser

Building a SOAP request for the Google API

This step is very easy, as Google has provided templates for all the SOAP requests. Look in the "soap-samples" folder in the Google toolkit: all the API request and response formats are provided as XML files. For this example, let's use the simplest API, doSpellingSuggestion. This API submits a phrase to Google, which returns a suggested spelling (if one exists).

Don't worry about all the <SOAP...> tags for now. The two lines that interest us are:

<key xsi:type="xsd:string">h1rrrfRQABCDEFGH+jQ0WZKbAXjDFBtY</key>
<phrase xsi:type="xsd:string">britney speers</phrase>

The first line, with the <key> element, contains the Google license key that allows you to access the Google API. This is a mandatory parameter for ALL the Google APIs. See the Google API website for details about obtaining a free license key (by the way, the key shown here is a dummy).

The second line, with the <phrase> element, contains the spelling that we want checked by the Google API. In this example, we check the spelling of "britney speers".
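If you would rather build the request in code than store the whole template, one approach is to fill the two elements into a string template. This is a minimal sketch only. The envelope text below is an assumption modelled on the doSpellingSuggestion sample, so check it against the file in "soap-samples". The class name SpellingRequest is hypothetical. The sketch escapes the phrase, because a user-typed "&" or "<" would otherwise break the XML.

```java
// Hypothetical helper: builds a doSpellingSuggestion SOAP request as a String.
// ASSUMPTION: the envelope mirrors the toolkit's doSpellingSuggestion.xml sample;
// compare it with the file in the "soap-samples" folder before relying on it.
final class SpellingRequest {

    // Escape the five XML special characters so user input cannot break the envelope.
    static String xmlEscape(String s) {
        StringBuffer out = new StringBuffer(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Fill the <key> and <phrase> elements of the (assumed) request template.
    static String build(String licenseKey, String phrase) {
        return "<?xml version='1.0' encoding='UTF-8'?>\n"
            + "<SOAP-ENV:Envelope xmlns:SOAP-ENV=\"http://schemas.xmlsoap.org/soap/envelope/\""
            + " xmlns:xsi=\"http://www.w3.org/1999/XMLSchema-instance\""
            + " xmlns:xsd=\"http://www.w3.org/1999/XMLSchema\">\n"
            + "  <SOAP-ENV:Body>\n"
            + "    <ns1:doSpellingSuggestion xmlns:ns1=\"urn:GoogleSearch\""
            + " SOAP-ENV:encodingStyle=\"http://schemas.xmlsoap.org/soap/encoding/\">\n"
            + "      <key xsi:type=\"xsd:string\">" + xmlEscape(licenseKey) + "</key>\n"
            + "      <phrase xsi:type=\"xsd:string\">" + xmlEscape(phrase) + "</phrase>\n"
            + "    </ns1:doSpellingSuggestion>\n"
            + "  </SOAP-ENV:Body>\n"
            + "</SOAP-ENV:Envelope>\n";
    }
}
```

For example, SpellingRequest.build(key, "britney speers") yields a request whose <phrase> element contains britney speers.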

Creating an HTTP Connection to the Google server

I initially used the URLConnection class from the java.net package to connect and send an HTTP POST to the Google web service. After testing the code for a while, things seemed to be OK. However, on some occasions when I tried shutting down the Domino server, the HTTP task refused to shut down, and the following message appeared repeatedly on the server console:

HTTP Waiting For Thread: Thread: [604] State

This behaviour happened occasionally and seemingly at random. After searching for a while on the LDD forum, I still didn't have a solution.

Suspecting that I might not be handling the URLConnection properly, I dug further and found that the URLConnection and HttpURLConnection classes in the JDK 1.1, 1.2 and 1.3 releases have many stability and resource-leakage issues.

URLConnection woes

Sun Java forum thread -- discusses some of the problems with URLConnection
Sun's URLConnection Cannot Be Reliably Timed Out -- proposes some solutions to the URLConnection time-out problem
Experts-exchange forum thread -- discusses an alternate way of solving the time-out problem by using threads.

Some of the suggested hacks to resolve this issue involve modifying the JDK directly, or connecting in a separate thread (so that the thread can be killed instead of the connection). The Domino HTTP problem occurred because the underlying TCP/IP socket of the URLConnection randomly remained open even after the object went out of scope, especially when the connection speed was low.
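The thread-based workaround can be sketched as follows. This is a minimal illustration using a hypothetical TimedCall class. It runs a blocking task (such as the connect) on a worker thread and stops waiting after a timeout. It does not kill the thread, because Thread.stop() is deprecated. As a result, the lingering socket is not closed: the caller simply stops waiting for it.

```java
// Hypothetical helper illustrating the "connect in a separate thread" workaround.
// It only abandons the wait: the worker (and any socket it holds) stays alive
// until the blocking call returns, so this is a partial fix at best.
final class TimedCall {

    // Runs task on a daemon worker thread and waits at most timeoutMillis.
    // Returns true if the task finished in time, false if we gave up waiting.
    static boolean runWithTimeout(Runnable task, long timeoutMillis) {
        Thread worker = new Thread(task, "timed-call");
        worker.setDaemon(true); // an abandoned worker will not keep the JVM alive
        worker.start();
        try {
            worker.join(timeoutMillis); // wait, but no longer than the timeout
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
        }
        return !worker.isAlive();
    }
}
```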

Instead, I decided to look for an alternative to URLConnection that didn't have these problems. One of the recommended ones was the Apache Commons HttpClient. However, I didn't find it suitable, as the timeout functionality that solves the problem is available only in the nightly builds and not in a stable release.

So I settled on using the Sun Brazil framework, which can be downloaded from the Brazil website. The Brazil framework comes with an HttpRequest class, which provides functionality similar to URLConnection and supports setting timeout values. Unlike the Jakarta HttpClient, Brazil is a stable 2.0 release and is also available as a JDK 1.1 binary (that's the version of the binary I used).
(Note: I use only the HTTP client classes in the Brazil framework.)

Download the file brazil2.0-jdk1.1.jar from the Brazil website and convert this JAR file to a script library. For instructions on how to do that, read the section titled Creating a ServletSupport Script Library in this article on Codestore.

Once that is done, we can use HttpRequest directly within a Java agent.

The following code initializes the HttpRequest object and prepares it for connection:

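// HttpRequest is sunlabs.brazil.util.http.HttpRequest (import it at the top of the agent)
String strGoogleSoap = "http://api.google.com/search/beta2";
// create the request object for the Google URL; the connection itself is initiated later by connect()
HttpRequest connection = new HttpRequest(strGoogleSoap);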

POSTing the SOAP request to the Google server

We need to send the XML SOAP request as an HTTP POST to the Google web services URL. The HttpRequest class provides a method to set the HTTP request method:

connection.setMethod("POST");

Next we load the SOAP request into a String variable:

String strSoapRequest = getSoapRequestDocument("doSpellingSuggestion");

The getSoapRequestDocument() method fetches the SOAP XML request from a Notes document. Its strSoapRequestMethod parameter names the API method whose request is wanted.

In this example, strSoapRequestMethod is doSpellingSuggestion, and the XML stored in the document is identical to what was shown in the previous step.

SOAP requests are sent as XML, so we set the Content-Type header to text/xml:

connection.setRequestHeader( "Content-Type", "text/xml");

Now post the XML to Google by writing it to the connection's output stream, using the java.io.OutputStreamWriter class:

OutputStreamWriter oswObj = new OutputStreamWriter(connection.getOutputStream(),
"UTF-8");
oswObj.write(strSoapRequest);

Remember to flush and close the output stream when you finish:

oswObj.flush();
oswObj.close();

Finally, initiate the connection to Google by calling the connect() method of the HttpRequest object:

connection.connect();

We use the default timeout settings provided by HttpRequest.
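Newer JDKs offer another way to do this POST. On JDK 1.5 or later (not the JDK 1.1 that Domino 5 uses), the standard HttpURLConnection can send the request with explicit timeouts through setConnectTimeout() and setReadTimeout(). What follows is a minimal sketch under that assumption. The SoapPost class and its method names are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper for a modern JDK (1.5+), NOT Domino 5's JDK 1.1:
// HttpURLConnection only gained setConnectTimeout/setReadTimeout in 1.5.
final class SoapPost {

    // Opens and configures a POST connection; no network traffic happens yet.
    static HttpURLConnection open(String endpoint, int timeoutMillis) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(endpoint).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);                 // we will write a request body
        conn.setConnectTimeout(timeoutMillis);  // limit time spent establishing the TCP connection
        conn.setReadTimeout(timeoutMillis);     // limit time spent blocked waiting for data
        conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
        return conn;
    }

    // Writes the SOAP envelope as UTF-8; getOutputStream() is what actually connects.
    static void send(HttpURLConnection conn, String soapRequest) throws IOException {
        Writer out = new OutputStreamWriter(conn.getOutputStream(), "UTF-8");
        try {
            out.write(soapRequest);
            out.flush();
        } finally {
            out.close();
        }
    }
}
```

The response would then be read from conn.getInputStream().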

Receiving the Response from the Google server

Google will now send a response to our request. If the request was made properly, Google will send back the correct spelling: "britney spears". We use the HttpInputStream class of the Brazil framework to read the input stream line by line and copy it into a StringBuffer variable. The InputStream classes provided by the standard JDK could have been used, but HttpInputStream is more convenient. Finally, remember to close the input stream.
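The same line-by-line read can be done with standard JDK classes. Below is a minimal sketch that uses BufferedReader in place of Brazil's HttpInputStream (whose API is not shown here). The ResponseReader class and its charset parameter are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

// Hypothetical helper: reads a response stream line by line into a StringBuffer,
// then closes it. Uses standard JDK classes instead of Brazil's HttpInputStream.
final class ResponseReader {

    static String readAll(InputStream in, String charset) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset));
        StringBuffer buf = new StringBuffer();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                buf.append(line).append('\n'); // readLine() strips the terminator, so add one back
            }
        } finally {
            reader.close(); // also closes the underlying input stream
        }
        return buf.toString();
    }
}
```

Normalising line endings to "\n" is harmless here, since the result is only handed to an XML parser.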

Parsing the returned response

We now have the SOAP response returned by Google in a String variable. The response is an XML document, and its <return> element contains the correct spelling returned by Google.

I used the IBM XML4J parser that ships with all Domino 5.0.3+ versions to parse the SOAP response. However, any XML parser can be used.

The parser requires an XML InputSource object to build the DOM tree. We convert the String containing the SOAP response to an InputSource by first converting the string to a character array, and then converting the array to an InputSource object.
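In standard Java, the conversion described above might look like the sketch below. XmlSource is a hypothetical helper name, and java.io.CharArrayReader supplies the Reader that org.xml.sax.InputSource expects.

```java
import java.io.CharArrayReader;
import org.xml.sax.InputSource;

// Hypothetical helper: wraps a String in a SAX InputSource via a character array.
final class XmlSource {

    static InputSource fromString(String xml) {
        char[] chars = xml.toCharArray();                   // String -> char[]
        return new InputSource(new CharArrayReader(chars)); // char[] -> Reader -> InputSource
    }
}
```

A java.io.StringReader would do the same job without the intermediate array.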

Next, we set up the parser by creating a parser object and passing it the XML InputSource object to parse. The parser's getDocument() method then returns a handle to the generated XML DOM tree.

The correct spelling is returned in the <return> element. We locate this element using the getElementsByTagName() method of the DOM document. The text value (i.e. britney spears) is a child text node of this element, which we read from the <return> element's list of child nodes.
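The parse-and-extract steps can be sketched with the standard JAXP DOM API (javax.xml.parsers), which this sketch swaps in for XML4J's parser class. The SpellingResponse class is hypothetical. Its method returns null when the response has no <return> element or that element has no text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper using JAXP (not XML4J) to pull the suggestion out of the response.
final class SpellingResponse {

    // Returns the text of the first <return> element, or null if there is none.
    static String suggestion(String soapResponse) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(soapResponse)));
        NodeList returns = doc.getElementsByTagName("return");
        if (returns.getLength() == 0) {
            return null;
        }
        // The suggestion is the text node child of <return>.
        Node text = returns.item(0).getFirstChild();
        return text == null ? null : text.getNodeValue();
    }
}
```

For a response containing <return xsi:type="xsd:string">britney spears</return>, suggestion() returns "britney spears".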

More Advanced Usage

All the Google APIs can be queried using the method described here. While the way to connect and receive responses via SOAP remains the same, the way the returned response needs to be parsed varies from API to API.

With a bit of work, the Google API can be made to do more useful stuff. For instance, on this website I have a section in the sidebar (on the right) called "The Google Flavour of the Day". The set of links is basically the top 5 hits returned by Google for a keyword (a.k.a. the flavour) on that particular day (strangely enough, the list is slightly different every day).

I use a Java agent that runs once a day and queries Google on a specified keyword. The agent then parses the search results (returned by the doGoogleSearch API) and generates HTML output for them. This output is then copied into a Notes document. Finally, I use @DbLookup to look up the HTML from the document and display it on the web page.
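The HTML-generation step might look like the sketch below, which turns parallel arrays of titles and URLs into a list. FlavourHtml, the <ul> markup and the maxItems limit are illustrative assumptions, not the agent's actual output format. Everything is HTML-escaped. If the titles returned by Google carry their own highlighting markup, you may prefer to strip it rather than escape it.

```java
// Hypothetical helper: renders (title, URL) pairs as an HTML list for the sidebar.
final class FlavourHtml {

    // Escapes text for HTML content and double-quoted attribute values.
    static String escape(String s) {
        StringBuffer out = new StringBuffer(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '&') out.append("&amp;");
            else if (c == '<') out.append("&lt;");
            else if (c == '>') out.append("&gt;");
            else if (c == '"') out.append("&quot;");
            else out.append(c);
        }
        return out.toString();
    }

    // Builds an unordered list from at most maxItems results.
    static String toHtml(String[] titles, String[] urls, int maxItems) {
        StringBuffer html = new StringBuffer("<ul>\n");
        int n = Math.min(maxItems, Math.min(titles.length, urls.length));
        for (int i = 0; i < n; i++) {
            html.append("  <li><a href=\"").append(escape(urls[i])).append("\">")
                .append(escape(titles[i])).append("</a></li>\n");
        }
        return html.append("</ul>\n").toString();
    }
}
```

The resulting string is what would be saved into the Notes document for @DbLookup to pick up.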

The doGoogleSearch API returns results in an XML format that looks like this: doGoogleSearchResponse.xml (zipped). The parsing required to extract the results as HTML is slightly more complex than extracting the spelling suggestion. If you are unfamiliar with using an XML parser, the XML4J documentation and samples should get you going.

A part of the XML parsing is shown below:
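As an illustration of this kind of parsing (a sketch, not the sample agent's own code), the helper below uses JAXP rather than XML4J to collect URL/title pairs from a doGoogleSearch response. It assumes each hit is an <item> element inside <resultElements>, with <URL> and <title> children, as in the toolkit's doGoogleSearchResponse.xml. The SearchResults class name is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper (JAXP, not XML4J). ASSUMPTION: hits are <item> elements under
// <resultElements>, each with <URL> and <title> children; verify against the sample file.
final class SearchResults {

    // Concatenated text of the first descendant element with the given name, or "".
    static String childText(Element parent, String name) {
        NodeList list = parent.getElementsByTagName(name);
        if (list.getLength() == 0) {
            return "";
        }
        StringBuffer text = new StringBuffer();
        for (Node n = list.item(0).getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.TEXT_NODE || n.getNodeType() == Node.CDATA_SECTION_NODE) {
                text.append(n.getNodeValue());
            }
        }
        return text.toString();
    }

    // Returns {url, title} pairs for at most maxItems hits.
    static List<String[]> parse(String soapResponse, int maxItems) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(soapResponse)));
        List<String[]> hits = new ArrayList<String[]>();
        // Search only inside <resultElements>: other parts of the response may also use <item>.
        NodeList containers = doc.getElementsByTagName("resultElements");
        if (containers.getLength() == 0) {
            return hits;
        }
        NodeList items = ((Element) containers.item(0)).getElementsByTagName("item");
        for (int i = 0; i < items.getLength() && hits.size() < maxItems; i++) {
            Element item = (Element) items.item(i);
            hits.add(new String[] { childText(item, "URL"), childText(item, "title") });
        }
        return hits;
    }
}
```

Calling parse(response, 5) would give the top five hits used for the Flavour of the Day.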

Sample Code

Download a sample Notes 5 database containing the required script libraries and Java agent code: GoogleDb.zip

About the sample

There are two sample agents in the googledb.nsf database. To run either of them, you need a Google license key. Put it in the <key> element of the request documents in the Soap Requests view of the database.

The googleFlavourAggregator agent

This agent is very similar to the implementation used in my website's sidebar. You can run it from a browser as /googledb.nsf/googleFlavourAggregator?openagent, or as a scheduled agent. It writes the search results into the googleSearch document in the Search Aggregator view.

To change the Google search keyword, edit the doGoogleSearch document in the Soap Requests view.

The Google spell checker

This is an interactive agent that queries Google for spelling suggestions. You can run it from a web browser at /googledb.nsf/googlespellcheck?openpage: enter an incorrect spelling and press Submit.

Domino agent security restrictions

Since the Java agent makes a remote network connection, it requires Unrestricted agent access to execute properly (you will need to sign it with a user ID that has Unrestricted access). In some cases (typically on hosted websites) this requirement can be a constraint. However, where the Google services are not accessed interactively (as in the case of the Google Flavour of the Day), we can work around the restriction using replication.

Most people running Domino websites have a local server installation from which they replicate to their hosting provider's server. In such a scenario, the agent accessing Google can be given Unrestricted access on your local server. A periodic scheduled replication between the local server and the hosting server then copies the content produced by the agent to the hosting server. Clear as mud, eh?!

Some additional references that might be useful:
Soapware.org -- SOAP 1.1 directory for developers
Microsoft developer network -- A section on the MSDN site related to SOAP
IBM developerWorks -- IBM developerWorks section on web services and SOAP
SOAPconnect for LotusScript -- this is a sample created by IBM and found in the Lotus Sandbox; it uses a combination of Java and LotusScript code. Unfortunately, using this method requires a "lobotomy" of your Domino server and client JVMs.
Sample using SOAPconnect -- This is another sample from the sandbox that uses SOAPconnect.