Troubleshooting Jini Applications


Class Not Found (especially a Stub class)
    This is, by far, the most common type of problem when developing Jini and RMI applications. You'll get a ClassNotFoundException when your application receives a serialized Java object from another program, but the implementation (the class files) for this object isn't available locally.

    There are three common causes:

    Codebase problems

      Most often, the cause is a codebase problem. The server--meaning the program that sends you the object--must set the codebase property that tells your program where to fetch the class file for the object. If the server program is yours, make sure you've properly set the codebase. See here for more details on codebase.

    You haven't installed a security manager

      Even if codebase is set properly, a Java program won't load any classes remotely if it doesn't have a security manager installed--all classes will be loaded locally only. So be sure you've set a security manager, typically a java.rmi.RMISecurityManager.

    The HTTP server isn't correctly exporting code

      Finally, even if codebase is being set in the sender of the object, and a security manager is installed in the receiver, the code may be installed on an improperly configured HTTP server. If the class that's being loaded is called foo.bar.MyClass, this class should live in root/foo/bar/MyClass.class, where root is the web server's root directory. (The codebase in this case would be simply http://your_machine:port/; be sure to note the trailing slash.)

      Alternatively, if you bundle your code into a JAR file, you'd typically put this JAR in the web server's root and set the URL to http://your_machine:port/your_jar_file.jar.
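
      As a rough illustration of the directory layout described above, the following sketch builds the URL at which a class file would be looked up under a directory-style codebase. The CodebaseLayout class and its method are hypothetical helpers, not part of Jini or RMI, and port 8080 is just a placeholder.

```java
// Hypothetical helper, not part of Jini or RMI: shows where a class file is
// expected to live relative to a directory-style codebase URL.
final class CodebaseLayout {

    /** Returns the URL a class would be fetched from under a directory codebase. */
    static String classFileUrl(String codebase, String className) {
        if (!codebase.endsWith("/")) {
            // A directory codebase needs the trailing slash; without it the
            // URL is not treated as a directory of class files.
            throw new IllegalArgumentException(
                    "directory codebase must end with '/': " + codebase);
        }
        // foo.bar.MyClass -> foo/bar/MyClass.class
        return codebase + className.replace('.', '/') + ".class";
    }

    public static void main(String[] args) {
        // Host and port are placeholders, as in the text above.
        System.out.println(classFileUrl("http://your_machine:8080/", "foo.bar.MyClass"));
    }
}
```

      If you can fetch the resulting URL with a web browser, the HTTP server side of the setup is probably fine.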

AccessControlException
    Bad policy file
      Almost always this error is caused by having a bad--or no--policy file. Remember that if you install a security manager, then by default your program will be able to do very little. The security manager controls not only what downloaded code is allowed to do, but what your "local" code is allowed to do as well.

      To configure what your code is allowed to do, you must set a security policy file. Make sure that your policy file allows DiscoveryPermission for the groups you're searching for, or AllPermission if you want to turn off all security (which is a bad idea for a production environment).

Proxy object is null
    Service doesn't exist
      There are two ways you can get a null proxy back from a lookup service. First, if you use the simpler form of the lookup() method (the one that doesn't take a maximum number of "hits"), the method may return a null proxy simply because there is no registered service matching the search criteria you provided.

    Code isn't being downloaded properly

      Second, if you know that there is in fact a matching service registered, chances are you're messing up either codebase or your security permissions. For example, if you use the more complex version of lookup() that takes the maximum number of matches, and you get back a ServiceItem object but the proxy is null, then you know something is misconfigured.

      What this means is that your client is not able to download the code needed to reconstitute the proxy. This can be because you haven't installed a security manager (or have set a bogus policy), or the service that registered the proxy didn't set its codebase correctly, or codebase was set but is pointing to a bogus location on a web server. See the "Class Not Found" hint at the top of this page for more details.

No lookup services found
    Lookup service isn't a member of your group
      Be sure you're searching for at least one group, and that there are lookup services on your network that are members of this group. Most commonly, applications and services will use the "public" group, which is named by the empty string. To get reggie to join this group, though, you have to pass the string public to it.

    Lookup proxy not properly exported

      Remember that clients and services access the lookup service through its proxy, just like any other Jini service. So the lookup service's proxy must be available remotely; this means that the lookup service must set a codebase, and that the proxy object must be available on a web server (this proxy lives in the reggie-dl.jar file in Sun's sample implementation).

    Lookup service can't start because activation daemon not running
      Sun's sample implementation of the lookup service relies on the RMI activation daemon. The rmid program must be running before you start the lookup service. It can take a while for rmid to get cranking, so give it a few seconds.

    Security problems prevent lookup proxy from being downloaded

      Remember that a Java program will download no code unless it has a security manager set, and this includes the lookup service's proxy. So be sure to set a security manager and use a proper security policy file as described above.

    You're being bitten by the StartServices GUI bug

      Remember that there's a bug in the StartServices GUI on the Windows platform that causes it to generate bogus codebase URLs (the GUI code incorrectly uses the Windows file separator character (backslash) instead of the forward slash that is always required in URLs). See Chapter 1, page 24 for details. On Windows you have to run the lookup service by hand, at least in the current release of the code. What's happening here is that your program can't download the code for the lookup service's proxy object (since the StartServices GUI started it with a bogus codebase), and so the LookupDiscovery class never calls you since it can't deserialize the registrar proxy.
Leases aren't being renewed when using the LeaseRenewalManager
    Using renewFor instead of renewUntil (Lease.ANY is meaningful only in renewUntil)
      Most of the time, client leases are renewed by instantiating a LeaseRenewalManager and calling renewUntil() with the parameter Lease.ANY. This tells the manager to keep renewing the lease indefinitely until told otherwise (or until the LeaseRenewalManager goes away, which will happen when the client exits).

      One very common problem is confusing renewUntil() with renewFor(). The renewFor() method does not have this continuous renewal behavior, and passing Lease.ANY to it does essentially nothing.

You don't receive any RemoteEvents
    Listener isn't properly exported
      If you've properly created an implementation of RemoteEventListener (subclassing UnicastRemoteObject, etc.) but you never get called with any remote events, chances are you're not properly exporting your listener class.

      Make sure that your program sets codebase properly (see above) and puts the listener class and any supporting classes in an accessible location. Usually this means a web server. What's typically going on in this case is that the service you're talking to is going through the "Class Not Found" issues described above.

Lookup services raise RemoteException
    You haven't called discard
      Once you've retrieved the proxy for a lookup service, there is no ongoing communication with the service unless you explicitly call a method on it. So there is nothing to "tell" you that a lookup service has died, and nothing to automatically clean up after a failed lookup service.

      So when you detect that a lookup service has gone awry--because attempts to communicate with it cause RemoteExceptions--you have the responsibility to do the cleanup in your own code. The way you do this is by calling discard() on your LookupDiscovery object. This causes the discarded() method on your DiscoveryListener to be invoked.

Performance problems in services
    There are lots of potential areas that can cause performance problems in services. Here's one common one; I'll add to this list in the future.

    Call remote methods in a separate thread

      Remember that if your service invokes a remote method--say by calling out to a RemoteEventListener in a client--this method invocation will block until the client returns. If the client hangs, or just takes a long time to complete, then the service will be hung as well.

      The solution to problems like this is to have remote methods run in separate, short-lived threads. You can create a small "wrapper" thread just to invoke the remote method.
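
      A minimal sketch of such a wrapper thread, in plain Java, might look like the following. The EventSink interface is a hypothetical stand-in for a remote listener such as RemoteEventListener, and BackgroundNotifier is an illustrative name; a real service would call notify() on the actual listener and handle RemoteException.

```java
// Hypothetical stand-in for a remote listener such as RemoteEventListener.
interface EventSink {
    void deliver(String event) throws Exception;
}

final class BackgroundNotifier {

    /** Starts a short-lived thread that makes the (possibly slow) call. */
    static Thread notifyInBackground(EventSink sink, String event) {
        Thread worker = new Thread(() -> {
            try {
                // May block for as long as the client takes to return.
                sink.deliver(event);
            } catch (Exception e) {
                // A real service would probably log this and drop the listener.
                System.err.println("delivery failed: " + e);
            }
        }, "remote-call");
        // Daemon thread, so a hung client cannot keep the service JVM alive.
        worker.setDaemon(true);
        worker.start();
        return worker;
    }
}
```

      The caller gets control back as soon as the thread is started, so a client that hangs ties up only the wrapper thread rather than the service itself.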

Leases aren't properly handled after a service reactivates
    Problems with lease serialization format
      Even though leases must be transmitted "over the wire" in relative time format, when they are saved to disk they should be saved in absolute time. Use the setSerialFormat() method on the Lease interface (passing Lease.ABSOLUTE) to control this.
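
      The reason absolute time matters on disk is that a relative duration only makes sense with respect to the moment it was written, and that moment is lost when the service goes down. The following plain-Java sketch shows the arithmetic; LeaseTimes is a hypothetical helper that does not use the Jini Lease classes, and it assumes non-negative times in milliseconds.

```java
// Plain-Java illustration only; it does not use the Jini Lease classes.
final class LeaseTimes {

    /** Converts a relative duration (ms) into an absolute expiration, clamping on overflow. */
    static long toAbsolute(long nowMillis, long durationMillis) {
        long expiration = nowMillis + durationMillis;
        // A very large duration would overflow and wrap negative; clamp it instead.
        return expiration < nowMillis ? Long.MAX_VALUE : expiration;
    }

    /** Time left on a lease stored as an absolute expiration; never negative. */
    static long remaining(long expirationMillis, long nowMillis) {
        return Math.max(0L, expirationMillis - nowMillis);
    }
}
```

      For example, a 60-second lease granted at time 1,000 expires at 61,000. If the service restarts at 46,000, the stored absolute expiration correctly leaves 15 seconds; re-reading a stored 60-second duration at restart would instead push the expiration out to 106,000.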

You get a "set socket option failed (code=100055)" message when running any program that does discovery
    Make sure you have administrator privileges on Windows
      Alert reader Konstantin Laufer reports that Windows NT and 2000 require you to have administrator privileges to do multicast discovery. Unicast discovery should work fine, but you'll get nasty errors when trying to activate a lookup service, for instance. Try logging in as administrator, or get your support folks to assign local administrator rights to your normal account. (As an aside, this seems like an unbelievably stupid Microsoft "feature," but the reports seem to be accurate...)
You get an ActivateFailedException when starting reggie on JDK1.3
    Activation security changed in JDK1.3
      JDK1.3 added the notion of an "exec security policy," which controls how activation group JVMs are launched by rmid. Starting with 1.3, you need to provide some additional security guidelines to rmid if you want something other than the default behavior. And, since reggie is basically just a "wrapper" program that registers with rmid, it bumps into this new behavior when you're running under 1.3. To work around the problem by simply restoring the "old" 1.2-style behavior of rmid, launch rmid with the following option:

        rmid -J-Dsun.rmi.activation.execPolicy=none

      For full details, see this writeup of exec policies in 1.3.
Services continually appear and disappear when using the ServiceDiscoveryManager
    This is one of the most common problems developers run into when they start to use the ServiceDiscoveryManager. There are two likely causes of this behavior:

    You need to make sure you're correctly exporting the SDM's event listener

      The ServiceDiscoveryManager registers remote event listeners with the lookup services with which it is interacting. It is your responsibility to make sure that this event listener is correctly exported. Typically, you will do this by bundling up a JAR file containing the necessary classes (net.jini.lookup.ServiceDiscoveryManager$LookupCacheImpl$LookupListener_Stub.class and net.jini.core.event.RemoteEventListener), placing this JAR file in the filespace of an HTTP server that can export it to callers, and setting a codebase URL that points to this JAR file on the web server. Without doing this, the ServiceDiscoveryManager will be unable to correctly monitor the services in the community.

    The service doesn't correctly implement equals() and hashCode()
      The ServiceDiscoveryManager considers two services to be "equal" if their proxies' equals() methods return true. This is so that the ServiceDiscoveryManager can determine when a service's proxy implementation changes, and report that a new version of the service has appeared. It is your responsibility to correctly implement equals() on your service's proxy. Typically, you will override the method to return true if two proxies refer to the same back-end service. Note that the Java libraries expect objects that are equals() to one another to return the same value from hashCode(), so you will have to override this method too.

      If your proxy is simply an RMI stub object, then the equals() and hashCode() implementations on stubs will already have the correct behavior. If your proxy is a "smart proxy" that "wraps" an RMI stub, you can likely just delegate calls to equals() and hashCode() to your internal RMI stub.
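
      A smart proxy that delegates in this way could be sketched roughly as follows. PrinterProxy is a hypothetical class, and the backend field stands in for the wrapped RMI stub; a real proxy would also implement your service interface and forward its methods to the stub.

```java
import java.io.Serializable;
import java.util.Objects;

// Hypothetical smart proxy; "backend" stands in for the wrapped RMI stub.
final class PrinterProxy implements Serializable {
    private static final long serialVersionUID = 1L;

    private final Serializable backend;

    PrinterProxy(Serializable backend) {
        this.backend = Objects.requireNonNull(backend);
    }

    /** Two proxies are equal when they wrap equal back-end references. */
    @Override
    public boolean equals(Object other) {
        return other instanceof PrinterProxy
                && backend.equals(((PrinterProxy) other).backend);
    }

    /** Kept consistent with equals() by delegating to the same object. */
    @Override
    public int hashCode() {
        return backend.hashCode();
    }
}
```

      Because both methods delegate to the same wrapped object, two proxies that compare equal always report the same hash code.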

You're developing on a machine without a network connection
    Problems while developing without a network connection seem to arise most often on the Windows platform (surprise...). I'm not a Windows networking guru, so I'd suggest taking a browse of the JINI-USERS mailing list to see what the current state of the art is with regard to solutions.

    Having said that, though, alert reader Frank Kmiec reports that it's possible to get things running by ensuring that all parties (rmid, the lookup service, all of your clients and services, the web server, etc.) are using the loopback interface. To set this up, pass the property -Dnet.jini.discovery.interface=127.0.0.1 on the command line. Also, use the loopback IP address (127.0.0.1) instead of the hostname everywhere you have the opportunity to do so.

    Be aware that in general, using the loopback address is a terrible idea when you actually have a network, and especially when you're deploying. But this may help for those folks developing sans network. No guarantees, your mileage may vary.

Keith Edwards
kedwards@kedwards.com


Copyright 1999, W. Keith Edwards