Salesforce.com SOAP API Gotchas Pt. 1

This is the first part in a series (see Part 2, Part 3, and Part 4).


Salesforce.com is a very popular SaaS CRM. An essential task of a CRM system is enabling other applications to integrate smoothly with it, and toward this end, Salesforce.com provides several complementary APIs for use by their customers as well as by their partners (like us). We make heavy use of Salesforce.com’s API as part of integrating our customers’ Salesforce.com organizations into our products, and though the API is in general well designed and carefully documented, there are still a few dark corners that we’ve come across. Over the next few weeks, we’ll be describing some of these issues in more detail, as well as how to work around them, if possible.

SOAP Ain’t Simple

SOAP is a “metaprotocol”: a tool for building your own protocol. You might have a SOAP service that has a method multiply which takes two ints and returns their product. SOAP defines how the communications to and from this service are formatted (namely, as XML over HTTP). This service would be declared in a WSDL file. The WSDL contains XML descriptions of the types of data that each method accepts and returns, as well as any exceptions they may throw. Someone who wished to use your SOAP service could download your provided WSDL and use it to generate code in their preferred language that could interact with your service. The generated code typically abstracts away all (or most) of the underlying HTTP and XML machinery.
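To make the multiply example a little more concrete, here is a minimal sketch of what the XML on the wire might look like and how a response could be read by hand. The namespace http://example.com/calc, the element names a, b, and result, and the class name are all assumptions made up for illustration. In practice the code generated from the WSDL builds and parses these messages for you.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class SoapEnvelopeSketch {
    // Standard SOAP 1.1 envelope namespace.
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";
    // Hypothetical namespace for the imaginary multiply service.
    static final String CALC_NS = "http://example.com/calc";

    /** Builds the request body a client might POST to the service. */
    static String multiplyRequest(int a, int b) {
        return "<soapenv:Envelope xmlns:soapenv=\"" + SOAP_NS + "\""
             + " xmlns:calc=\"" + CALC_NS + "\">"
             + "<soapenv:Body>"
             + "<calc:multiply><calc:a>" + a + "</calc:a><calc:b>" + b + "</calc:b></calc:multiply>"
             + "</soapenv:Body>"
             + "</soapenv:Envelope>";
    }

    /** Extracts the single (assumed) result element from a response envelope. */
    static int parseMultiplyResponse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for namespace-based lookups
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList results = doc.getElementsByTagNameNS(CALC_NS, "result");
            if (results.getLength() != 1) {
                throw new IllegalArgumentException("expected exactly one result element");
            }
            return Integer.parseInt(results.item(0).getTextContent().trim());
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse SOAP response", e);
        }
    }
}
```

Writing this by hand for every method is exactly the tedium that WSDL-driven code generation removes.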

Salesforce.com and SOAP

Salesforce.com provides several SOAP APIs, two of which are closely related: Enterprise and Partner. (You can download WSDLs for all of their APIs by logging in to your Salesforce.com account and navigating to Setup → App Setup → Develop → API. If you don’t have a Salesforce.com account, you can create a Developer account for free: click the button for Free Developer Edition Environment on http://developer.force.com/.)

To get started programming with the API in Java, you can use the JAX-WS reference implementation. Once you have the JAX-WS tools decompressed, you can use the wsimport script to generate stub classes from the WSDL you’d like to use:
sh bin/wsimport.sh -p [package for stub classes] -B-XautoNameResolution -d [output directory] [path to wsdl]
If you’re on Windows, wsimport.bat is also provided.

The Enterprise API is defined differently for every single Salesforce.com organization. If your organization has custom objects that mine does not, then the WSDL that you get when you download your organization’s Enterprise WSDL will be different than my Enterprise WSDL because yours would, of course, have your custom objects in it. Because it is so specifically constructed for one organization’s specific Salesforce.com instance, the Enterprise API is best suited for developing tools that only target one Salesforce.com organization.

The Partner API provides access to the same data that the Enterprise API does, but in a more generic way. All Partner API WSDLs are the same: if you download the Partner API WSDL, you’ll get the same file I do. Rather than providing explicit types for every business object the way the Enterprise API does, the Partner API makes all object types accessible as “sObjects”. A Contact record would be represented as a sObject with a type field containing the string “Contact”. The Enterprise API, on the other hand, would contain WSDL types (and thus would yield generated classes) for Contact, Account, Lead, Opportunity, and every other type in your organization. For details, see the API documentation in Standard and Custom Object Basics → Core Data Types Used in API Calls.
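The difference between the two styles can be sketched in plain Java. The classes below are simplified stand-ins written for this illustration, not the classes that wsimport generates from either WSDL: Contact plays the role of an Enterprise-style typed class, and GenericSObject plays the role of a Partner-style sObject whose type is just a string.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RecordStyleSketch {
    /** Enterprise style: a dedicated type per object, one member per field. */
    static final class Contact {
        final String firstName;
        final String lastName;

        Contact(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    /**
     * Partner style: one generic type for every kind of record.
     * Hypothetical, simplified stand-in for the generated sObject class.
     */
    static final class GenericSObject {
        private final String type;
        private final Map<String, String> fields = new LinkedHashMap<>();

        GenericSObject(String type) {
            this.type = type;
        }

        String getType() {
            return type;
        }

        GenericSObject set(String fieldName, String value) {
            fields.put(fieldName, value);
            return this;
        }

        /** Returns null when the field was never set; nothing is checked at compile time. */
        String get(String fieldName) {
            return fields.get(fieldName);
        }
    }

    /** Shows the same record expressed in the generic form. */
    static GenericSObject toGeneric(Contact contact) {
        return new GenericSObject("Contact")
                .set("FirstName", contact.firstName)
                .set("LastName", contact.lastName);
    }
}
```

With the typed class, a misspelled field name is a compile error. With the generic form, it is a null at runtime, which is the price of code that works against any organization.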

API Oddities

Now that the basic terminology has been squared away, here’s a handful of issues we’ve run into with Salesforce.com’s Partner API. Since generated stub code will be used here and there, I’ll refer to those classes with a package of ‘sfstub’ to disambiguate them from other classes.

Inconsistent exceptions for bad password vs bad username

When you make the login() call with a bad password, a sfstub.LoginFault_Exception (the generated class for the WSDL’s LoginFault element) gets thrown. This is the expected, documented behavior. When you make the call with a bad username, though, a generic javax.xml.ws.soap.SOAPFaultException is thrown instead. If you turn on HTTP debugging (com.sun.xml.ws.transport.http.client.HttpTransportPipe.dump = true; for JAX-WS), you can see that the XML is indeed different. The fault for the bad password is transmitted as <sf:LoginFault xsi:type="sf:LoginFault"> while the bad username’s fault is <sf:fault xsi:type="sf:LoginFault" xmlns:sf="urn:fault.partner.soap.sforce.com">. The sf:fault element in the WSDL is defined to have a type of ApiFault, as opposed to the sf:LoginFault that is actually sent. I’m not a WSDL expert, though, so I’m not sure whether the problem lies with Salesforce.com or with JAX-WS. Either way, it’s something to be aware of, especially since SOAPFaultException is an unchecked exception that you might not otherwise have reason to catch.
Update: This issue has now been fixed! The same exception gets thrown for both cases now.

QUERY_TOO_COMPLICATED

Salesforce.com’s installation of Oracle is apparently configured to allow queries that are at most 64k in length (see this post on their developer forum). This imposes a difficult-to-predict limit on the number of fields that can be fetched by a retrieve(). If the fields you’re retrieving are simple fields like numbers, booleans, or dates, then you can retrieve hundreds of them in one call. (I have successfully fetched over five hundred simple fields in one call.) On the other hand, if the fields you’re retrieving are calculated fields, especially ones with complicated formulas, then the formulas are apparently evaluated in the database layer rather than in the application layer. This means that the SQL query is much larger and may hit the 64k limit. There is no way to tell how much SQL is necessary for each field, so unfortunately all that can be done when this happens is to try splitting up the needed fields across multiple retrieve() calls.
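One way to automate the splitting is to halve the field list whenever the error occurs and merge the partial results. The sketch below assumes a hypothetical FieldRetriever interface that wraps a retrieve() call for a single record, and a hypothetical QueryTooComplicatedException that your own wrapper code would throw when the API reports QUERY_TOO_COMPLICATED. Neither name comes from the generated stubs.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FieldSplittingSketch {
    /** Hypothetical: thrown by your wrapper when the API reports QUERY_TOO_COMPLICATED. */
    static class QueryTooComplicatedException extends RuntimeException {
    }

    /** Hypothetical wrapper around retrieve() for one record: field name to value. */
    interface FieldRetriever {
        Map<String, String> retrieve(List<String> fieldNames);
    }

    /**
     * Tries to fetch all fields in one call. If the query is too complicated,
     * splits the field list in half and retries each half recursively.
     */
    static Map<String, String> retrieveSplitting(FieldRetriever retriever, List<String> fieldNames) {
        if (fieldNames.isEmpty()) {
            return new LinkedHashMap<>();
        }
        try {
            return new LinkedHashMap<>(retriever.retrieve(fieldNames));
        } catch (QueryTooComplicatedException e) {
            if (fieldNames.size() == 1) {
                throw e; // a single field that is still too complicated cannot be split
            }
            int middle = fieldNames.size() / 2;
            Map<String, String> merged = retrieveSplitting(retriever, fieldNames.subList(0, middle));
            merged.putAll(retrieveSplitting(retriever, fieldNames.subList(middle, fieldNames.size())));
            return merged;
        }
    }
}
```

Halving costs a few wasted calls the first time, so if the same field set is fetched repeatedly it may be worth remembering which groupings succeeded.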

Session and logout() semantics

Salesforce.com authenticates API calls by examining a SOAP header that contains a session key. (The session key is set during the login process.) Session keys are allocated on a per-user basis, not on a per-login basis. This means that if you create two connections at the same time (e.g. calling login() from two different threads), both will be using the same session key. Sessions end after a period of inactivity or when logout() is called. The inactivity timeout period is configurable in Setup → Administration Setup → Security Controls → Session Settings. (This timeout applies both to API sessions and to the sessions used by the web interface.)

To illustrate the problem, let’s assume that there are two tools (T1 and T2) that you have set up to connect to your Salesforce.com organization. Naturally, you would configure them both to use your username and password. Let’s suppose that T2’s job takes longer than T1’s job. After running both tools for a while, T1 will complete its task. It is a commonly accepted best practice to release resources (mutexes, database connections, file handles, etc.) once they are no longer needed to prevent resource starvation, so it would be reasonable for the author of T1 to see the logout() call described in the API documentation and assume that calling it on a finished connection is the polite and proper thing to do. However, since T1 and T2 are both connecting as the same user, T1’s logout() call invalidates the session that is also in use by T2. T2’s next API call will then fail with an exception code of INVALID_SESSION_ID.

This makes logout() a quite dangerous call: it will disrupt any other connections made using the same username. So, unless you have a specific reason to kill all active API connections for the user, including connections made by other tools, do not call logout().
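Since another tool can invalidate your session at any moment, one defensive option is to wrap every API call so that an invalid session triggers a fresh login and a single retry. The sketch below is self-contained and uses stand-ins throughout: InvalidSessionException represents whatever your code throws on seeing INVALID_SESSION_ID, the login supplier represents a login() call that returns a session key, and the class name is made up.

```java
import java.util.function.Function;
import java.util.function.Supplier;

class SessionRetrySketch {
    /** Hypothetical: thrown by your wrapper when the API reports INVALID_SESSION_ID. */
    static class InvalidSessionException extends RuntimeException {
    }

    private final Supplier<String> login; // stand-in for login(); returns a session key
    private String sessionKey;            // null until the first call

    SessionRetrySketch(Supplier<String> login) {
        this.login = login;
    }

    /**
     * Runs an API call with the current session key. If the session turns out
     * to be invalid, logs in again and repeats the call once. A second failure
     * is propagated to the caller rather than retried forever.
     */
    synchronized <T> T call(Function<String, T> apiCall) {
        if (sessionKey == null) {
            sessionKey = login.get();
        }
        try {
            return apiCall.apply(sessionKey);
        } catch (InvalidSessionException e) {
            sessionKey = login.get();
            return apiCall.apply(sessionKey);
        }
    }
}
```

Retrying only once is a deliberate choice here: if a fresh login immediately yields another invalid session, something else is wrong and looping would only hide it.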

Continue with Part 2 of this series.

  • http://developer.force.com Dave Carroll

    Hey Marshall,

    Nice thoughtful post and thanks for the compliments about our API! A couple of thoughts I’d like to share.

    SOAP ain’t simple. But, with all the tooling, including the generator that you mentioned, the developer is fairly well insulated from the details and intricacies of using SOAP. One of the main reasons why we bother at all with an Enterprise WSDL is because of the tooling support. The other is that before metadata APIs were implemented the enterprise WSDL was a decent way of documenting your data schema. Now, I might have a bit of bias here as to the ease of using SOAP as the integration technology, but that’s because I was building integrations with salesforce.com using the old XML-RPC API before we came out with the SOAP version. I would expect to see more evolution of the API towards some RESTish interfaces in the future.

    The login fault issue is an interesting one. Different SOAP stacks handle WSDL definitions in different ways. I use the Apache Axis stack that is included in the Eclipse distro. The responses that I get are different from yours. When my client code is generated there is an ApiFault that extends AxisFault, and a LoginFault that extends ApiFault. This is what should be generated from the WSDL definition. Indeed, every other fault type extends the ApiFault class. There really is no need to differentiate between a login that was faulty due to username versus one that was faulty due to password. This is a common practice to prevent hinting at where the credentials might be at fault to discourage attacks. Having said that, there is a bit of inconsistency between the two login fault cases. Both return an exceptionCode of INVALID_LOGIN, but when the username is not found the exceptionMessage returned is “Invalid username or password or locked out”, whereas when the password is not correct the exceptionMessage returned is “INVALID_LOGIN: Invalid username, password, security token; or user locked out.”.

    I love that message “Query too complicated”. It has the ring of “you sent a query that is valid, it’s just too hard for me to grok.”. Joking aside, you suggest the right course of action in splitting the fields across retrieve calls. Now, if there were a way to reliably predict a query that might result in this exception, what would be the solution? My guess is that you would do the same thing that you would when you encounter this exception. So you can’t reliably predict this situation, but you can reliably detect it.

    I also have a comment about the logout() behavior. I tested this before writing this comment and what you describe is indeed true. One of the first best practices that any API based integration needs to adhere to is the ability to detect and handle a session timeout or other invalid session state. This means that you can never be assured, from one API call to the next, that you still have a valid session, due to the user configurable session timeout settings. A robust and reliable integration will have a handler in every API invocation that can detect an invalid session, log in again, and then repeat the invocation with the new session. One way to think of it is that there are several ways in which a session can become invalid, and one of those is via the logout() call of a concurrently running integration. Other ways are that the session can time out or be invalidated by an Admin user.

    I look forward to more gotchas that you uncover. Let us shed light on the dark corners so that we may truly “grok”.

    Cheers!

    PS: Definition of grok from the Jargon File
    When you claim to ‘grok’ some knowledge or technique, you are asserting that you have not merely learned it in a detached instrumental way but that it has become part of you, part of your identity. For example, to say that you “know” Lisp is simply to assert that you can code in it if necessary — but to say you “grok” LISP is to claim that you have deeply entered the world-view and spirit of the language, with the implication that it has transformed your view of programming. Contrast zen, which is a similar supernatural understanding experienced as a single brief flash.

  • Marshall Pierce

    Dave,

    Thanks for the detailed comment! You’ve got several good points I’d like to respond to.

    You’re absolutely right that the good tools available for SOAP are what make it viable and arguably easier to use than other metaprotocols, and technically the S in SOAP no longer stands for “simple” anyway… REST-style JSON or POX over HTTP has its merits, but the advantage of having a tool generate a whole package of stub classes for you automatically, checked exceptions and all, should not be underestimated. Having an organization’s entire object model turned into type-safe Java is easy to use and very “discoverable”. A developer could get pretty far with the Enterprise API using only an IDE’s method completion, and that’s a good thing. The Partner API is not quite so discoverable, alas.

    I don’t know enough about the details of the two different messages for bad username vs bad password to say for sure what exact type should be thrown. <sf:fault xsi:type="sf:LoginFault"> seems to be saying two different things: it’s a sf:fault element, which to me implies that it should be an ApiFault, but xsi:type="sf:LoginFault" seems like it should be a LoginFault object. Perhaps JAX-WS gets confused by the apparent contradiction and falls back to throwing SOAPFaultException. I’ll continue to investigate to see what the proper interpretation should be and if there’s a corresponding bug in JAX-WS.

    If metrics were available about how much each field contributed towards the QUERY_TOO_COMPLICATED 64k limit, it certainly would be possible to work around that by splitting up queries. However, it does seem like the sort of thing that could be just as easily done on the server side by breaking apart the list of fields and reassembling them after the individual queries, especially since there already is a well-documented and avoidable constraint on query size: the 10,000 character query limit. See the MALFORMED_QUERY exception code.

    As far as the logout() behavior goes, I can understand why it is implemented this way. There are two reasons I included it in the list anyway. First of all, the documentation of logout(), while technically correct, doesn’t make clear that there is only ever one session at a time for a given user, with the corresponding implication for multiple external systems that all are configured with the same username. This makes it more likely that some well-meaning developer will call logout() inappropriately. Second, while you are absolutely correct that a robust integration will handle session invalidation as you describe, the fact that an actively used session can be invalidated by someone else means that there will be more confused developers and unreliable tools than would otherwise be the case. Yes, everyone should write a layer around the generated stub code to handle an UnexpectedError with a code of INVALID_SESSION_ID and re-initialize the underlying binding, but that’s simply not going to be the case in the wild. I myself was caught off guard by this problem: after seeing the release notes for API v13, I was under the impression that as long as a connection was actively used, I would not have to deal with expired sessions (barring an Admin invalidating the session, which was not really applicable for our situation). I’m sure I’m not alone in that regard.

    There are many other dark corners we’ve found, and we look forward to exploring them with you!

    • Hubert

      Hi,
      just one comment about a misconception I had when reading about this logout/shared session problem.
      I was under the impression that there can only be one session per user, no matter whether the user is logged in to Salesforce via a web browser or via the web service.
      I checked this and this doesn’t seem to be true. For example: I log into Salesforce via user my.user@my.company.com, then I log in my webservice using the same user. Then logout via browser => webservice still works.
      So only multiple sessions for the webservice might be a problem.

      Can you confirm that it’s working this way (or has this been solved in the current API)? Thanks for the great articles and tutorials.

      • Marshall Pierce

        I’m not 100% positive, but as I recall, your description is correct: calling logout() only ends the API session, not the web UI session.

    • john

      In regards to the bizarre nature of the logout() behaviour, to me it’s just bad design that a logout() call from a completely unrelated app could log you out of an API call on another app. It’s like saying if I open multiple jdbc connections to a database with the same username then if I close one then the other automatically closes. The only reason why I see Salesforce doing this is that it basically forces you to ‘buy’ another user license if you want to avoid these issues. The salesforce best practices doc pretty much says the same thing.
