One of the many excellent Salesforce.com platform features is allowing one to define typed custom fields on objects. This is a widely used feature of Salesforce.com and it is not uncommon for an organization to define hundreds of custom fields. Using the Salesforce.com SOAP API, one can retrieve the defined custom fields for an object and can sync custom field data using the same procedure as syncing standard field data. Normally this works beautifully. Unfortunately, problems result when custom field names are changed.
Unique Ids for Object Records in Salesforce.com
The problem stems from an inconsistency in the way Salesforce.com treats custom fields compared to other records. To understand the inconsistency, we must first understand how Salesforce.com uniquely identifies all records. In Salesforce.com, every record has a 15-character Id (the 18-character version will be covered in a future post). The first three characters of the Id identify the object type of the record. As examples, a
003 prefix means the record is a Contact, while a
005 prefix means the record is a User, and a
00D prefix means the record is an Organization.
Custom fields, however, are treated differently. While custom fields have 15 character Ids (prefixed by
00N) in Salesforce.com’s database just like object records, this Id is not exposed via the SOAP API. Instead, Salesforce.com provides a unique, but mutable, “API name” that is initially auto-generated from the field name.
Salesforce.com Records and Metadata
The “API name” is one of the many field attributes retrievable using the
describeSObjects() API call. Results are returned for all object types in the form of a list of
describeSObjectResult objects. (Incidentally, you can get the record Id prefix for an object type by examining the keyPrefix field in a
There is a plethora of information in
describeSObjectResult, but for now the data we care about is the list of
Field objects. Unsurprisingly, there will be one
Field object for each field (standard or custom) that is defined for an object type.
A Field has both a “label” (a user-friendly display name) and a “name” (computer-friendly identifier). As an example:
For contact standard field "Annual Revenue": label = Annual Revenue name = AnnualRevenue
Note that the name is usually just the label with spaces and other non-word characters stripped out.
Custom Fields and the API
Custom fields are represented slightly differently. The name is not actually the “Field Name” shown in the Salesforce.com UI (not exposed via the API), but is instead the aforementioned “API Name”. The “API Name” is closely related to the “Field Name” as it is the field name with a special
__c suffix (two underscores before the c).
For instance, if you have a custom field to hold a contact’s shoe size, you might call the field “Shoe Size”. Upon field creation, Salesforce.com will set the “Field Name” to Shoe_Size and the “API Name” to Shoe_Size__c. If you then “delete” the field, the name will change to Shoe_Size_del__c. The field will then be placed in your deleted fields list where you can “undelete” the field. Upon undeletion, the name remains Shoe_Size_del__c. You can then delete the field again, which will change its name to Shoe_Size_del_del__c.
Genius.com’s integration with Salesforce Metadata
When a customer integrates Salesforce.com with Genius.com, we download all available records for that customer from several different object types (Contact, Lead, etc). As part of this, it is important that we represent all of the custom fields on those objects. To keep our data consistent with Salesforce.com’s, we need to track all changes that the customer makes to their custom fields. To be specific, I’m referring to changes to the fields themselves, not just changes to specific record field values. Let’s work through some scenarios of how the custom fields for an object can change. (For clarity, let’s just focus on the Contact object.) You may note that some of the behavior we’re giving our hypothetical customer is not necessarily common usage, but it’s all perfectly valid usage of the Salesforce.com custom field model.
First, suppose that Contact has the Shoe_Size__c integer custom field I used earlier. If the customer adds a date custom field LastBoughtNewShoes__c, that change is easy to handle. When we get the fields of the Contact object, we will see Shoe_Size__c and LastBoughtNewShoes__c. We can detect that we already have a field called Shoe_Size__c, so we skip that one. We perform the same check for LastBoughtNewShoes__c and see that we do not yet have that field, so we create that field on Contact.
Problems with renaming a custom field
Now, suppose that the customer decides he doesn’t like the name LastBoughtNewShoes__c and changes it to LastPurchasedNewShoes__c. When we next check the Contact metadata, it will look like LastBoughtNewShoes__c was deleted and LastPurchasedNewShoes__c was added, thus causing us to have to delete all of the values for LastBoughtNewShoes__c and re-download them for LastPurchasedNewShoes__c, even though it’s exactly the same data.
Problems with deleting and undeleting a custom field
The customer then accidentally deletes the shoe size custom field. Realizing his mistake, he then undeletes the custom field. His data is still safe, but the field name will have changed from ShoeSize__c to ShoeSize_del__c. Thus, like when he purposely changed the name of the field, we will delete the old field (and all its values) and create the new field (and have to re-download all the values).
Problems with custom field names and data types
Next, the customer decides he doesn’t like storing the date that his contacts last bought shoes and would instead like to store an estimation of how frequently his contacts buy shoes. So, he deletes LastPurchasedNewShoes__c, which changes its name to LastPurchasedNewShoes_del__c, and creates a new LastPurchasedNewShoes__c. This time, though, instead of creating it as a date field, he creates it as a picklist (aka dropdown) field with values like “1st Quarter”, “2nd Quarter”, “3rd Quarter”, and “4th Quarter”. When we check the metadata for Contact, we will see a LastPurchasedNewShoes__c field that we already have. However, it’s obviously not going to work to try to write “1st Quarter” into a date field. Depending on the language and DBMS used, that might cause an exception to be thrown, a date of 1970-01-01 or 0000-00-00 to be written, or any of a number of other possible failure conditions. It is possible to detect this specific scenario, though, by examining the type of the field. We could see that the type of LastPurchasedNewShoes__c had changed since the last time we checked metadata and use that to delete the old date field and create a new picklist field. There are a few other fields that can provide hints that a field may have changed in a way that requires deleting the old field and recreating it like “calculated”, “nillable” and a few others, but this is not always accurate, as the next example shows.
Problems with deleting a custom field and creating a new field
Next, the customer decides he doesn’t like storing shoe size as the standard US shoe size (11, 12, etc) and would prefer to store the interior volume of the optimal shoe for the contact in cubic centimeters. To simplify matters, assume that the customer did not accidentally delete the shoe size field, so it is still called Shoe_Size__c. This new measurement cannot be calculated from the shoe size, so all new data is going to need to be gathered from contacts for this field. Consequently, the customer deletes the initial Shoe_Size__c field and replaces it with another int field, also called Shoe_Size__c. When we next check metadata, everything will match: the old field and the new field have the same name, type, and other attributes, even though they are semantically different. The correct behavior would be to delete the old field and all its values and create a new field of the same name, but there is no way to figure out what has actually happened from the data available through the API.
The naive solution
On the surface, the problems listed above are minor inconveniences that can be overcome by being conservative when custom field changes are detected. In this conservative approach, we treat all changes as the previously synced field being deleted and a new field being created. This results in all data for the previous field being deleted and data for the new field being inserted. It may seem like this just causes data syncs to take a little longer. However, there are much more important (and unavoidable) effects.
Why the problems can’t be solved with the current API
Genius.com depends on custom fields for more than just data consistency. There are several important parts of our application that depend on custom fields. Our SmartGroups implementation and our marketing automation product allow users to dynamically segment their prospect base based on custom field values.
Back to renamed field example: let us imagine a marketing user wants to automatically send an email to a prospect when it has been 6 months since they last purchased a shoe. The marketer creates an automation workflow with a rule that says
If LastBoughtNewShoes__c is greater than six months send "Email 1" and then goes back to browsing awkward family photos. Imagine that a week later the administrator for the marketer’s Salesforce.com organization renames the field in Salesforce. Under the naive solution we delete the custom field and all data. This requires us to invalidate the workflow, since the custom field it depends on no longer exists. Ideally, we would be able to just rename the field and thus leave the workflow untouched. Unfortunately, without an immutable primary key (i.e. the
00N prefixed Id) this is not possible.
The fundamental problem arises because the field name, while technically a primary key, is not a good primary key because it can change. Having a mutable primary key isn’t bad in theory. It can work in a self-contained system where all data can be kept consistent, but it is quite inconvenient in practice. There are a handful of reasons for this (making backed up data harder to correlate with current data, inefficiencies in propagating primary key changes, etc), but the most important one, in this situation, is that Salesforce.com isn’t a self-contained system. As an external system, we have no way of being notified of changes to primary keys.
It’s not just a problem for Genius.com
Interestingly enough, we aren’t the only ones to have noticed this difficulty with custom fields. Salesforce.com also provides functionality to create a web-to-lead form that you can place on your website and which will automatically create a lead in your Salesforce.com organization when someone fills it out. It is useful to include custom field form fields in web-to-lead forms and, naturally, Salesforce.com lets you do this. Here’s some example output of the Salesforce.com web-to-lead form generator.
<input type="text" name="first_name" id="first_name">
<input type="hidden" name="00N600000012345" id="00N600000012345"
value="Collateral (Whitepaper/Booklet)" /><!-- Lead Type -->
<input type="hidden" name="00N6000000ABCDE" id="00N6000000ABCDE"
value="Whitepaper: Sales 2.0 - Faster Sales in a Slower Economy" ><!-- Event Name -->
<input type="hidden" name="00N6000000A1B2C" id="00N6000000A1B2C"
value="" /><!--Search Terms-->
<input type="hidden" name="00N6000000ZXWQD" id="00N6000000ZXWQD"
value="" /><!--Search Engine-->
<input type="hidden" name="00N600000000001" id="00N600000000001"
value="" /><!--Media Source-->
The Salesforce.com application also uses the
00N Id when working with custom fields. For instance, an example of the URL for editing a custom field is:
As mentioned before,
00N is the Id prefix for custom fields. Given this example, it’s reasonable to assume that the developers who created the web-to-lead form generator recognized the inadequacy of relying on custom field names and instead used the “real” primary key for identifying custom fields. This Id behaves like other Salesforce.com Ids: it is unique and never changes, thus making it unique against all future possible records as well. This is the Id we really need, but there is no way to get this Id through the API.
Continue with Part 3.