Jump to Navigation

Why identity/auto incrementing column implementations can be a pain

Auto-incrementing or identity columns are often used as surrogate keys in database designs. Unfortunately there is no apparent standard for this feature so each database vendor handles the requirement in a slightly different ways. I find this confusing when moving between database vendors and a lot of time is wasted finding out how the feature is implemented in a particular database implementation.

Besides the actual implementations varying, different terminology and vocabulary is used to refer to the same concept by vendors and not all features are available across database platforms.

Sequences and identity columns

Broadly speaking there are two main vocabularies and two approaches for autoincrementing or identity columns and fields namely:

  • autoincrement/identity columns - used by MySQL, MSSQL, DB2 which is an attribute of a number type column. For MySQL the attribute is call "auto_increment" and for MSSQL and DB2 it is referred to as an "identity" column. Postgreql also accommodates auto-incrementing fields but has a unique data-type for this called "serial, just to be different,
  • sequences - used by DB2 and Postgresql which are table independent,

Auto-increment/Identity Fields

To be an identity column the field has to be a numeric datatype and have an auto-incrementing (MySQL) or identity (DB2,MSSQL) attribute set; except for postgresql where the datatype "serial" is used. All of the implementations of auto incrementing columns have some additional attributes for the identity column such as start or seed value and increment amount.In MySQL you can assign auto_increment=100 for starting values for example.

When using identity fields one does not need to specify the field and its value in insert statements even though its a mandatory field as the server will take care of getting the next value and inserting it for you. eg If tableA has two fields one being an identity column called "id" and the other being a varchar called "name".

insert into tableA (name) values ("demo insert")

is all that is required to insert a new value.

The pain with auto-incrementing columns

Copying over data while preserving identity value

Auto-incrementing keys can become a pain when you need to copy whole tabels and preserve the primary key value. Trying to insert directly into an identity column will result in a error being raised. Typically the vendor provides some statements that allow you to temporarily drop the constraint so you can insert existing values. In MSSQL you can issue the command

"SET IDENTITY_INSERT products ON".

Other vendors will require you to drop the constraint and then re-enable it.

More pain- how to retrieve value of newly inserted rows?

In addition the server usually provides different ways to retrieve the identity colunm value for a newly inserted row. For MySQL this is the LAST_INSERT_ID() function and for MSSQL it is @@identity eg select @@identity.

Sequences

Whilst identity/auto incrementing columns are tied to the life cycle of the table on which they are created, sequences are not.A sequence can continue to exist once the table that used it has been dropped. Sequences can have many more attributes as well for example the can cycle, set max and min values etc. One sequence may be used by more than one table.

Sequences are particularly useful when the table design has a surrogate key that is not auto-incrementing but numeric. i.e you have inherited a design where the designer expects you or an application to generate the unique number. One can then create a sequence and use it in insert statements.

Here again implementation diverge just enough to be annoying. In both DB2 and postgresql sequences are created using the statement

create sequence <name>;

There are various options like the amount to increment the sequence by and what number to start with. Where the implementation differ is in how to access the sequence in statements. In postgresql one can access the sequence by the nextval function:

select nextval(<sequnce name)

insert into table a (id,name) values(nextval(<sequence>),'demo')

In DB2 I have found there are two ways to access a sequence. Besides getting the next value you can also access the previous value with the function "preval"

insert into table (field1) values(nextval for tmpsequence)

insert into table (field1) values(next value for tmpsequence)

Conclusion

So if you are confused about identity columns and sequences don't feel alone. Lets hope that over time more commonality emerges around this feature. I have not had time to research the ANSI SQL standard so don't know if this has been dealt with by them. I certainly hope it has and that vendors will begin to implement it. It will save a lot of developer time.

Open Source: 


by Dr. Radut.