More Lookup table techniques

Sunday, February 22, 2004

My blog has moved.
You can view this post at the following address:
http://www.osherove.com/blog/2004/2/21/more-lookup-table-techniques.html

3 Comments

Not quite answering your question, but Richard quotes:

"That said, table joins to get the string back when you only have the ID are orders of magnitude more expensive. The only time they would be compared is when you are checking a constraint (i.e., on INSERT or UPDATE). You are much more likely to SELECT them and be forced into using a table join to look up your value by its ID."

Joins are very expensive. If you are doing a join to find out all customers that are male is BAD. If I used a lookup table with a relation, I would do a "Select @genderID = GenderID from GENDERTABLE" and then "Select CustomerName from Customers WHERE GenderID = @genderID". The two selects will almost always be faster than a join.

Of course, the benefits of lookup tables for more than a few values are nice. Richard says that he uses SELECT DISTINCT on the column. For large tables this is almost always BAD. There is a huge amount of extra processing involved with a select distinct.

The key to figuring things out like this is to become very knowledgeable (IKnowledgeable?) on how queries work in a particular database. This means figuring out how things work INTERNALLY. It takes a while to figure this stuff out. That's why so few developers do it right. And of course, it takes time that you could be spending on other things like XML, or OOP,or Indigo, etc.

Darrell - Sunday, February 22, 2004 4:42:00 PM

Response to Darrell: a SELECT DISTINCT on an *indexed* column only requires traversing the B-Tree, a trivial action that does not even require looking into the data pages.

Thus, for an indexed column, a SELECT DISTINCT on 10,000,000 rows should be just as fast as on 1,000 with the same distinct values. You can proove that the Index Scan and aggregation is all that is necessary by using the SQL Query Analyzer.

Richard Tallent - Monday, February 23, 2004 4:49:00 AM

I think there are few problems with this approach.

First, what will happen if the table does not contain the desired value? For instance, on the first record of the table, there will be no value allowed! (This problem is raised in the original post, but I don't think the solution offered there (dummy records) is acceptable).

Second, from the system administrator point of view this is gonna be a nightmare. If, for instance, he decides to add a value to a lookup table (say, supported operating systems) - how is he going to do that in this configuration?

What we do generally is this: As a rule of thumb, every lookup table is cached (remember we are dealing with .NET here). each time a record is retrieved from the database which contains a value (int) that should be translated (to string, usually), the translation is performed in the closest-to-the-presentation-layer location. For example, we have a special DropDownList server control, which accepts the name of the cached lookup, and display the right items from it.

The reason for this is that the string value is actually not required by the application logic, but only by the end user, which prefer (for some unknown reason) to see "Male" rather than "1". So why grab this unnecessary item all the way from the database in the first place?

Memi Lavi - Tuesday, March 9, 2004 8:10:00 PM

Comments have been disabled for this content.