Not quite answering your question, but Richard quotes:
"That said, table joins to get the string back
when you only have the ID are orders of magnitude more
expensive. The only time they would be compared is when
you are checking a constraint (i.e., on INSERT or
UPDATE). You are much more likely to SELECT them and be
forced into using a table join to look up your value by
its ID."
Joins are very expensive. If you are doing a join to
find out all customers that are male is BAD. If I used a
lookup table with a relation, I would do a
"Select @genderID = GenderID from
GENDERTABLE" and then "Select
CustomerName from Customers WHERE GenderID =
@genderID". The two selects will almost always
be faster than a join.
Of course, the benefits of lookup tables for more than a
few values are nice. Richard says that he uses SELECT
DISTINCT on the column. For large tables this is almost
always BAD. There is a huge amount of extra processing
involved with a select distinct.
The key to figuring things out like this is to become
very knowledgeable (IKnowledgeable?) on how queries work
in a particular database. This means figuring out how
things work INTERNALLY. It takes a while to figure this
stuff out. That's why so few developers do it right. And
of course, it takes time that you could be spending on
other things like XML, or OOP,or Indigo, etc.
Response to Darrell: a SELECT DISTINCT on an *indexed*
column only requires traversing the B-Tree, a trivial
action that does not even require looking into the data
pages.
Thus, for an indexed column, a SELECT DISTINCT on
10,000,000 rows should be just as fast as on 1,000 with
the same distinct values. You can proove that the Index
Scan and aggregation is all that is necessary by using
the SQL Query Analyzer.
I think there are few problems with this approach.
First, what will happen if the table does not contain
the desired value? For instance, on the first record of
the table, there will be no value allowed! (This problem
is raised in the original post, but I don't think the
solution offered there (dummy records) is acceptable).
Second, from the system administrator point of view this
is gonna be a nightmare. If, for instance, he decides to
add a value to a lookup table (say, supported operating
systems) - how is he going to do that in this
configuration?
What we do generally is this: As a rule of thumb, every
lookup table is cached (remember we are dealing with
.NET here). each time a record is retrieved from the
database which contains a value (int) that should be
translated (to string, usually), the translation is
performed in the closest-to-the-presentation-layer
location. For example, we have a special DropDownList
server control, which accepts the name of the cached
lookup, and display the right items from it.
The reason for this is that the string value is actually
not required by the application logic, but only by the
end user, which prefer (for some unknown reason) to see
"Male" rather than
"1". So why grab this unnecessary item
all the way from the database in the first place?