We've already spent quite some time talking about collections. In this section we will highlight a couple more issues about how collections behave at runtime.
Hibernate defines three basic kinds of collections:
collections of values
one to many associations
many to many associations
This classification distinguishes the various table and foreign key relationships but does not tell us quite everything we need to know about the relational model. To fully understand the relational structure and performance characteristics, we must also consider the structure of the primary key that is used by Hibernate to update or delete collection rows. This suggests the following classification:
indexed collections
sets
bags
All indexed collections (maps, lists, arrays) have a primary key consisting
of the <key> and <index>
columns. In this case collection updates are usually extremely efficient -
the primary key may be efficiently indexed and a particular row may be efficiently
located when Hibernate tries to update or delete it.
Sets have a primary key consisting of <key> and element
columns. This may be less efficient for some types of collection element, particularly
composite elements or large text or binary fields; the database may not be able to index
a complex primary key as efficiently. On the other hand, for one to many or many to many
associations, particularly in the case of synthetic identifiers, it is likely to be just
as efficient. (Side-note: if you want SchemaExport to actually create
the primary key of a <set> for you, you must declare all columns
as not-null="true".)
<idbag> mappings define a surrogate key, so they are
always very efficient to update. In fact, they are the best case.
Bags are the worst case. Since a bag permits duplicate element values and has no
index column, no primary key may be defined. Hibernate has no way of distinguishing
between duplicate rows. Hibernate resolves this problem by completely removing
(in a single DELETE) and recreating the collection whenever it
changes. This might be very inefficient.
Note that for a one-to-many association, the "primary key" may not be the physical primary key of the database table - but even in this case, the above classification is still useful. (It still reflects how Hibernate "locates" individual rows of the collection.)
From the discussion above, it should be clear that indexed collections and (usually) sets allow the most efficient operation in terms of adding, removing and updating elements.
There is, arguably, one more advantage that indexed collections have over sets for
many to many associations or collections of values. Because of the structure of a
Set, Hibernate doesn't ever UPDATE a row when
an element is "changed". Changes to a Set always work via
INSERT and DELETE (of individual rows). Once
again, this consideration does not apply to one to many associations.
After observing that arrays cannot be lazy, we would conclude that lists, maps and idbags are the most performant (non-inverse) collection types, with sets not far behind. Sets are expected to be the most common kind of collection in Hibernate applications. This is because the "set" semantics are most natural in the relational model.
However, in well-designed Hibernate domain models, we usually see that most collections
are in fact one-to-many associations with inverse="true". For these
associations, the update is handled by the many-to-one end of the association, and so
considerations of collection update performance simply do not apply.
Just before you ditch bags forever, there is a particular case in which bags (and also lists)
are much more performant than sets. For a collection with inverse="true"
(the standard bidirectional one-to-many relationship idiom, for example) we can add elements
to a bag or list without needing to initialize (fetch) the bag elements! This is because
Collection.add() or Collection.addAll() must always
return true for a bag or List (unlike a Set). This can
make the following common code much faster.
Parent p = (Parent) sess.load(Parent.class, id); Child c = new Child(); c.setParent(p); p.getChildren().add(c); //no need to fetch the collection! sess.flush();
Occasionally, deleting collection elements one by one can be extremely inefficient. Hibernate
isn't completely stupid, so it knows not to do that in the case of an newly-empty collection
(if you called list.clear(), for example). In this case, Hibernate will
issue a single DELETE and we are done!
Suppose we add a single element to a collection of size twenty and then remove two elements.
Hibernate will issue one INSERT statement and two DELETE
statements (unless the collection is a bag). This is certainly desirable.
However, suppose that we remove eighteen elements, leaving two and then add thee new elements. There are two possible ways to proceed
delete eighteen rows one by one and then insert three rows
remove the whole collection (in one SQL DELETE) and insert
all five current elements (one by one)
Hibernate isn't smart enough to know that the second option is probably quicker in this case. (And it would probably be undesirable for Hibernate to be that smart; such behaviour might confuse database triggers, etc.)
Fortunately, you can force this behaviour (ie. the second strategy) at any time by discarding (ie. dereferencing) the original collection and returning a newly instantiated collection with all the current elements. This can be very useful and powerful from time to time.
Of course, one-shot-delete does not apply to collections mapped inverse="true".