Friday, October 22, 2010

SpringOne 2GX Conf - GORM Performance

I was lucky enough to attend the SpringOne 2GX conference held in the Chicago area this week. There were plenty of great sessions, and they gave me a lot of material to blog about.

This blog post was inspired by a session by Burt Beckwith, who has an awesome blog himself. Check it out!

Burt's session was called 'Advanced GORM - Performance, Customization and Monitoring'. In it he showed an interesting 'gotcha' that I looked into further and thought others might like to know about. I would like to say I knew this at some point, but I have to admit I had forgotten about this potential issue.

Two caveats:
1) This issue is really a result of what Hibernate has to do to keep the domain model correct. GORM just makes it easy to fall into; it is not a GORM issue. It is not really a Hibernate issue either - it just is.
2) Burt stressed that what he showed was to inform, not to bias people away from a particular modeling style.

The issue is centered around Grails 'One-to-Many' relationships. Burt used Library and Visit domain objects in his example - and if it is not too egregious - I will use the same in this blog post.

Imagine we have to keep track of the number of visits to a particular library. The standard modeling technique would be that a single Library has many Visits. This is very easy to model in GORM (and that ease, in a way, contributes to the issue).

This can be modeled like the following:
    class Library {
        String name
        static hasMany = [visits: Visit]
    }


Where a Visit might look like the following:
    class Visit {
        String personName
        Date visitDate = new Date()
        static belongsTo = [library: Library]
    }


So far so good, or is it... what is the problem?

For the hasMany relationship, a Hibernate PersistentSet is created by default. A Set requires all of its elements to be unique. How would Hibernate know whether a new item is unique without loading all of the existing elements to check? The short answer is that it has to load all of the elements first. Yes, all visits to the library - all of them. This could number in the millions at some point, and it happens even if our business rules already guarantee that the Visit is unique.
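
To make that concrete, here is a toy model of a lazily loaded set in plain Groovy. It is only a sketch and not Hibernate's actual PersistentSet: LazyVisitSet, rowsFetched and the loader closure are made-up names, with the closure standing in for the SELECT against the visit table.

```groovy
// Toy model only: NOT Hibernate's real implementation.
// LazyVisitSet, rowsFetched and loader are hypothetical names.
class LazyVisitSet {
    private final Closure<Collection<String>> loader // stands in for the SELECT
    private Set<String> loaded = null                 // null until initialized
    int rowsFetched = 0

    LazyVisitSet(Closure<Collection<String>> loader) {
        this.loader = loader
    }

    // Set.add must report whether the element was new, so the whole
    // collection has to be read before the answer is known.
    boolean add(String visit) {
        initialize()
        loaded.add(visit)
    }

    private void initialize() {
        if (loaded == null) {
            def rows = loader.call()
            rowsFetched += rows.size()
            loaded = new HashSet<String>(rows)
        }
    }
}

// Pretend the visit table already holds 100 rows for this library.
def existingRows = (1..100).collect { "person${it}".toString() }
def visits = new LazyVisitSet({ existingRows })

assert visits.rowsFetched == 0       // nothing loaded yet (lazy)
assert visits.add('oneMorePerson')   // a new element, so add returns true
assert visits.rowsFetched == 100     // every existing row was read first
```

The cost of a single add is proportional to the size of the collection, not to the one element being added.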

In Hibernate, the relationship could have been modeled as a bag, which allows duplicates, but unfortunately GORM does not support bags.
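
For contrast, here is the same kind of toy model for a bag. Because a bag permits duplicates, add never has to inspect the existing elements, so the new element can simply be queued until something actually reads the collection. Again, this is a sketch under assumptions: LazyVisitBag, pendingAdds and rowsFetched are made-up names, and whether real Hibernate skips the load depends on how the collection is mapped.

```groovy
// Toy model only: NOT Hibernate's real implementation.
// LazyVisitBag, pendingAdds and rowsFetched are hypothetical names.
class LazyVisitBag {
    private final Closure<Collection<String>> loader // stands in for the SELECT
    private List<String> loaded = null                // null until initialized
    private final List<String> pendingAdds = []
    int rowsFetched = 0

    LazyVisitBag(Closure<Collection<String>> loader) {
        this.loader = loader
    }

    // Duplicates are allowed, so add can always answer true
    // without looking at what is already stored.
    boolean add(String visit) {
        if (loaded == null) {
            pendingAdds << visit
        } else {
            loaded << visit
        }
        true
    }

    // Only reading the contents forces the load.
    int size() {
        if (loaded == null) {
            def rows = loader.call()
            rowsFetched += rows.size()
            loaded = new ArrayList<String>(rows)
            loaded.addAll(pendingAdds)
            pendingAdds.clear()
        }
        loaded.size()
    }
}

def existingRows = (1..100).collect { "person${it}".toString() }
def visits = new LazyVisitBag({ existingRows })

assert visits.add('oneMorePerson')
assert visits.rowsFetched == 0       // the add did not touch the existing rows
assert visits.size() == 101          // reading the collection triggers the load
assert visits.rowsFetched == 100
```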

Let's look at this and see if that is true. Below is a unit test which demonstrates this.
 1: void testAddingLibraryVisits() {
 2:     Library library = new Library(name:'LocalLibrary')
 3:     int maxVisits = 100
 4:
 5:     (1..maxVisits).each {
 6:         library.addToVisits(new Visit(personName:"person${it}"))
 7:     }
 8:     library.save()
 9:     sessionFactory.getCurrentSession().flush()
10:     sessionFactory.getCurrentSession().clear()
11:
12:     assertEquals 1, Library.count()
13:     assertEquals maxVisits, Visit.count()
14:
15:     sessionFactory.getCurrentSession().flush()
16:     sessionFactory.getCurrentSession().clear()
17:
18:     println "============= Find the library ============="
19:
20:     Library newLibrary = Library.findById(library.id)
21:
22:     println "============= Create one more visit ============="
23:     Visit oneMoreVisit = new Visit(personName:"oneMorePerson")
24:     println "============= Add and Save one more visit ============="
25:     newLibrary.addToVisits(oneMoreVisit).save()
26:     println "============= Check number of visits in library ============="
27:     assertEquals maxVisits+1, newLibrary.visits.size()
28:
29:     sessionFactory.getCurrentSession().flush()
30:     sessionFactory.getCurrentSession().clear()
31:     println "============= Done adding one more visit ============="
32:
33:     println "============= Check count of libraries and visits ============="
34:     assertEquals 1, Library.count()
35:     assertEquals maxVisits+1, Visit.count()
36: }


Let's look at each section.

The first part, lines 1-17, just sets up the data. It is the remaining lines that are interesting.

Lines 18-20 are summarized below along with their SQL output.

 1: println "============= Find the library ============="
 2: Library newLibrary = Library.findById(library.id)
 3:
 4: ============= Find the library =============
 5: Hibernate:
 6:     select
 7:         this_.id as id8_0_,
 8:         this_.version as version8_0_,
 9:         this_.name as name8_0_
10:     from
11:         library this_
12:     where
13:         this_.id=?

After saving 100 visits to the library, we get set up to add one more visit. In line 2 above, we perform a 'findById' on the library. You can see from the resulting query that the visits have not been loaded, because by default all collections are lazy.

Next we add a newly created visit to the collection.
 1: println "============= Create one more visit ============="
 2: Visit oneMoreVisit = new Visit(personName:"oneMorePerson")
 3: println "============= Add and Save one more visit ============="
 4: newLibrary.addToVisits(oneMoreVisit).save()
 5:
 6: ============= Create one more visit =============
 7: ============= Add and Save one more visit =============
 8: Hibernate:
 9:     select
10:         visits0_.library_id as library3_1_,
11:         visits0_.id as id1_,
12:         visits0_.id as id2_0_,
13:         visits0_.version as version2_0_,
14:         visits0_.library_id as library3_2_0_,
15:         visits0_.person_name as person4_2_0_,
16:         visits0_.visit_date as visit5_2_0_
17:     from
18:         visit visits0_
19:     where
20:         visits0_.library_id=?
21: Hibernate:
22:     insert
23:     into
24:         visit
25:         (id, version, library_id, person_name, visit_date)
26:     values
27:         (null, ?, ?, ?, ?)
28: Hibernate:
29:     call identity()
30: Hibernate:
31:     update
32:         library
33:     set
34:         version=?,
35:         name=?
36:     where
37:         id=?
38:         and version=?
39: ============= Done adding one more visit =============



In line 2 we create a new Visit instance, and in line 4 we add it to the visits collection and save the library.

Notice that, starting on line 8, we are loading all of the visits for this particular library. This is where the issue comes into play.
Starting on line 21 the new visit record is inserted into the visit table, and finally, starting on line 30, the Library instance itself is updated.

I hope you can see that this could definitely be a performance issue. It will work great in dev and great in QA, but a month into production, once the visits have piled up, it will be a problem.
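
To put rough numbers on that, assume every visit is added in a fresh session, so the collection starts out uninitialized each time, as it does in the test above. The k-th add then has to read the k-1 visits already stored. Here is a small sketch of the arithmetic; rowsReadFor is a made-up helper and the volumes are hypothetical.

```groovy
// Total rows read across a series of adds, assuming each add starts
// from an uninitialized collection and must load everything first.
// rowsReadFor is a hypothetical helper, not part of GORM or Hibernate.
long rowsReadFor(int adds) {
    long total = 0
    for (int k = 0; k < adds; k++) {
        total += k   // the (k+1)-th add loads the k rows already stored
    }
    total
}

assert rowsReadFor(1) == 0       // the very first visit loads nothing
assert rowsReadFor(101) == 5050  // 0 + 1 + ... + 100
```

At a hypothetical 1,000 visits a day, the 30,000th visit alone pulls 29,999 rows out of the database before its single insert, and rowsReadFor(30000) comes to roughly 450 million rows read over the month.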

That is all of the database interaction needed to save one more item to the collection.

In my next blog post I will discuss what Burt suggested to address this problem, and a set of API changes to implement Burt's suggestion while keeping the unit test intact.

Thanks for stopping by and stop by again to see the next blog post.
