Saturday, October 30, 2010

Gorm Performance of collections - continued

In my last post - I told you about the issue Burt Beckwith brought up at the SpringOne conference with adding a member to a 1-to-many relationship. That issue is that hibernate will load all of the values from the many side to make sure the new item can be added. If this number is small, then that is fine - but if there are a lot then you have a potential issue.

Burt also talked about a solution. His suggested solution reminded me a little of the old joke:

Patient: Doctor, Doctor - it hurts when I do this
Doctor: Then don't do that

Burt's suggestion was to not model Libary to Visits as a 1-to-many relationship. Instead remove the collection altogether from the Library and give the Visit a reference back to the library.

Now the Library and the Visit look like the following:
Library:
1:    
2: class Library {
3: String name
4:
5: static constraints = {
6: }
7: }
8:


Visit:
1:  class Visit {  
2: String personName
3: Date visitDate = new Date()
4:
5: Library library
6: static constraints = {
7: }
8: }
9:


Notice now that Library has no reference at all to a Visit, but Visit still has a reference to the Library.

Because of this change, it will change our usage of these domain classes - but interestingly enough it does not change the table definition. Lets look at the tables for the original Library and the new Library:

1:  create table library (  
2: id bigint generated by default as identity (start with 1),
3: version bigint not null,
4: name varchar(255) not null,
5: primary key (id));
6:
7: create table library (
8: id bigint generated by default as identity (start with 1),
9: version bigint not null,
10: name varchar(255) not null,
11: primary key (id));
12:
13:


You can see from above that the table definition is exactly the same so there will be no impact at the database level - but there is an impact at the code level.

Now, instead of adding Visits to a library we have to pass a reference to the library to the Visit. Something like the following:
1:    Library library = new Library(name:'LocalLibrary3')  
2: int maxVisits = 100
3: library.save()
4:
5: (1..maxVisits).each {
6: new Visit(personName:"person${it}", library:library).save()
7: }
8:


Notice above we create an instance of the Library, then as we create Visits we add a reference to the library that the visit is associated with. There is no dynamic method on the Library class called 'addToVisits' - because there are no visits.

Lets look at the entire integration test. Note that I changed the name of the Library class to Library3 and Visit3 so they did not conflict with the original classes.

1:   void testAddingLibrary2Visits3() {  
2: Library3 library3 = new Library3(name:'LocalLibrary3')
3: int maxVisits = 100
4: library3.save()
5:
6: (1..maxVisits).each {
7: new Visit3(personName:"person${it}", library:library3).save()
8: }
9:
10: sessionFactory.getCurrentSession().flush()
11: sessionFactory.getCurrentSession().clear()
12:
13: assertEquals 1, Library3.count()
14: assertEquals maxVisits, Visit3.count()
15:
16: sessionFactory.getCurrentSession().flush()
17: sessionFactory.getCurrentSession().clear()
18:
19: println "============= Find the library ============="
20:
21: Library3 library33 = Library3.findById(library3.id)
22:
23: println "============= adding one more visit ============="
24: Visit3 oneMoreVisit = new Visit3(personName:"oneMorePerson",library:library33)
25: oneMoreVisit.save()
26: println "============= saved new Visit one more visit ============="
27: assertEquals maxVisits+1, Visit3.countByLibrary(library33)
28:
29: sessionFactory.getCurrentSession().flush()
30: sessionFactory.getCurrentSession().clear()
31: println "============= Done adding one more visit ============="
32:
33: println "============= Check count of libraries and visits ============="
34: assertEquals 1, Library3.count()
35: assertEquals maxVisits+1, Visit3.count()
36: }
37:


Lets look at each section of the unit test and the result sql statements:
1:  INTEGRATIONTEST  
2: println "============= Find the library ============="
3: Library3 library33 = Library3.findById(library3.id)
4:
5: SQL
6: ============= Find the library =============
7: Hibernate:
8: select
9: this_.id as id9_0_,
10: this_.version as version9_0_,
11: this_.name as name9_0_
12: from
13: library3 this_
14: where
15: this_.id=?
16:

On line 3 above, we perform a find for the library instance. You can see this results in a single select, very much like the last time.

1:  INTEGRATION TEST  
2: println "============= adding one more visit ============="
3: Visit3 oneMoreVisit = new Visit3(personName:"oneMorePerson",library:library33)
4: oneMoreVisit.save()
5:
6: SQL
7: ============= adding one more visit =============
8: Hibernate:
9: insert
10: into
11: visit3
12: (id, version, library_id, person_name, visit_date)
13: values
14: (null, ?, ?, ?, ?)
15:

This is where we really see the benefit. On lines 3, we create the new Visit and pass a reference to the library. On line 4, we save the new visit. Notice the resulting sql does not make a select of all of the visits. Instead it can insert the record directly into the table. This clearly can be a huge performance gain if your collections are very large.

This is something you need to think about when designing your model. GORM can sometimes appear to be magic. I was told once that if it appears to be magic it is because you do not understand it. If you dont understand it, then subtle cases like this will cause you problems on a project.

If you know about this ahead of time, then you can implement it the correct way. But sometimes, it is not clear if you have to do this and you never want to make premature optimizations before you know you have to.

I was wondering then, what would happen if you implemented the collection in the conventional way - and then later had to change it. The usage is significantly different that it would have a ripple effect in you code. I wanted to see if I could retrofit the existing api, with the above implementation.

I got close - but I could not get it to match completely. I got very close, but I needed to add an additional flush:true in the original integration test.

Below I show my attempt to modify the Library class to accommodate the above usage. I called the new classes Library2 and Visit2.

Library2
1:  class Library2 {  
2:
3: String name
4: Boolean visitsDirty = false
5:
6: private List<Visit2> _visits = new ArrayList<Visit2>()
7:
8: Library2 addToVisits(Visit2 theVisit) {
9: if (theVisit) {
10: theVisit.library = this
11: _visits << theVisit
12: visitsDirty = true
13: }
14: return this
15: }
16:
17: List<Visit2> getVisits() {
18: List<Visit2> theVisits = Visit2.findAllByLibrary(this)
19: }
20:
21: def afterSave = {
22: Library2 lib2 = Library2.load(this.id)
23: _visits.each {
24: it.library = lib2
25: it.save()
26: }
27: _visits.clear()
28: visitsDirty = false
29: }
30: def afterInsert = {
31: Library2 lib2 = Library2.load(this.id)
32: _visits.each {
33: it.library = lib2
34: it.save()
35: }
36: _visits.clear()
37: visitsDirty = false
38: }
39: def afterUpdate = {
40: Library2 lib2 = Library2.load(this.id)
41: _visits.each {
42: it.library = lib2
43: it.save()
44: }
45: _visits.clear()
46: visitsDirty = false
47: }
48: def beforeDelete = {
49: Visit.executeUpdate("delete Visit2 v where v.library = ?", [this])
50: }
51:
52: static constraints = {
53: }
54: }
55:


as you can see I had to hook into a number of Hibernate events.

My conclusion after the exercise above was that it was better to re-write the implementation than it was to try to retrofit the api.

I hope you found this useful and will be better prepared to design and implement your collections with Grails and GORM.

No comments: