Saturday, October 30, 2010

Gorm Performance of collections - continued

In my last post - I told you about the issue Burt Beckwith brought up at the SpringOne conference with adding a member to a 1-to-many relationship. That issue is that hibernate will load all of the values from the many side to make sure the new item can be added. If this number is small, then that is fine - but if there are a lot then you have a potential issue.

Burt also talked about a solution. His suggested solution reminded me a little of the old joke:

Patient: Doctor, Doctor - it hurts when I do this
Doctor: Then don't do that

Burt's suggestion was to not model Libary to Visits as a 1-to-many relationship. Instead remove the collection altogether from the Library and give the Visit a reference back to the library.

Now the Library and the Visit look like the following:
Library:
1:    
2: class Library {
3: String name
4:
5: static constraints = {
6: }
7: }
8:


Visit:
1:  class Visit {  
2: String personName
3: Date visitDate = new Date()
4:
5: Library library
6: static constraints = {
7: }
8: }
9:


Notice now that Library has no reference at all to a Visit, but Visit still has a reference to the Library.

Because of this change, it will change our usage of these domain classes - but interestingly enough it does not change the table definition. Lets look at the tables for the original Library and the new Library:

1:  create table library (  
2: id bigint generated by default as identity (start with 1),
3: version bigint not null,
4: name varchar(255) not null,
5: primary key (id));
6:
7: create table library (
8: id bigint generated by default as identity (start with 1),
9: version bigint not null,
10: name varchar(255) not null,
11: primary key (id));
12:
13:


You can see from above that the table definition is exactly the same so there will be no impact at the database level - but there is an impact at the code level.

Now, instead of adding Visits to a library we have to pass a reference to the library to the Visit. Something like the following:
1:    Library library = new Library(name:'LocalLibrary3')  
2: int maxVisits = 100
3: library.save()
4:
5: (1..maxVisits).each {
6: new Visit(personName:"person${it}", library:library).save()
7: }
8:


Notice above we create an instance of the Library, then as we create Visits we add a reference to the library that the visit is associated with. There is no dynamic method on the Library class called 'addToVisits' - because there are no visits.

Lets look at the entire integration test. Note that I changed the name of the Library class to Library3 and Visit3 so they did not conflict with the original classes.

1:   void testAddingLibrary2Visits3() {  
2: Library3 library3 = new Library3(name:'LocalLibrary3')
3: int maxVisits = 100
4: library3.save()
5:
6: (1..maxVisits).each {
7: new Visit3(personName:"person${it}", library:library3).save()
8: }
9:
10: sessionFactory.getCurrentSession().flush()
11: sessionFactory.getCurrentSession().clear()
12:
13: assertEquals 1, Library3.count()
14: assertEquals maxVisits, Visit3.count()
15:
16: sessionFactory.getCurrentSession().flush()
17: sessionFactory.getCurrentSession().clear()
18:
19: println "============= Find the library ============="
20:
21: Library3 library33 = Library3.findById(library3.id)
22:
23: println "============= adding one more visit ============="
24: Visit3 oneMoreVisit = new Visit3(personName:"oneMorePerson",library:library33)
25: oneMoreVisit.save()
26: println "============= saved new Visit one more visit ============="
27: assertEquals maxVisits+1, Visit3.countByLibrary(library33)
28:
29: sessionFactory.getCurrentSession().flush()
30: sessionFactory.getCurrentSession().clear()
31: println "============= Done adding one more visit ============="
32:
33: println "============= Check count of libraries and visits ============="
34: assertEquals 1, Library3.count()
35: assertEquals maxVisits+1, Visit3.count()
36: }
37:


Lets look at each section of the unit test and the result sql statements:
1:  INTEGRATIONTEST  
2: println "============= Find the library ============="
3: Library3 library33 = Library3.findById(library3.id)
4:
5: SQL
6: ============= Find the library =============
7: Hibernate:
8: select
9: this_.id as id9_0_,
10: this_.version as version9_0_,
11: this_.name as name9_0_
12: from
13: library3 this_
14: where
15: this_.id=?
16:

On line 3 above, we perform a find for the library instance. You can see this results in a single select, very much like the last time.

1:  INTEGRATION TEST  
2: println "============= adding one more visit ============="
3: Visit3 oneMoreVisit = new Visit3(personName:"oneMorePerson",library:library33)
4: oneMoreVisit.save()
5:
6: SQL
7: ============= adding one more visit =============
8: Hibernate:
9: insert
10: into
11: visit3
12: (id, version, library_id, person_name, visit_date)
13: values
14: (null, ?, ?, ?, ?)
15:

This is where we really see the benefit. On lines 3, we create the new Visit and pass a reference to the library. On line 4, we save the new visit. Notice the resulting sql does not make a select of all of the visits. Instead it can insert the record directly into the table. This clearly can be a huge performance gain if your collections are very large.

This is something you need to think about when designing your model. GORM can sometimes appear to be magic. I was told once that if it appears to be magic it is because you do not understand it. If you dont understand it, then subtle cases like this will cause you problems on a project.

If you know about this ahead of time, then you can implement it the correct way. But sometimes, it is not clear if you have to do this and you never want to make premature optimizations before you know you have to.

I was wondering then, what would happen if you implemented the collection in the conventional way - and then later had to change it. The usage is significantly different that it would have a ripple effect in you code. I wanted to see if I could retrofit the existing api, with the above implementation.

I got close - but I could not get it to match completely. I got very close, but I needed to add an additional flush:true in the original integration test.

Below I show my attempt to modify the Library class to accommodate the above usage. I called the new classes Library2 and Visit2.

Library2
1:  class Library2 {  
2:
3: String name
4: Boolean visitsDirty = false
5:
6: private List<Visit2> _visits = new ArrayList<Visit2>()
7:
8: Library2 addToVisits(Visit2 theVisit) {
9: if (theVisit) {
10: theVisit.library = this
11: _visits << theVisit
12: visitsDirty = true
13: }
14: return this
15: }
16:
17: List<Visit2> getVisits() {
18: List<Visit2> theVisits = Visit2.findAllByLibrary(this)
19: }
20:
21: def afterSave = {
22: Library2 lib2 = Library2.load(this.id)
23: _visits.each {
24: it.library = lib2
25: it.save()
26: }
27: _visits.clear()
28: visitsDirty = false
29: }
30: def afterInsert = {
31: Library2 lib2 = Library2.load(this.id)
32: _visits.each {
33: it.library = lib2
34: it.save()
35: }
36: _visits.clear()
37: visitsDirty = false
38: }
39: def afterUpdate = {
40: Library2 lib2 = Library2.load(this.id)
41: _visits.each {
42: it.library = lib2
43: it.save()
44: }
45: _visits.clear()
46: visitsDirty = false
47: }
48: def beforeDelete = {
49: Visit.executeUpdate("delete Visit2 v where v.library = ?", [this])
50: }
51:
52: static constraints = {
53: }
54: }
55:


as you can see I had to hook into a number of Hibernate events.

My conclusion after the exercise above was that it was better to re-write the implementation than it was to try to retrofit the api.

I hope you found this useful and will be better prepared to design and implement your collections with Grails and GORM.

Friday, October 22, 2010

SpringOne 2GX Conf - GORM Performance

I was lucky enough to attend the SpringOne 2GX conference held in the Chicago area this week. There were plenty of great sessions - and they have given me much information to blog about.

This blog post is inspired by a session by Burt Beckwith who has an awesome blog himself. Check it out!

Burt's session was called 'Advanced GORM - Performance, Customization and Monitoring'. In this session he showed an interesting 'gotcha' which I looked into further and thought others might like to be made aware of this one. I would like to say I knew this at some point - but I have to admit I had forgotten this potential issue.

Two caveats:
1) This issue is really a result of what hibernate has to do to make the domain model correct. It is just that GORM makes it easy to fall into this issue - it is not a GORM issue. It is also not really a hibernate issue either - it just is.
2) Burt stressed that what he showed was to inform, not to bias people away from a particular modeling style.

The issue is centered around Grails 'One-to-Many' relationships. Burt used Library and Visit domain objects in his example - and if it is not too egregious - I will use the same in this blog post.

Imagine we have to keep track of the number of visits to a particular library. The standard modeling technique would be a single Library has many Visits. This is very easy to model in GORM ( and that ease in a way contributes to the issue ).

This can be modeled like the the following:
1:  class Library {  
2: String name
3: static hasMany = [visits:Visit]
4:
5: }
6:


Where a Visit might look like the following:
1:  class Visit {  
2: String personName
3: Date visitDate = new Date()
4: static belongsTo = [library:Library]
5: }
6:


So far so good, or is it... what is the problem?

For the hasMany relationship a hibernate Persistent Set is created by default. A Set requires all of the elements in the Set be unique. How would hibernate know if the new item is unique - without loading all of the elements to verify that it is unique? The short answer is, it has to load all of the elements first. Yes, all visits to the library - all of them. This could number into the millions at some point. Even if as a result of business rules we know that the Visit is unique.

In hibernate, the relationship could have been modeled as a bag - but unfortunately GORM does not support a bag.

Lets look at this and see if that is true. Below is a unit test which demonstrates this.
1:    void testAddingLibraryVisits() {  
2: Library library = new Library(name:'LocalLibrary')
3: int maxVisits = 100
4:
5: (1..maxVisits).each {
6: library.addToVisits(new Visit(personName:"person${it}"))
7: }
8: library.save()
9: sessionFactory.getCurrentSession().flush()
10: sessionFactory.getCurrentSession().clear()
11:
12: assertEquals 1, Library.count()
13: assertEquals maxVisits, Visit.count()
14:
15: sessionFactory.getCurrentSession().flush()
16: sessionFactory.getCurrentSession().clear()
17:
18: println "============= Find the library ============="
19:
20: Library newLibrary = Library.findById(library.id)
21:
22: println "============= Create one more visit ============="
23: Visit oneMoreVisit = new Visit(personName:"oneMorePerson")
24: println "============= Add and Save one more visit ============="
25: newLibrary.addToVisits(oneMoreVisit).save()
26: println "============= Check number of visits in library ============="
27: assertEquals maxVisits+1, newLibrary.visits.size()
28:
29: sessionFactory.getCurrentSession().flush()
30: sessionFactory.getCurrentSession().clear()
31: println "============= Done adding one more visit ============="
32:
33: println "============= Check count of libraries and visits ============="
34: assertEquals 1, Library.count()
35: assertEquals maxVisits+1, Visit.count()
36: }
37:


Lets look at each section.

The first part, lines 1-17 just get data setup. It is the other lines that are interesting.

Lines18-20 which are summarized below with their SQL output.

1:  println "============= Find the library ============="  
2: Library newLibrary = Library.findById(library.id)
3:
4: ============= Find the library =============
5: Hibernate:
6: select
7: this_.id as id8_0_,
8: this_.version as version8_0_,
9: this_.name as name8_0_
10: from
11: library this_
12: where
13: this_.id=?
14:
15:

After saving a 100 visits to the library, we get setup to add one more visit. In line 2 above, we perform a 'findById' on the library. You can see from the resulting query that the visits have not been loaded because by default all collections are lazy.

Next we add a newly created visit to the collection.
1:  println "============= Create one more visit ============="  
2: Visit oneMoreVisit = new Visit(personName:"oneMorePerson")
3: println "============= Add and Save one more visit ============="
4: newLibrary.addToVisits(oneMoreVisit).save()
5:
6: ============= Create one more visit =============
7: ============= Add and Save one more visit =============
8: Hibernate:
9: select
10: visits0_.library_id as library3_1_,
11: visits0_.id as id1_,
12: visits0_.id as id2_0_,
13: visits0_.version as version2_0_,
14: visits0_.library_id as library3_2_0_,
15: visits0_.person_name as person4_2_0_,
16: visits0_.visit_date as visit5_2_0_
17: from
18: visit visits0_
19: where
20: visits0_.library_id=?
21: Hibernate:
22: insert
23: into
24: visit
25: (id, version, library_id, person_name, visit_date)
26: values
27: (null, ?, ?, ?, ?)
28: Hibernate:
29: call identity()
30: Hibernate:
31: update
32: library
33: set
34: version=?,
35: name=?
36: where
37: id=?
38: and version=?
39: ============= Done adding one more visit =============
40:



In line 2 we create a new Visit instance, and in line4 we add this to the visits collection and save the library.

Notice starting in line 8 - that we are loading all of the visits for the particular library. This is where the issue comes into play.
On line 21 the new visit record is being inserted into the visit table and finally on line 31 the Library instance is saved.

I hope you can see that this could definitely be a performance issue. It will work great in dev, and great in QA - but 1 month into production and this will be an issue.

This concludes the database interaction to save one more item to a collection.

In my next blog post I will discuss what Burt suggested to address this problem, and a set of API changes to implement Burt's suggestion while keeping the unit test intact.

Thanks for stopping by and stop by again to see the next blog post.

Wednesday, October 13, 2010

Using the H2 Database CSVREAD with a sequence ID Column

The problem I was researching was the following:
Make the contents of a csv file, a potentially very large csv file, available to a Grails web application and be able to query against the fields as though it were a database.

I had a need to read a csv file and put this into a database so I could later query against it and the database had to persist across web application restarts. My research ( which amounted to a bunch of google searches ) led me to the database H2. You can find out more about H2 here.

More research led me to a pretty cool google code project called, gcsvsql which was a project that contained a groovy script to use H2, read the file(s) and put the data into an in memory database to query against on the command line or in a groovy shell.

This was almost exactly what I needed. I just needed to get this into a file based database because the data had to persist across restarts of the web application, and I really wanted an ID column. The data had a business ID column, that was probably unique - but since I did not control it I really wanted a surrogate ID column. Plus I originally thought I would use GORM and having an ID column just makes GORM life easier.

What I ended up doing was to use much of the gcsvsql script, added some additional parameters for output database directory, table name to use, etc. The important 5 lines are below.

1:  def sql = Sql.newInstance("jdbc:h2:file:${outputDir}/${tableName}","org.h2.Driver")  
2:
3: def seqstmt = "create sequence " + tableNameSeq + " start with 1"
4:
5: def stmt = "create table $tableName(ID INT PRIMARY KEY, $columnTypes) as select (select " + tableNameSeq + ".nextval from dual), * from csvread('$pathToCsvFile')" as String
6:
7: sql.execute(seqstmt)
8:
9: sql.execute(stmt)
10:


In Line 1, I am just using Groovy Sql to establish a connection a database based on some input parameters.
in Line 3, I am create a sequence that I am going to use to hold the primary key ID column values
In Line 5, I am creating a new table in the database, from the input parameters using the sequence to generate the primary key ID field.

Lines 7 and 9 just get the ball rolling after the setup.

Why this blog posting? Well - I never actually found a blog with this exact example of reading a csvfile into a file based H2 database and adding a new ID column based on a dynamically created sequence so I wanted to share back. As I mentioned, I borrowed heavily from the gcsvsql script, so you should definitely check that out.

If you want to see the entire script - leave me a comment and I will add the file to a future blog post along with my Groovy Sql reading of the data.

I hope this was useful to someone out there.