Wednesday, October 13, 2010

Using the H2 Database CSVREAD with a sequence ID Column

The problem I was researching was the following:
Make the contents of a csv file, a potentially very large csv file, available to a Grails web application and be able to query against the fields as though it were a database.

I had a need to read a csv file and put this into a database so I could later query against it and the database had to persist across web application restarts. My research ( which amounted to a bunch of google searches ) led me to the database H2. You can find out more about H2 here.

More research led me to a pretty cool google code project called, gcsvsql which was a project that contained a groovy script to use H2, read the file(s) and put the data into an in memory database to query against on the command line or in a groovy shell.

This was almost exactly what I needed. I just needed to get this into a file based database because the data had to persist across restarts of the web application, and I really wanted an ID column. The data had a business ID column, that was probably unique - but since I did not control it I really wanted a surrogate ID column. Plus I originally thought I would use GORM and having an ID column just makes GORM life easier.

What I ended up doing was to use much of the gcsvsql script, added some additional parameters for output database directory, table name to use, etc. The important 5 lines are below.

1:  def sql = Sql.newInstance("jdbc:h2:file:${outputDir}/${tableName}","org.h2.Driver")  
2:
3: def seqstmt = "create sequence " + tableNameSeq + " start with 1"
4:
5: def stmt = "create table $tableName(ID INT PRIMARY KEY, $columnTypes) as select (select " + tableNameSeq + ".nextval from dual), * from csvread('$pathToCsvFile')" as String
6:
7: sql.execute(seqstmt)
8:
9: sql.execute(stmt)
10:


In Line 1, I am just using Groovy Sql to establish a connection a database based on some input parameters.
in Line 3, I am create a sequence that I am going to use to hold the primary key ID column values
In Line 5, I am creating a new table in the database, from the input parameters using the sequence to generate the primary key ID field.

Lines 7 and 9 just get the ball rolling after the setup.

Why this blog posting? Well - I never actually found a blog with this exact example of reading a csvfile into a file based H2 database and adding a new ID column based on a dynamically created sequence so I wanted to share back. As I mentioned, I borrowed heavily from the gcsvsql script, so you should definitely check that out.

If you want to see the entire script - leave me a comment and I will add the file to a future blog post along with my Groovy Sql reading of the data.

I hope this was useful to someone out there.

No comments: