For this entry I assume you already know how to configure SOLR’s Data Import Handler as that is how we’ll configure SOLR to use BigQuery:


Google’s Service Account File

Download the service account file as described here: I used the JSON version of the file. In this entry I’ll call it service_account.json.

JDBC Driver

Download the Simba JDBC driver as described here: I used version of their JDBC 4.2 drivers. The zip file I downloaded contains these JAR files:


Copy them to SOLR’s server/lib/ext directory and restart SOLR.

SOLR Configuration Files

Create a schema.xml file that contains the BigQuery fields that you’ll be importing.

The solr-data-config.xml will look something like the following (adjust your query appropriately).

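<dataConfig>
   <!-- driver class name assumed for the Simba JDBC 4.2 build; confirm it against your driver's documentation -->
   <dataSource type="JdbcDataSource"
     driver="com.simba.googlebigquery.jdbc42.Driver"
     autoCommit="true"
     url="jdbc:bigquery://;ProjectId=<YOUR PROJECT ID>;OAuthType=0;OAuthServiceAcctEmail=<YOUR PROJECT'S EMAIL ADDRESS>;OAuthPvtKeyPath=/path/to/service_account.json;LogLevel=6;LogPath=/tmp/bq-log" />
   <document name="bq-doc">
     <!-- entity name and field mapping are illustrative; match them to your schema.xml -->
     <entity name="bq-entity"
       query="select id from `your-dataset-name.your-table`">
       <field column="id" name="id" />
     </entity>
   </document>
</dataConfig>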


Long Queries

This setup is picky about long queries. I had a query of roughly 60 lines, with lots of leading spaces on each line so that everything lined up nicely, and the import would not work. There was no error about the query being too long or containing too many spaces. It simply would not work. I stumbled on the fix by accident: removing the extra spaces. According to BigQuery’s documentation, the maximum unresolved query length is 256 KB. My query, even with the spaces, was nowhere near that long, so I have to conclude the problem is something in the Simba driver.
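If you want to keep a nicely indented copy of the query (say, in version control) but paste a compact version into the query attribute, a small helper can do the trimming for you. This is a minimal Java sketch, not something from SOLR or the Simba driver; the class and method names are hypothetical. It assumes the query has no `--` line comments (joining the lines would comment out everything after one) and no string literals that span lines.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper (not part of SOLR or the Simba driver): strips the
// indentation from each line of a formatted query and joins the non-blank
// lines with single spaces. Assumes no "--" line comments and no
// multi-line string literals in the query.
final class QueryCompactor {
    private QueryCompactor() {}

    static String compact(String query) {
        return Arrays.stream(query.split("\\R"))  // split on any line break
            .map(String::trim)                    // drop leading/trailing whitespace
            .filter(line -> !line.isEmpty())      // skip blank lines
            .collect(Collectors.joining(" "));    // one space between lines
    }
}
```

For example, `QueryCompactor.compact("select id,\n       name\n    from t")` returns `"select id, name from t"`. Spacing inside a line is left alone; only the indentation and line breaks go away.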

Commit Exception

Note also that the SOLR and Simba logs will show an exception when you run the data import, but it does not stop the indexing process:

Jan 16 19:54:19.731 ERROR 62 com.simba.googlebigquery.exceptions.ExceptionConverter.toSQLException: [Simba][JDBC](10040) Cannot use commit while Connection is in auto-commit mode.
java.sql.SQLException: [Simba][JDBC](10040) Cannot use commit while Connection is in auto-commit mode.
   at com.simba.googlebigquery.exceptions.ExceptionConverter.toSQLException(Unknown Source)
   at com.simba.googlebigquery.jdbc.common.SConnection.commit(Unknown Source)
   at org.apache.solr.handler.dataimport.JdbcDataSource.closeConnection(
   at org.apache.solr.handler.dataimport.JdbcDataSource.close(

In the JdbcDataSource source there is a comment “//SOLR-2045”. Evidently, because of DB2, the SOLR developers added a commit before closing the connection so that connections are released. The problem is that with this driver you have to set autoCommit to “true”, and calling commit on a connection that is in auto-commit mode fails, hence the error above. Luckily, the commit is wrapped in a try/catch block whose catch ignores the exception, so the SOLR code simply carries on and closes the connection.


The autoCommit="true" setting is required. The Simba JDBC driver gives you this exception if you leave it out or don’t set it to "true":

java.sql.SQLFeatureNotSupportedException: [Simba][JDBC](10220) Driver does not support this optional feature.
   at com.simba.googlebigquery.exceptions.ExceptionConverter.toSQLException(Unknown Source)
   at com.simba.googlebigquery.jdbc.common.SConnection.setAutoCommit(Unknown Source)
   at org.apache.solr.handler.dataimport.JdbcDataSource$1.initializeConnection(

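In other words, the driver doesn’t support the setAutoCommit call that the Data Import Handler makes when autoCommit isn’t “true”, so you have to set it, even though that is what triggers the harmless commit exception described above.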
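To make the interplay concrete, here is a simplified Java sketch of the two Data Import Handler (DIH) code paths involved. It is not the actual JdbcDataSource source: it assumes, as the stack traces suggest, that the handler calls setAutoCommit(false) unless autoCommit is "true", and that closeConnection commits inside a try/catch that ignores failures. The fake connection built with `java.lang.reflect.Proxy` is a hypothetical stand-in that mimics the Simba behavior described above.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.List;
import java.util.Properties;

// Simplified sketch of the two DIH code paths discussed above.
// NOT the real JdbcDataSource code; the behavior is assumed from the logs.
final class DihConnectionSketch {
    private DihConnectionSketch() {}

    // Assumed: unless autoCommit="true", auto-commit is switched off.
    static void initializeConnection(Connection c, Properties initProps) throws SQLException {
        if (!Boolean.parseBoolean(initProps.getProperty("autoCommit"))) {
            c.setAutoCommit(false);
        }
    }

    // SOLR-2045 pattern: commit so the connection is released, ignore any
    // failure, then close.
    static void closeConnection(Connection c) {
        try {
            c.commit();
        } catch (SQLException e) {
            // Swallowed, which is why the (10040) commit error is harmless.
        }
        try {
            c.close();
        } catch (SQLException e) {
            // Also ignored in this sketch.
        }
    }

    // Hypothetical fake driver connection: setAutoCommit is unsupported and
    // commit fails because the connection is always in auto-commit mode.
    // Every method name called on it is recorded in `calls`.
    static Connection fakeSimbaConnection(List<String> calls) {
        return (Connection) Proxy.newProxyInstance(
            Connection.class.getClassLoader(),
            new Class<?>[] {Connection.class},
            (proxy, method, args) -> {
                String name = method.getName();
                calls.add(name);
                if (name.equals("setAutoCommit")) {
                    throw new SQLFeatureNotSupportedException(
                        "Driver does not support this optional feature.");
                }
                if (name.equals("commit")) {
                    throw new SQLException(
                        "Cannot use commit while Connection is in auto-commit mode.");
                }
                return null; // close() is void, so null is a valid result
            });
    }
}
```

With autoCommit set to "true", initializeConnection never calls setAutoCommit, and closeConnection still reaches close even though commit fails. Leave the property out and initializeConnection throws SQLFeatureNotSupportedException, matching the second stack trace.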

Connection URL

The driver’s documentation has the full details on the connection URL, starting here: I’ll cover a few of the settings here.


To turn logging off, set LogLevel in the connection URL to 0. LogPath points to a directory in which two log files are written: BigQueryJDBC_driver.log and BigQuery_connection_0.log. When LogLevel is 0, nothing is written; the directory isn’t even created. The example above sets the level to 6, the highest level. I figure that’s a good setting for getting started, since you can see everything that gets logged.
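Because the connection URL is just the jdbc:bigquery:// prefix followed by ;Key=Value pairs, it can be handy to assemble it in code if you generate solr-data-config.xml from a template. The Java sketch below is hypothetical (the class and method names are mine, not Simba’s). It uses only the properties from the example URL, assumes no value contains a semicolon, and rejects LogLevel values outside 0 to 6.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper for assembling the Simba BigQuery connection URL.
// Property names come from the example URL in this entry; values are
// assumed not to contain ';'.
final class BigQueryJdbcUrl {
    private final Map<String, String> props = new LinkedHashMap<>(); // keeps insertion order

    BigQueryJdbcUrl(String projectId, String serviceAccountEmail, String keyPath) {
        props.put("ProjectId", projectId);
        props.put("OAuthType", "0"); // service-account authentication, as in the example
        props.put("OAuthServiceAcctEmail", serviceAccountEmail);
        props.put("OAuthPvtKeyPath", keyPath);
    }

    // LogLevel 0 turns logging off; 6 is the highest (most verbose) level.
    BigQueryJdbcUrl logging(int level, String logPath) {
        if (level < 0 || level > 6) {
            throw new IllegalArgumentException("LogLevel must be 0 to 6: " + level);
        }
        props.put("LogLevel", Integer.toString(level));
        if (level > 0) {
            props.put("LogPath", Objects.requireNonNull(logPath, "logPath"));
        } else {
            props.remove("LogPath"); // nothing is written at level 0 anyway
        }
        return this;
    }

    String build() {
        StringBuilder sb = new StringBuilder("jdbc:bigquery://");
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append(';').append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }
}
```

For instance, `new BigQueryJdbcUrl("my-project", "sa@example.com", "/path/to/service_account.json").logging(6, "/tmp/bq-log").build()` yields a URL in the same shape as the example config. Dropping LogPath at level 0 is just a tidiness choice in this sketch; leaving it in should be harmless.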


Categories: solr

