Installing A Single Instance of SolrCloud

After quite a bit of searching, I couldn't find a simple example of setting up a single instance of SolrCloud on a local machine. I work with Solr every day. I do most of my development on my laptop and when everything is good I commit it and deploy. We recently hired someone who will be working with Solr and I wanted to get his laptop set up to run SolrCloud locally, too. However, I found it difficult to locate a document that I could just point him to. This is that document.

Note that I understand the installation described below is likely to be useful only when doing development. What’s the point of having a distributed SolrCloud when it’s only running on one machine?

See the SolrCloud documentation for a list of required configurations.


ZooKeeper is a centralized service used to coordinate configuration information. You will be telling ZooKeeper where to find SolrCloud's configuration files.

SolrCloud comes with an embedded ZooKeeper. However, our production configuration uses ZooKeeper as a stand-alone system and I want to mimic production.


  • Download ZooKeeper from Apache’s site
  • Extract the downloaded file.
  • Follow the steps outlined in the getting started guide. Here are the basics; be aware that they may change with future versions of ZooKeeper.
    • Copy ZOOKEEPER_DIR/conf/zoo_sample.cfg to ZOOKEEPER_DIR/conf/zoo.cfg.
    • I changed the value of dataDir in ZOOKEEPER_DIR/conf/zoo.cfg to an existing empty directory.
    • Start ZooKeeper: ZOOKEEPER_DIR/bin/zkServer.sh start
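After those edits, zoo.cfg ends up looking something like the sample below. The values other than dataDir are the defaults from zoo_sample.cfg; the dataDir path is a placeholder for whatever empty directory you chose:

```
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/path/to/an/existing/empty/directory
clientPort=2181
```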


Verify that ZooKeeper is running:

ZOOKEEPER_DIR/bin/zkCli.sh -server localhost:2181

You should see a command prompt that looks something like this:

[zk: localhost:2181(CONNECTED) 0]

Enter quit to exit the client:

[zk: localhost:2181(CONNECTED) 0] quit

If you get a Connection refused error, the server is not running.
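The same check can be scripted. Below is a small Python sketch (is_zk_running is my own name, not something ZooKeeper provides) that simply tests whether the ZooKeeper port is accepting TCP connections:

```python
import socket

def is_zk_running(host="localhost", port=2181, timeout=2.0):
    """Return True if something is accepting TCP connections on host:port.

    This only proves the port is open; a fuller health check would send
    ZooKeeper's four-letter 'ruok' command and expect 'imok' back.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused / timed out: nothing listening there.
        return False
```

This avoids having to open the interactive client just to see whether the server came up.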

ZooKeeper and Solr’s Configuration Files

Using the SOLR_DIR/example/scripts/cloud-scripts/zkcli.sh script, upload Solr's configuration files to ZooKeeper:

SOLR_DIR/example/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd upconfig -confdir SOLR_DIR/example/solr/my-collection/conf -confname my-collection-config

This command prints a fair amount of output, including a list of the configuration files found in the directory pointed to by the “-confdir” flag.

Start SolrCloud

Start up an instance of SolrCloud:

SOLR_DIR/bin/solr start -cloud -z localhost:2181

The “-cloud” flag tells Solr to start in a cloud configuration.  The “-z localhost:2181” flag tells Solr how to connect to ZooKeeper where it will find configuration information.

You may now look at the SolrCloud admin page found here: http://localhost:8983/solr/#/

Create New Solr Collection

So far we’ve uploaded a set of Solr configuration files to ZooKeeper and started an instance of SolrCloud.  Next we need to create a new Solr collection telling Solr how to find its configuration in ZooKeeper.

curl 'localhost:8983/solr/admin/collections?action=CREATE&numShards=1&name=my-collection&collection.configName=my-collection-config'

Notice how the value of “collection.configName” is the same as what was used in the “upconfig” command that was sent to ZooKeeper: my-collection-config. This tells Solr to use that name when asking ZooKeeper for the configuration for this new collection.

The “numShards” parameter is required.  The documentation is a little confusing: the table says it is not required, but the description says otherwise.  I found that if I do not provide the “numShards” parameter, the response from Solr is

org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: numShards is a required param

As such, I just set the value to 1 and everything works as expected.
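Putting it together, the CREATE request above can be assembled with Python's standard library, which makes the required parameters easy to see. This is just a sketch that builds the URL; it does not contact Solr:

```python
from urllib.parse import urlencode

# Parameters for the Collections API CREATE call shown above.
params = {
    "action": "CREATE",
    "name": "my-collection",
    "numShards": 1,  # required, as discussed above
    "collection.configName": "my-collection-config",  # must match the -confname used with upconfig
}

url = "http://localhost:8983/solr/admin/collections?" + urlencode(params)
print(url)
```

You could pass the resulting URL to curl, or fetch it with urllib.request once Solr is running.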

To make this process easier I created a few scripts, which can be found on GitHub.

Using Hadoop to Create SOLR Indexes

One of the most challenging projects I faced at work recently was to create an Apache SOLR index consisting of approximately 15 million records. This index had been created once in the history of the company using a MySQL database and SOLR’s Data Import Handler (DIH). It had not been attempted since then because the original indexing process was time consuming (12-14 hours), required human supervision, and on failure had to be restarted from the very beginning.