- Do I need sharding?
- Do I need dynamic shard registration?
- How many shards should I have?
- What are the reindexing recommendations for a sharded installation?
- Does sharding work with SSL enabled?
- Are there any considerations for query load and number of documents?
- After upgrading, can I use my current index while building a new sharded index?
- How do I know the new sharded index is up-to-date?
- Can different shards be inconsistent?
Do I need sharding?
If you plan to store more than 50 million documents in your repository, consider sharding to maximize indexing performance and to scale horizontally to very large content repositories.
Do I need dynamic shard registration?
You can set up sharding using either manual or dynamic shard registration. SkyVault recommends dynamic shard registration because it is much easier to implement than manual shard registration.
How many shards should I have?
As a general rule of thumb, divide the total number of documents by 50 million. If you want to increase the query load or support more than 100 concurrent users, also check the memory and I/O specifications of the installation machine.
What are the reindexing recommendations for a sharded installation?
- Smaller index
- Better query performance particularly for phrases and stop words
- Improved cross-language search
This should allow the user to store between 50 million and 80 million documents in a single shard. For more information, see the SkyVault Platform News and the SkyVault 1 billion documents press release with Amazon Aurora.
Note that changing the number of shards requires a reindex.
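The sizing guidance in this FAQ (divide the total document count by 50 million) can be sketched as a small calculation. This is only an illustration: the function name is hypothetical, and rounding up to a whole shard is an assumption rather than something the rule of thumb states.

```python
# Rule-of-thumb figure taken from this FAQ: about 50 million documents per shard.
DOCS_PER_SHARD = 50_000_000


def estimate_shard_count(total_documents: int,
                         docs_per_shard: int = DOCS_PER_SHARD) -> int:
    """Estimate a shard count by dividing the document total by the per-shard figure.

    Rounding up (and never returning fewer than one shard) is an assumption
    made for this sketch, not a documented SkyVault rule.
    """
    if total_documents < 0:
        raise ValueError("total_documents must be non-negative")
    if docs_per_shard <= 0:
        raise ValueError("docs_per_shard must be positive")
    # Integer ceiling division avoids floating-point rounding on large counts.
    shards = (total_documents + docs_per_shard - 1) // docs_per_shard
    return max(1, shards)


if __name__ == "__main__":
    print(estimate_shard_count(230_000_000))  # 5
```

Because changing the number of shards requires a reindex, it may be worth running the estimate against the projected repository size rather than the current one.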
Does sharding work with SSL enabled?
No. Sharding only works if SSL is disabled. Make sure you configure the Solr and SkyVault SSL settings properly. For more information, see Running Solr without SSL.
Are there any considerations for query load and number of documents?
Before sharding your Solr index, it is important to consider your query load and the size of your repository. You need to create machines to host Solr. For more information, see Installing and configuring Solr 4. For example, if you need 5 shards, you need to set up 5 machines and have a Solr instance running on each of them. Once your machines are ready, you can set up or register shards.
For more information, see Dynamic shard registration.
After upgrading, can I use my current index while building a new sharded index?
Yes. After upgrading to SkyVault 2.0, continue to use the old search index server as before, set up a new sharded Solr server with the rerank template to reindex the data, and then switch over to the new sharded index once indexing is done and the sharded Solr server is up to date.
Upgrading from SkyVault 2.0 5.0 and earlier versions with Solr 4 to SkyVault 2.0 (with zero downtime)
- Upgrade to SkyVault 2.0 and continue to use the Solr 4 search service as before.
- Configure a separate sharded Solr 4 index with the rerank template to track the repository. For details, see Installing and configuring Solr shards.
- While the new sharded Solr 4 builds its indexes, you can monitor the progress using the Solr Admin Web interface. For details, see the next question.
- When the sharded Solr 4 index is up to date, enable it by setting the solr.host property. For more information, see Activating Solr.
Upgrading from SkyVault 2.0 4.x and earlier versions with Lucene to SkyVault 2.0 (search service will be unavailable while the indexes are being built)
- Upgrade to SkyVault 2.0 with a sharded Solr 4 installation to track the repository. Use the rerank template when configuring the new Solr core. For details, see Installing and configuring Solr shards.
  Note: While the Solr 4 indexes are being built, you can continue to use SkyVault but the search service will not be available until the Solr 4 indexes are up-to-date.
- Enable the sharded Solr 4 index by setting the solr.host property. For more information, see Activating Solr.
- While the new sharded Solr 4 builds its indexes, you can monitor the progress using the Solr Admin Web interface. For details, see the next question.
Upgrading from SkyVault 2.0 5.0 and earlier versions with Solr 1 to SkyVault 2.0 (with zero downtime)
- Upgrade to SkyVault 2.0 and continue to use the Solr 1 search service as before.
- Configure a separate sharded Solr 4 index with the rerank template to track the repository. For details, see Installing and configuring Solr shards.
- While the new sharded Solr 4 builds its indexes, you can monitor the progress using the Solr Admin Web interface. For details, see the next question.
- When the sharded Solr 4 index is up to date, enable it by setting the solr.host property. For more information, see Activating Solr.
How do I know the new sharded index is up-to-date?
Go to the Solr Admin Web interface at https://localhost:8443/solr4/#/SkyVault and monitor the value of Approx transactions remaining. A value of 0 indicates that the index is up to date.
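The check described above amounts to comparing one counter with zero. The sketch below assumes you have already copied the values shown in the Solr Admin Web interface into a dictionary, one per shard. The dictionary shape, the shard names, and the function names are hypothetical. Only the label Approx transactions remaining comes from this FAQ, and how the values are collected is outside the scope of this sketch.

```python
# Label as shown in the Solr Admin Web interface, per this FAQ.
REMAINING_KEY = "Approx transactions remaining"


def is_index_up_to_date(summary: dict) -> bool:
    """Return True when the summary reports zero transactions remaining.

    `summary` is a hypothetical dict of values read from the admin page
    for a single shard.
    """
    if REMAINING_KEY not in summary:
        raise KeyError(f"summary has no {REMAINING_KEY!r} entry")
    # Accept either an int or a numeric string such as "0".
    return int(summary[REMAINING_KEY]) == 0


def all_shards_up_to_date(summaries: dict) -> bool:
    """Return True only when every shard's summary reports zero remaining."""
    return all(is_index_up_to_date(s) for s in summaries.values())


if __name__ == "__main__":
    shards = {
        "shard0": {REMAINING_KEY: 0},      # hypothetical shard names
        "shard1": {REMAINING_KEY: 1520},
    }
    print(all_shards_up_to_date(shards))  # False
```

In a sharded installation, each shard tracks the repository separately, so it seems reasonable to wait until every shard reports 0 before switching over.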
Can different shards be inconsistent?
Yes. In a sharded setup, eventual consistency can introduce additional query inconsistencies.
A node can move between shards by either:
- Moving the node, or
- Adding a new access control list (ACL) to a node that did not previously have any ACLs defined.
While a node is moving between shards, a query might return:
- Two copies of the node if it is added to a new shard before it is deleted from the original shard.
- No node if it is deleted from the original shard before being added to a new shard.
Indexing is eventually consistent. When updates happen at the same time, no inconsistency is seen.
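The two transient outcomes described above (two copies of a node, or no node) can be illustrated with a toy model. This is not how Solr or SkyVault is implemented: shards are modelled as plain Python sets, and the shard names, node ID, and function names are all hypothetical.

```python
def hits(shards: dict, node_id: str) -> int:
    """Count how many shards would currently return the node to a query."""
    return sum(1 for nodes in shards.values() if node_id in nodes)


def move_node(shards: dict, node_id: str, src: str, dst: str,
              add_first: bool) -> list:
    """Move a node between shards in two steps and record what a query
    would see after each step.

    add_first=True  models the add reaching the new shard before the delete.
    add_first=False models the delete reaching the old shard before the add.
    """
    steps = [
        lambda: shards[dst].add(node_id),      # index the node in the new shard
        lambda: shards[src].discard(node_id),  # remove it from the original shard
    ]
    if not add_first:
        steps.reverse()
    observed = []
    for step in steps:
        step()
        observed.append(hits(shards, node_id))
    return observed


if __name__ == "__main__":
    fresh = lambda: {"shard0": {"node-42"}, "shard1": set()}
    print(move_node(fresh(), "node-42", "shard0", "shard1", add_first=True))   # [2, 1]
    print(move_node(fresh(), "node-42", "shard0", "shard1", add_first=False))  # [0, 1]
```

In both orderings the final count is 1, which matches the eventually consistent behaviour described above: only queries that run between the two steps see the duplicate or the missing node.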