You are here

Solr 6 sharding methods

When an index grows too large to be stored on a single search server, it can be distributed across multiple search servers. This is known as sharding. The distributed/sharded index can then be searched using SkyVault/Solr's distributed search capabilities.

Solr 6 can use any of these four different methods for routing documents and ACLs to shards.

  • ACL (ACL_ID): This sharding method is most appropriate when there are lots of sites or nodes that have been assigned ACLs. For example, if you have many Share sites, nodes and ACLs are assigned to shards randomly based on the ACL and the documents to which it applies.

    The node distribution may be uneven as it depends how many nodes share ACLs. This method replaces the previous ACL based sharding method used in Solr 4 and distributes ACLs over the shards. Each shard contains only the access control information for the nodes it contains.

    To use this method, when creating a shard add a new configuration property:
    shard.method=PROPERTY
  • DBID (DB_ID): This is the default sharding option in Solr 6. Nodes are evenly distributed over the shards at random. Each shard contains copies of all the ACL information, so this information is replicated in each shard.
    To use this method, when creating a shard add a new configuration property:
    shard.method=DB_ID
  • Date/Datetime (DATE): The date-based sharding assigns dates sequentially through shards based on the month.

    For example: If there are 12 shards, each month would be assigned sequentially to each shard, wrapping round and starting again for each year. The non-random assignment facilitates easier shard management - dropping shards or scaling out replication for some date range. Typical ageing strategies could be based on the created date or destruction date.

    Each shard contains copies of all the ACL information, so this information is replicated in each shard. However, if the property is not present on a node, sharding falls back to the DBID murmur hash to randomly distribute these nodes.

    To use this method, when creating a shard add the new configuration properties:
    shard.key=exif:dateTimeOriginal
    shard.method=DATE
    Months can be grouped together, for example, by quarter. Each quarter of data would be assigned sequentially through the available shards.
    shard.date.grouping=3
  • Property (PROPERTY): This sharding method uses any d:date, d:datetime, or d:text property, for example, the recipient of an email, the creator of a node, some custom field set by a rule, or by the domain of an email recipient. The keys are randomly distributed over the shards using murmur hash.

    Each shard contains copies of all the ACL information, so this information is replicated in each shard. However, if the property is not present on a node, sharding falls back to the DBID murmur hash to randomly distribute these nodes.

    To use this method, when creating a shard add the new configuration properties:
    shard.key=cm:creator
    shard.method=PROPERTY

    It is possible to extract a part of the property value to use for sharding using a regular expression, for example, a year at the start of a string:

    shard.regex=^\d{4}
Note: The ACL (MOD_ACL_ID) sharding method is used only in Solr 4.