Result diversification

Several ranking options are designed to increase the diversity of the result set. They reduce the likelihood of a result set being flooded by results from the same website, collection and so on.

Same site suppression

Each website has a unique information profile and some sites naturally rank better than others. Search engine optimisation (SEO) techniques assist with improving a website’s natural ranking.

Same site suppression downweights consecutive results from the same website, resulting in a more diverse set of search results.

Same site suppression is configured by setting the following ranking options:

  • SSS: Controls the depth of comparison (in the URL) used to determine what a site is. This corresponds to the depth of the URL (the number of subfolders in the URL).
    • Range: 0-1000
    • SSS=0 – no suppression (default for non-web collections)
    • SSS=2 – default for web and meta collections (site name + first level folder)
    • SSS=10 – special meaning for big web applications.
  • SameSiteSuppressionExponent: Controls the downweight penalty applied. Larger values result in greater downweight.
    • Range: 0.0 – unlimited (default = 0.5)
    • Recommended value: between 0.2 and 0.7
  • SameSiteSuppressionOffset: Controls how many documents are displayed beyond the first document from the same site before any downweight is applied.
    • Range: 0-1000 (default = 0)
  • sss_defeat_pattern: URLs matching the simple string pattern are excluded from same site suppression.

Example:

query_processor_options= -SSS=3 -SameSiteSuppressionExponent=0.6 -SameSiteSuppressionOffset=2 -sss_defeat_pattern=Media 
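
To make the SSS depth setting more concrete, the following Python sketch derives a "site" key from a URL at a given depth. It is an illustration of the description above only, not Funnelback's implementation: the function name site_key is hypothetical, and the mapping of depth 1 to the site name alone is an assumption extrapolated from SSS=2 meaning site name plus first level folder.

```python
from urllib.parse import urlsplit


def site_key(url: str, depth: int) -> str:
    """Return an illustrative 'site' key: host plus the first (depth - 1) folders.

    Assumption: depth 1 is the site name only, depth 2 adds the first level
    folder, and so on. Depth 0 means no suppression, so no key is produced.
    """
    if depth <= 0:
        return ""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    # Treat the final segment as a document rather than a folder,
    # unless the path ends with a slash.
    folders = segments if parts.path.endswith("/") else segments[:-1]
    return "/".join([parts.netloc.lower()] + folders[: depth - 1])


if __name__ == "__main__":
    url = "https://example.com/news/2024/story.html"
    for depth in (1, 2, 3):
        print(depth, site_key(url, depth))
```

Under these assumptions, two results are treated as coming from the same site when their keys match, so a larger SSS value makes the comparison more specific and suppresses fewer results.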

Same meta suppression

Downweights subsequent results that contain the same value in a specified metadata field. Same meta suppression is controlled by the following ranking options:

  • same_meta_suppression: Controls the downweight penalty applied for consecutive documents that have the same metadata field value.
    • Range: 0.0-1.0 (default = 0.0)
  • meta_suppression_field: Controls the metadata field used for the comparison. Note: only a single metadata field can be specified.

Example:

query_processor_options= -same_meta_suppression=0.7 -meta_suppression_field=subject
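
The effect of downweighting consecutive results can be illustrated with a small Python sketch. The formula Funnelback applies is not described here, so the sketch assumes a simple multiplicative penalty (score × (1 − penalty)) purely for demonstration; the function name suppress_consecutive and the tuple layout are hypothetical.

```python
def suppress_consecutive(results, penalty):
    """Downweight a result when it follows one with the same field value.

    results: list of (doc_id, score, field_value), sorted by score descending.
    penalty: value between 0.0 and 1.0; 0.0 leaves the ranking unchanged.
    Assumption: the penalty is applied multiplicatively. This is a toy model,
    not the formula used by the search engine.
    """
    adjusted = []
    previous_value = None
    for doc_id, score, value in results:
        if previous_value is not None and value == previous_value:
            score *= 1.0 - penalty
        adjusted.append((doc_id, score, value))
        previous_value = value
    # Re-sort so that downweighted results drop below other candidates.
    return sorted(adjusted, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    ranked = [("a", 1.0, "sport"), ("b", 0.9, "sport"), ("c", 0.8, "finance")]
    for doc_id, score, subject in suppress_consecutive(ranked, 0.5):
        print(doc_id, subject)
```

In this toy example the second "sport" result drops below the "finance" result, which is the kind of interleaving that same meta suppression is intended to encourage.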

Same collection suppression

Downweights subsequent results that come from the same collection. This provides similar functionality to the meta collection component weighting above and could be used in conjunction with it to provide an increased influence. Same collection suppression is controlled by the following ranking option:

  • same_collection_suppression: Controls the downweight penalty applied for consecutive documents that reside in the same collection.
    • Range: 0.0-1.0 (default = 0.0)

Example:

query_processor_options= -same_collection_suppression=0.45

Same title suppression

Downweights subsequent results that contain the same title. Same title suppression is controlled by the following ranking option:

  • title_dup_factor: Controls the downweight penalty applied for consecutive documents that have the same title value.
    • Range: 0.0-1.0 (default = 0.5)

Example:

query_processor_options= -title_dup_factor=0.63
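
When several of these options are combined, it can help to check the values against the documented ranges before writing the configuration line. The Python sketch below does this for the options listed in this article. The helper name build_query_processor_options is hypothetical and is not part of Funnelback; options without a documented numeric range (such as sss_defeat_pattern) are passed through unchecked.

```python
# Documented ranges from this article: (minimum, maximum); None means unbounded.
RANGES = {
    "SSS": (0, 1000),
    "SameSiteSuppressionExponent": (0.0, None),
    "SameSiteSuppressionOffset": (0, 1000),
    "same_meta_suppression": (0.0, 1.0),
    "same_collection_suppression": (0.0, 1.0),
    "title_dup_factor": (0.0, 1.0),
}


def build_query_processor_options(options: dict) -> str:
    """Build a query_processor_options line, rejecting out-of-range values."""
    flags = []
    for name, value in options.items():
        if name in RANGES:
            low, high = RANGES[name]
            if value < low or (high is not None and value > high):
                raise ValueError(f"{name}={value} is outside the documented range")
        flags.append(f"-{name}={value}")
    return "query_processor_options= " + " ".join(flags)


if __name__ == "__main__":
    print(build_query_processor_options({
        "SSS": 3,
        "SameSiteSuppressionExponent": 0.6,
        "title_dup_factor": 0.63,
    }))
```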

Result collapsing

While not a ranking option, result collapsing can be used to effectively diversify the result set by grouping similar result items together into a single result.

Results are considered to be similar if:

  • They share near-identical content
  • They have identical values in one or more metadata fields.
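
The metadata case can be pictured as grouping results on a key built from the chosen fields. The Python sketch below is a conceptual illustration only: it does not cover near-identical content, it is not how Funnelback implements collapsing, and the function name collapse_by_fields and the dictionary-based result records are hypothetical.

```python
def collapse_by_fields(results, fields):
    """Group results whose values match on every field in `fields`.

    results: list of dicts, in ranked order.
    Returns a list of groups; each group keeps its members in ranked order,
    and groups appear in the order of their highest ranked member.
    """
    groups = {}
    for result in results:
        key = tuple(result.get(field) for field in fields)
        groups.setdefault(key, []).append(result)
    return list(groups.values())


if __name__ == "__main__":
    ranked = [
        {"url": "/a", "author": "Smith", "year": "2020"},
        {"url": "/b", "author": "Jones", "year": "2020"},
        {"url": "/c", "author": "Smith", "year": "2020"},
    ]
    for group in collapse_by_fields(ranked, ["author", "year"]):
        print([r["url"] for r in group])
```

Each group would then be displayed as a single result, with the remaining members available on request.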

Result collapsing requires configuration that affects both the indexing and query time behaviour of Funnelback.

See: documentation: result collapsing
