Using the Funnelback index to generate gscope and kill configuration files

Warning

padre-sw currently has a bug (observed in v14.2.2, and probably present in all earlier versions that support these result modes) that causes the URLs it prints to be HTML-encoded.

When using this method to generate configuration, it may be necessary to sanitise the output (for example with sed) to fix the URLs.

A command similar to the following can be used to clean the output:

sed -e 's/&amp;/\&/g' -e 's/<!--.*-->//g' -e '/^$/d' padreswgenerated.cfg > cleanedconfig.cfg

This command decodes the HTML-encoded ampersands, strips HTML comments and deletes any empty lines.
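
As a minimal sketch, the clean-up can be wrapped in a small shell function and tried on sample input before it is pointed at real padre-sw output. The function name clean_padre_cfg and the sample URLs are illustrative assumptions, not part of Funnelback, and the expressions assume that an HTML comment never spans more than one line.

```shell
# clean_padre_cfg: hypothetical helper that reads padre-sw output on stdin.
# 1st expression: turn &amp; back into a literal &
# 2nd expression: strip single-line HTML comments
# 3rd expression: delete lines that are empty or whitespace-only
clean_padre_cfg() {
  sed -e 's/&amp;/\&/g' \
      -e 's/<!--.*-->//g' \
      -e '/^[[:space:]]*$/d'
}

# Made-up sample input standing in for real padre-sw output.
printf '%s\n' \
  'http://example.com/page?a=1&amp;b=2' \
  '<!-- generated comment -->' \
  '' \
  'http://example.com/other' | clean_padre_cfg
```

For the sample input this prints the two URLs only, with the first one reading http://example.com/page?a=1&b=2.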

Generating a gscopes.cfg

1. Run a padre query with the following query processor options:

  • -res=gscopes (or -res=docnums) - tells padre to return results as gscopes.cfg compatible text. -res=gscopes returns matching items as URLs; -res=docnums returns them as document numbers. Using -res=docnums is safer because padre-gs has a 10000 item limit when the gscopes.cfg file contains URLs.
  • -gscoperesult= - tells padre the gscope number to use in the returned results (the default is to return the documents with a gscope of 1)

e.g.

QUERY_STRING='collection=mycollection&query=myquery&view=offline' $SEARCH_HOME/bin/padre-sw -res=docnums -gscoperesult=4 > $SEARCH_HOME/conf/mycollection/mygscopes.cfg

The output is written to mygscopes.cfg and is a valid gscopes.cfg

2. Apply the gscopes

To apply the gscopes to the index run the following command.

$SEARCH_HOME/bin/padre-gs $SEARCH_HOME/data/mycollection/offline/idx/index $SEARCH_HOME/conf/mycollection/mygscopes.cfg -docnum

Omit -docnum if you've used -res=gscopes to generate the configuration file.
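
If -res=gscopes was used, it may be worth checking the size of the generated file before running padre-gs. The following is a sketch only: the function name within_gscope_limit is hypothetical, and it assumes the 10000 item limit mentioned above counts non-empty lines in the file.

```shell
# Assumption: the padre-gs limit for URL-based files counts entries (lines).
GSCOPE_URL_LIMIT=10000

# within_gscope_limit FILE: succeeds if FILE has at most
# GSCOPE_URL_LIMIT non-empty lines, fails otherwise.
within_gscope_limit() {
  awk -v limit="$GSCOPE_URL_LIMIT" \
    'NF { n++ } END { exit (n > limit) ? 1 : 0 }' "$1"
}

# Example use: warn when a URL-based file is too large.
cfg=${1:-mygscopes.cfg}
if [ -f "$cfg" ] && ! within_gscope_limit "$cfg"; then
  echo "$cfg has more than $GSCOPE_URL_LIMIT entries; consider -res=docnums" >&2
fi
```

The check is not needed for files generated with -res=docnums.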

Generating a kill_exact.cfg

1. Run a padre query with the following query processor options:

  • -res=flcfg - tells padre to return results as kill_exact.cfg compatible text.

e.g.

QUERY_STRING='collection=mycollection&query=myquery&view=offline' $SEARCH_HOME/bin/padre-sw -res=flcfg > $SEARCH_HOME/conf/mycollection/kill_list.cfg

The output is written to kill_list.cfg and is a valid kill_exact.cfg

2. Apply the kill

To apply the kill to the index run the following command.

$SEARCH_HOME/bin/padre-fl $SEARCH_HOME/data/mycollection/offline/idx/index $SEARCH_HOME/conf/mycollection/kill_list.cfg -exactmatch -kill
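
A query that matches nothing (or fails) leaves an empty kill_list.cfg. One possible guard, sketched below, only calls padre-fl when the file contains at least one entry. The function names has_entries and apply_kill are hypothetical, and whether skipping is the right behaviour depends on your update workflow.

```shell
# has_entries FILE: succeeds if FILE exists and contains
# at least one line that is not blank.
has_entries() {
  [ -f "$1" ] && grep -q '[^[:space:]]' "$1"
}

# apply_kill FILE: run padre-fl with the options shown above,
# but only when FILE has something in it.
apply_kill() {
  if has_entries "$1"; then
    "$SEARCH_HOME/bin/padre-fl" \
      "$SEARCH_HOME/data/mycollection/offline/idx/index" \
      "$1" -exactmatch -kill
  else
    echo "No entries in $1; skipping padre-fl" >&2
  fi
}
```

It would be called as apply_kill "$SEARCH_HOME/conf/mycollection/kill_list.cfg".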

Generating a qie.cfg

1. Run a padre query with the following query processor options:

  • -res=qiecfg - tells padre to return results as qie.cfg compatible text.
  • -qieval= - tells padre the weight to return with each URL

e.g.

QUERY_STRING='collection=mycollection&query=myquery&view=offline' $SEARCH_HOME/bin/padre-sw -res=qiecfg > $SEARCH_HOME/conf/mycollection/myqie.cfg

The output is written to myqie.cfg and is a valid qie.cfg

2. Apply the QIE

To apply the query independent evidence to the index run the following command.

$SEARCH_HOME/bin/padre-qi $SEARCH_HOME/data/mycollection/offline/idx/index $SEARCH_HOME/conf/mycollection/myqie.cfg
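
The example above does not pass -qieval. A sketch of a wrapper that does is shown below. The function name generate_qie and the PADRE_SW override are assumptions added for illustration, and the weight 0.5 in the usage note is a placeholder rather than a recommended value.

```shell
# PADRE_SW may be overridden (for example when testing);
# the default follows the path used in the examples above.
PADRE_SW=${PADRE_SW:-$SEARCH_HOME/bin/padre-sw}

# generate_qie COLLECTION QUERY WEIGHT
# Runs padre-sw in qiecfg mode and writes the result to stdout.
# QUERY is assumed to be URL-encoded already.
generate_qie() {
  QUERY_STRING="collection=$1&query=$2&view=offline" \
    "$PADRE_SW" -res=qiecfg "-qieval=$3"
}
```

For example, generate_qie mycollection myquery 0.5 > "$SEARCH_HOME/conf/mycollection/myqie.cfg" would write the file that padre-qi then applies.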

Alternate method for generating gscope/kill/qie configuration files

This method involves running a query against the Funnelback index and using a custom template to return the configuration file.

This method is required for older versions of Funnelback that don't support the additional padre-sw result modes. It also works more reliably under Windows (although Cygwin is required for the curl command).

1. Create a custom template containing the following code:

qie.ftl
<#ftl encoding="utf-8" />
<#import "/web/templates/modernui/funnelback_classic.ftl" as s/>
<#import "/web/templates/modernui/funnelback.ftl" as fb/>
<@s.Results>
<#if s.result.class.simpleName != "TierBar">
<#compress><#if question.inputParameterMap["wt"]?exists>${question.inputParameterMap["wt"]?html} </#if>${s.result.liveUrl}</#compress>
</#if>
</@s.Results>

The template checks for an optional custom CGI parameter (wt) that contains the value to print at the start of each line. This can be the QIE weight to assign, or the gscope ID to return.

2. Create a post-index workflow command that runs curl and saves the configuration file. Set appropriate values for num_ranks, query, collection and wt. wt is not required if you are generating a kill_exact.cfg.

curl --connect-timeout 60 --retry 3 --retry-delay 20 "http://127.0.0.1/s/search.html?query=QUERY&num_ranks=LARGE_VALUE&wt=WT&view=offline&collection=COLLECTION&form=qie&profile=_default_preview" -o $SEARCH_HOME/conf/$COLLECTION_NAME/CONFIG.cfg || exit 1

Repeat this command for each different gscope number/QIE weight you require.
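
When several weights or gscope numbers are needed, the URLs can be built in a loop rather than by hand. The sketch below only prints the URLs; each one would then be passed to the curl command above. The function name build_search_url, the NUM_RANKS default of 10000 and the sample weight/query pairs are all illustrative assumptions, and the query values are assumed to be URL-encoded already.

```shell
# build_search_url QUERY WT COLLECTION
# Prints the search URL for one weight (or gscope number).
build_search_url() {
  printf 'http://127.0.0.1/s/search.html?query=%s&num_ranks=%s&wt=%s&view=offline&collection=%s&form=qie&profile=_default_preview\n' \
    "$1" "${NUM_RANKS:-10000}" "$2" "$3"
}

# One "weight query" pair per line; the values are made up.
while read -r wt query; do
  build_search_url "$query" "$wt" mycollection
done <<'EOF'
0.9 important
0.2 archive
EOF
```

How the individual results are combined into a single configuration file is left to the workflow.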
