Downloading configuration files via workflow
Background
There is often a need to download configuration files as part of the Funnelback workflow. This can involve downloading an external file (such as an external_metadata.cfg from a CMS), or accessing Funnelback to produce a configuration file such as auto-completion.csv.
Curl is the recommended program to use when downloading via workflow. It is preferred over wget and over custom Perl or Python scripts that use internal libraries.
The following curl command should be used when downloading as part of workflow. This command sets a longer timeout as well as retries, and exits if there is an error.
query=!showall&num_ranks=1000&form=query_completion' -o $SEARCH_HOME/conf/myfaqs/query_completion.csv || exit 1
The commands are commonly called in workflow from a bash script or Windows batch file, e.g. pre_gather.sh or pre_gather.bat:
# Linux - use single quotes
curl --connect-timeout 60 --retry 3 --retry-delay 20 '<url_to_download>' -o <output_file> || exit 1
# Windows - use double quotes (requires cygwin)
c:\cygwin\bin\curl.exe --connect-timeout 60 --retry 3 --retry-delay 20 "<url_to_download>" -o <output_file> || exit 1
# Windows - with wget.exe. Note: only use this if curl is unavailable, or you are having problems with the curl command
c:\funnelback\wbin\wget.exe -T 60 -t 3 -w 20 "<url_to_download>" -O <output_file> || exit 1
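As a minimal sketch of how a pre_gather.sh script might wrap the curl command above, the fragment below downloads to a temporary file and only replaces the existing configuration file when the download succeeds and is not empty. The function name download_config, the ".tmp" suffix and the extra --fail option (which makes curl return a non-zero exit code on HTTP errors such as 404) are assumptions added for illustration; they are not part of the recommended command.

```shell
#!/bin/bash
# Sketch only: download_config is a hypothetical helper, not a Funnelback command.
# It writes to "<dest>.tmp" first so that a failed or empty download never
# overwrites the configuration file that is already in place.
download_config() {
  local url="$1"
  local dest="$2"
  local tmp="${dest}.tmp"

  # Same timeout and retry options as the recommended command; --fail is an
  # added assumption so that HTTP error responses count as failures.
  if curl --fail --connect-timeout 60 --retry 3 --retry-delay 20 "$url" -o "$tmp" \
      && [ -s "$tmp" ]; then
    mv "$tmp" "$dest"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Example call, using the same placeholders as the commands above:
# download_config '<url_to_download>' <output_file> || exit 1
```

Keeping the "|| exit 1" on the calling line preserves the behaviour of the original command: the script stops if the download fails.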
Also, please note the following:
- If required under Windows, curl can be used by installing Cygwin and using the curl binary that is part of that product (note that the native Windows/DOS version of curl is too old and doesn't support a lot of options).
- curl supports various forms of authentication (e.g. Windows integrated authentication), which is sometimes required when downloading files.
- When downloading from a local Funnelback instance use the localhost address (http://localhost/) instead of a fully qualified address such as http://search.mysite.com. Ensure the search port is included if Funnelback is running on a port other than 80 for search requests.
- When downloading output from Funnelback consider using a profile that optimises the query and disables logging.
- If you are downloading external metadata please make use of the external metadata validator that is built in to Mediator.
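The note about the localhost address and the search port can be sketched as a small helper that builds the URL. The function name local_search_url and the port value 8080 used in the usage note are hypothetical; the /s/search.html path and the query string are taken from the examples in this article.

```shell
#!/bin/bash
# Sketch only: local_search_url is a hypothetical helper.
# Prints a localhost search URL, adding the port only when it is not 80.
local_search_url() {
  local port="$1"
  local query_string="$2"

  if [ "$port" = "80" ]; then
    printf 'http://localhost/s/search.html?%s\n' "$query_string"
  else
    printf 'http://localhost:%s/s/search.html?%s\n' "$port" "$query_string"
  fi
}
```

For example, assuming search runs on port 8080, the result could be passed to curl as "$(local_search_url 8080 'collection=myfaqs&query=!genqc&profile=autocompletion')" in place of the quoted URL.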
The wget command is equivalent to the curl command, but curl is preferred for consistency (and also because it has options for session cookies etc.). Some versions of Cygwin have a bug with curl.exe that results in errors like:
cygwin curl.exe: *** fatal error - couldn't initialize fd 0 for /dev/cons0
If this is happening then fall back to the wget command.
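One way the fallback could be automated is sketched below, assuming a bash environment (for example Cygwin) where both curl and wget are on the PATH. The function name fetch_with_fallback is hypothetical; the curl and wget options are the ones listed earlier in this article.

```shell
#!/bin/bash
# Sketch only: fetch_with_fallback is a hypothetical helper.
# Tries curl first and falls back to the equivalent wget command
# if curl exits with an error.
fetch_with_fallback() {
  local url="$1"
  local out="$2"

  curl --connect-timeout 60 --retry 3 --retry-delay 20 "$url" -o "$out" && return 0

  echo "curl failed, falling back to wget" >&2
  wget -T 60 -t 3 -w 20 "$url" -O "$out"
}

# Example call, using the same placeholders as the commands above:
# fetch_with_fallback '<url_to_download>' <output_file> || exit 1
```

The function returns the exit status of wget when curl fails, so the calling line can still use "|| exit 1" to stop the workflow.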
Examples
Download external metadata from an external website or CMS
curl --connect-timeout 60 --retry 3 --retry-delay 20 'http://website.com/resources/external_metadata' -o $SEARCH_HOME/conf/website/external_metadata.cfg || exit 1
Download auto-completion CSV from Funnelback.
curl --connect-timeout 60 --retry 3 --retry-delay 20 'http://localhost/s/search.html?collection=myfaqs&query=!genqc&profile=autocompletion'