It is common to have to fetch a single XML data file from a CMS or a third party system, to use as the data source for a collection update. Various approaches are described below.
Since Funnelback v13, custom collections are the recommended way to implement custom data fetching logic. With custom collections, the third party source is fetched in the custom collection's main Groovy file. This avoids relying on external scripts and makes the collection more portable.
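A minimal sketch of what the fetch step in such a Groovy file might look like. The URL, output path, and helper name are illustrative assumptions, not Funnelback defaults; adapt them to your collection's configuration:

```groovy
// Minimal sketch: download a remote XML file into the collection's
// data directory. fetchXml, the URL, and the target path are
// illustrative assumptions, not Funnelback-provided names.
def fetchXml(String url, File target) {
    target.parentFile?.mkdirs()
    new URL(url).withInputStream { input ->
        target.withOutputStream { output ->
            // Groovy's << copies the input stream to the output stream
            output << input
        }
    }
}

// Example usage (adjust URL and path for your collection):
// fetchXml("http://example.com/path/to/file.xml",
//          new File("offline/data/file.xml"))
```

Because the fetch lives in the collection's own Groovy file, it travels with the collection configuration rather than depending on scripts installed elsewhere on the server.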
Alternative: Simple web collection
In some simple cases, the creation of a single-URL web collection gathering the XML feed may be a good approach:
collection=Example Remote Single XML
collection_type=web
indexer_options=-forcexml
include_patterns=http://example.com/path/to/file.xml
Fetching data from the "local" server
Sometimes data needs to be fetched from the local server, usually to generate a CSV query completion file from search results. Make sure you use localhost as the hostname when doing so, not the actual name of the server. For example, in a cURL fetching command:
post_swap_command=curl --connect-timeout 60 --retry 3 --retry-delay 20 "http://localhost/s/search.html?collection=$COLLECTION_NAME&query=!padrenullquery&profile=query-completion" -o $SEARCH_HOME/conf/$COLLECTION_NAME/query_completion.csv || exit 1
This makes the collection more portable when it is moved to a different server (e.g. during upgrades). If the actual server name were used instead of localhost, the workflow commands would need to be edited every time the collection moves.