Fetching custom (XML) data

A common requirement is to fetch a single XML data file from a CMS or a third-party system and use it as the data source for a collection update. The approaches below cover this case.

Custom collection

Since Funnelback v13, custom collections are the recommended way to implement custom data fetching logic. With a custom collection, the third-party source is fetched in Groovy, in the collection's main Groovy file. This avoids relying on custom external scripts and makes the collection more portable.

Alternative: Simple web collection

In simple cases, a single-URL web collection that gathers the XML feed may be a good approach:

collection.cfg
collection=Example Remote Single XML
collection_type=web
indexer_options=-forcexml
include_patterns=http://example.com/path/to/file.xml

collection.cfg.start.urls
http://example.com/path/to/file.xml

Fetching data from the "local" server

Sometimes data needs to be fetched from the local server, usually to generate a CSV query completion file from search results. When doing so, use the localhost host name rather than the server's actual name. For example, in a cURL fetch command:

post_swap_command=curl --connect-timeout 60 --retry 3 --retry-delay 20 "http://localhost/s/search.html?collection=$COLLECTION_NAME&query=!padrenullquery&profile=query-completion" -o $SEARCH_HOME/conf/$COLLECTION_NAME/query_completion.csv || exit 1

This makes the collection more portable when it is moved to a different server (e.g. during upgrades). If the server name were used instead of localhost, the workflow commands would need to be edited each time the collection is moved.
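
If the fetch command grows beyond a single line, it could be moved into a small wrapper script. The sketch below is one possible shape, not a Funnelback-provided script: the function names build_url and fetch_to are hypothetical, and it assumes SEARCH_HOME and COLLECTION_NAME are set in the script's environment. It uses the same cURL options as the command above, adds --silent and --fail, and downloads to a temporary file so that a failed or empty response does not overwrite an existing query_completion.csv.

```shell
#!/bin/sh
# Hypothetical wrapper around the localhost fetch shown above.

# Build the localhost search URL for a collection and a profile.
# $1 = collection name, $2 = profile name (both supplied by the caller)
build_url() {
    printf 'http://localhost/s/search.html?collection=%s&query=!padrenullquery&profile=%s\n' "$1" "$2"
}

# Download a URL to a temporary file and replace the target only if the
# download succeeded and the result is not empty.
# $1 = URL, $2 = target file
fetch_to() {
    tmp="$2.tmp.$$"
    if curl --silent --fail --connect-timeout 60 --retry 3 --retry-delay 20 "$1" -o "$tmp" \
        && [ -s "$tmp" ]; then
        mv "$tmp" "$2"
    else
        # Leave any existing target untouched and report the failure.
        rm -f "$tmp"
        return 1
    fi
}

# Run the fetch only when both variables are set (assumption: they are
# exported to the script), so the functions can also be sourced on their own.
if [ -n "${SEARCH_HOME:-}" ] && [ -n "${COLLECTION_NAME:-}" ]; then
    fetch_to "$(build_url "$COLLECTION_NAME" query-completion)" \
        "$SEARCH_HOME/conf/$COLLECTION_NAME/query_completion.csv" || exit 1
fi
```

Because the URL is always built against localhost, the script needs no changes when the collection is moved to another server.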
