Storing custom (XML) data


When fetching custom data, store it in the collection's data folder, not in the conf folder. This has several benefits:

  • Doesn't clutter the configuration
  • Prevents large data files from being checked into the VCS when the configuration folder is version controlled (they would also always show as "modified" in the checkout, since they change with every update)
  • Benefits from the live / offline system so a backup copy of the previous data is always available in the offline view

Ideally, custom data should be placed under $SEARCH_HOME/data/<collection>/offline/data/. That's the standard Funnelback folder for data to be indexed (e.g. WARC files or XML files), so if the custom data is written directly to this folder, no additional configuration is required for the update pipeline to work.

In some cases, writing directly to the data folder is not possible. For example, when using a Filecopy collection, the data gets copied by the Filecopier from the source folder into offline/data/. Similarly, if you need to filter/transform the data, the source folder needs to be different from the target offline/data/ folder. In those cases, store the data inside $SEARCH_HOME/data/<collection>/offline/tmp/. This is also a standard Funnelback folder, and its contents are assumed to be temporary since they are only used as a source for transformation or filecopying.
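
The choice between the two folders can be sketched in a few lines of Python. This is only an illustration, not part of Funnelback: the function names custom_data_dir and store_custom_data, the needs_processing flag, and the file name used in the example are all hypothetical. Only the folder layout ($SEARCH_HOME/data/<collection>/offline/data/ and offline/tmp/) comes from the text above.

```python
from pathlib import Path


def custom_data_dir(search_home, collection, needs_processing=False):
    """Return the offline folder that custom data should be written to.

    needs_processing=True (a hypothetical flag) means the data will be
    filecopied or filtered/transformed before indexing, so it belongs in
    offline/tmp/ rather than offline/data/.
    """
    subfolder = "tmp" if needs_processing else "data"
    return Path(search_home) / "data" / collection / "offline" / subfolder


def store_custom_data(content, filename, search_home, collection,
                      needs_processing=False):
    """Write already-fetched bytes into the chosen folder; return the path."""
    target_dir = custom_data_dir(search_home, collection, needs_processing)
    # Creating the folder is a convenience for this sketch only; on a real
    # installation the offline folders are expected to exist already.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    target.write_bytes(content)
    return target
```

For example, store_custom_data(xml_bytes, "items.xml", search_home, "my-collection") would write to offline/data/, while passing needs_processing=True would write the same file to offline/tmp/ so that a later filecopy or transformation step can pick it up.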
