Splitting XML files

The Funnelback indexer includes built-in support for splitting XML files on a specified element path.

After splitting, each record matched by the element path will be indexed as a separate document within Funnelback.
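To illustrate what splitting means in practice, the following is a minimal Python sketch that uses the standard library rather than Funnelback itself. The sample XML, the element names and the split_records helper are hypothetical and exist only for this illustration; the indexer's real implementation may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: element names are illustrative only.
SAMPLE_XML = """<events>
  <event><title>Open day</title></event>
  <event><title>Graduation</title></event>
</events>"""


def split_records(xml_text, record_path):
    """Return each element matched by an absolute path such as /events/event
    as its own standalone XML string."""
    root = ET.fromstring(xml_text)
    parts = record_path.strip("/").split("/")
    # The first path segment must name the root element.
    if not parts or parts[0] != root.tag:
        return []
    relative = "/".join(parts[1:])
    matches = root.findall(relative) if relative else [root]
    # strip() drops the whitespace that followed each element in the source.
    return [ET.tostring(el, encoding="unicode").strip() for el in matches]


records = split_records(SAMPLE_XML, "/events/event")
for record in records:
    print(record)
```

Each string in records corresponds to one record that would be treated as a separate document.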

Element paths

The element path is a case-sensitive pattern that matches the pseudo-path to an element or attribute. It is a simplified form of XPath.

  • If the path begins with a single / it is absolute (it matches from the top of the XML structure).
  • If the path begins with // it is unanchored (it can match anywhere in the XML structure).

XML attributes can be used by adding @attribute to the end of the path.

Attribute values are not supported in element path definitions.
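The rules above can be sketched as a small Python check. This is an assumption-laden illustration, not Funnelback's own parser: the describe_path helper and the regular expression are hypothetical, namespaces are ignored, and the sample paths in the usage lines are invented for this example. It reads "adding @attribute to the end of the path" literally, so the attribute is appended directly to the last element name.

```python
import re

# One path segment: a simplified XML-style name (no namespace handling).
_SEGMENT = r"[A-Za-z_][\w.\-]*"
_PATH_RE = re.compile(
    rf"^(?P<anchor>//|/)"                      # // is unanchored, / is absolute
    rf"(?P<elements>{_SEGMENT}(?:/{_SEGMENT})*)"  # one or more element names
    rf"(?:@(?P<attr>{_SEGMENT}))?$"            # optional trailing @attribute
)


def describe_path(path):
    """Return a description of an element path, or None if it does not fit
    the simplified rules (for example, if it tests an attribute value)."""
    match = _PATH_RE.match(path)
    if match is None:
        return None
    return {
        "absolute": match.group("anchor") == "/",
        "elements": match.group("elements").split("/"),
        "attribute": match.group("attr"),
    }


# Hypothetical paths, used only to exercise the helper.
print(describe_path("/events/event"))
print(describe_path("//event@id"))
print(describe_path("/events/event[@type='public']"))
```

Matching is case sensitive because the pattern is compiled without re.IGNORECASE, so Event and event are treated as different element names.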

Example element paths:

# Valid paths
# Invalid paths

Configuring the element path used for splitting

The xml.cfg file includes a number of special elements that control various aspects of XML indexing.

The document element can be given a single absolute element path which is then used to split the XML document at index time.

The docurl element can be given a single absolute element path to an element (within the split record) that holds the URL to assign to the document. Note: this value should be unique. If duplicate docurl values are encountered, the indexer will remove the duplicates.

For example, a sample xml.cfg might look like:

PADRE XML Mapping Version: 2
# Define which element to split documents on
# Define which element from a document should be used as a URL
# Indexed fields
# Non-indexed fields

This splits the XML file on the /events/event element path and assigns each resulting record the URL held in its eventUrl field.
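The combined effect of the document and docurl settings can be approximated with the Python sketch below. It is illustrative only: the sample data, the title field and the records_by_url helper are hypothetical, and keeping the first record for a duplicated URL is an assumption, since the text above states only that duplicates are removed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the third record repeats the first record's URL.
SAMPLE_XML = """<events>
  <event><eventUrl>http://example.com/a</eventUrl><title>Open day</title></event>
  <event><eventUrl>http://example.com/b</eventUrl><title>Graduation</title></event>
  <event><eventUrl>http://example.com/a</eventUrl><title>Open day (repeat)</title></event>
</events>"""


def records_by_url(xml_text):
    """Split on /events/event and key each record by its eventUrl value."""
    root = ET.fromstring(xml_text)  # root is the <events> element
    documents = {}
    for record in root.findall("event"):
        url = (record.findtext("eventUrl") or "").strip()
        # Assumption: the first record seen for a URL is kept and later
        # records with the same URL are dropped.
        if url and url not in documents:
            documents[url] = record.findtext("title")
    return documents


documents = records_by_url(SAMPLE_XML)
for url, title in documents.items():
    print(url, title)
```

Only two entries remain in documents, which shows why each record needs a unique URL value if it is to be indexed as its own document.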

See: https://docs.funnelback.com/more/extra/xml_cfg.html#example-1-multiple-records-per-xml-file for the full example.
