Splitting XML files

Background

The Funnelback indexer includes built-in support for splitting XML files on a specified element path.

After splitting, each record matched by the element path will be indexed as a separate document within Funnelback.

Element paths

The element path is a case-sensitive pattern that matches the pseudo-path to an element or attribute. It is a simplified form of XPath.

  • If the path begins with a single / it is absolute (it matches from the top of the XML structure).
  • If the path begins with // it is unanchored (it can match anywhere in the XML structure).

XML attributes can be used by adding @attribute to the end of the path.

Matching on attribute values (for example [@type=value]) is not supported in element path definitions.

Example element paths:

# Valid paths
/items/item
//item/keywords/keyword
//keyword
//image@url
# Invalid paths
/items/item[@type=value]
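
The path rules above can be modelled in a few lines of Python. This is only an illustrative sketch of the rules as described in this article, not Funnelback's actual implementation. The function names parse_element_path and path_matches are hypothetical, and the element stack (the list of element names from the root down to the current element) is an assumed representation.

```python
def parse_element_path(path):
    """Split an element path into (anchored, element_names, attribute).

    Hypothetical helper: models the rules described in this article only.
    """
    # Attribute value predicates such as [@type=value] are not supported.
    if "[" in path or "]" in path:
        raise ValueError(f"attribute values are not supported: {path}")
    if path.startswith("//"):
        anchored, body = False, path[2:]   # unanchored: may match anywhere
    elif path.startswith("/"):
        anchored, body = True, path[1:]    # absolute: matches from the top
    else:
        raise ValueError(f"path must start with / or //: {path}")
    # An optional trailing @attribute selects an attribute of the element.
    body, _, attribute = body.partition("@")
    names = body.split("/")
    if not all(names):
        raise ValueError(f"empty element name in path: {path}")
    return anchored, names, attribute or None


def path_matches(path, element_stack):
    """Return True if the path matches the element at the end of element_stack.

    element_stack lists element names from the root to the current element.
    The comparison is case sensitive.
    """
    anchored, names, _ = parse_element_path(path)
    if anchored:
        return element_stack == names
    return element_stack[-len(names):] == names


stack = ["items", "item", "keywords", "keyword"]
print(path_matches("//item/keywords/keyword", stack))  # unanchored match
print(path_matches("/items/item", stack))              # absolute, no match
```

Plain list comparison is enough here because the paths contain no wildcards or predicates: an absolute path has to equal the whole stack, while an unanchored path only has to equal its tail.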

Configuring the element path used for splitting

The xml.cfg file includes a number of special elements that control various aspects of XML indexing.

The document element can be given a single absolute element path which is then used to split the XML document at index time.

The docurl element can be given a single absolute element path to an element (within the split record) that holds the value of the URL that should be assigned to the document. Note: this value should be unique. If duplicate docurls are encountered, the indexer will remove the duplicates.

For example, a sample xml.cfg might look like:

PADRE XML Mapping Version: 2
# Define which element to split documents on
document,/events/event
# Define which element from a document should be used as a URL
docurl,//eventUrl
# Indexed fields
title,1,,//eventTitle
description,1,,//longDescription
# Non-indexed fields
datestart,0,,//date_start
dateend,0,,//date_end
d,0,,//eventDateSubmitted

This will split the document on the /events/event element path and assign each record the URL held in its eventUrl element.
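
The effect of this configuration can be approximated outside Funnelback with Python's standard xml.etree.ElementTree module, which can help when checking a feed before indexing. This is a rough sketch and not how the indexer is implemented. The sample XML, the example.com URLs and the split_records function are invented for illustration. Keeping the first record for a duplicated URL and skipping records that have no URL are assumptions of this sketch; the indexer's actual behaviour in those cases may differ.

```python
import xml.etree.ElementTree as ET

# Invented sample data: three event records, two of which share a URL.
SAMPLE_XML = """<events>
  <event>
    <eventUrl>http://example.com/events/1</eventUrl>
    <eventTitle>Open day</eventTitle>
  </event>
  <event>
    <eventUrl>http://example.com/events/2</eventUrl>
    <eventTitle>Graduation</eventTitle>
  </event>
  <event>
    <eventUrl>http://example.com/events/1</eventUrl>
    <eventTitle>Open day (duplicate URL)</eventTitle>
  </event>
</events>"""


def split_records(xml_text, root_name="events", record_name="event",
                  url_name="eventUrl"):
    """Approximate document,/events/event and docurl,//eventUrl.

    Returns a dict that maps each record URL to the record's XML text.
    """
    root = ET.fromstring(xml_text)
    if root.tag != root_name:
        return {}  # the absolute path /events/event cannot match
    records = {}
    # Direct children of the root named record_name match /events/event.
    for record in root.findall(record_name):
        # Look for the URL element anywhere inside the record.
        url_element = record.find(f".//{url_name}")
        if url_element is None or not url_element.text:
            continue  # assumption: records without a URL are skipped
        url = url_element.text.strip()
        # Assumption: the first record seen for a URL is the one kept.
        records.setdefault(url, ET.tostring(record, encoding="unicode"))
    return records


for url in split_records(SAMPLE_XML):
    print(url)
```

Comparing the number of event elements in a feed with the number of URLs returned gives a quick indication of whether any records share a docurl value.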

See: https://docs.funnelback.com/more/extra/xml_cfg.html#example-1-multiple-records-per-xml-file for the full example.
