The Funnelback indexer includes built-in support for splitting XML files on a specified element path.
After splitting, each record matched by the element path will be indexed as a separate document within Funnelback.
The element path is a case-sensitive pattern that matches the pseudo-path to an element or attribute. It is a simplified form of XPath.
- If the path begins with / then the path is absolute (it matches from the top of the XML structure).
- If it begins with // it is unanchored (it can be located anywhere in the XML structure).
XML attributes can be used by adding @attribute to the end of the path.
Matching on attribute values (for example, [@type=value]) is not supported in element path definitions.
Example element paths:
# Valid paths
/items/item
//item/keywords/keyword
//keyword
//image@url

# Invalid paths
/items/item[@type=value]
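The matching rules above can be illustrated with a small Python sketch. This is not Funnelback's implementation: the function names parse_element_path and path_matches are hypothetical, and the sketch assumes that an absolute path must match the full chain of element names from the root, while an unanchored path only needs to match the end of that chain.

```python
def parse_element_path(pattern):
    """Split a simplified element path into (anchored, segments, attribute).

    Hypothetical helper for illustration only; not part of Funnelback.
    """
    # Predicates such as [@type=value] are rejected, mirroring the
    # "invalid paths" example.
    if "[" in pattern or "]" in pattern:
        raise ValueError("attribute value predicates are not supported")

    # An optional trailing @attribute selects an attribute of the element.
    attribute = None
    if "@" in pattern:
        pattern, attribute = pattern.split("@", 1)

    if pattern.startswith("//"):
        anchored, body = False, pattern[2:]
    elif pattern.startswith("/"):
        anchored, body = True, pattern[1:]
    else:
        raise ValueError("element path must begin with / or //")

    segments = body.split("/")
    if not all(segments):
        raise ValueError("element path contains an empty segment")
    return anchored, segments, attribute


def path_matches(pattern, element_path):
    """Return True if pattern matches the list of element names from the root.

    Comparison is plain string equality, so matching is case sensitive.
    """
    anchored, segments, _ = parse_element_path(pattern)
    if anchored:
        # Assumption: an absolute path matches the whole chain from the root.
        return element_path == segments
    # Assumption: an unanchored path matches the tail of the chain.
    return element_path[-len(segments):] == segments


keyword_path = ["items", "item", "keywords", "keyword"]
print(path_matches("/items/item", ["items", "item"]))
print(path_matches("//keyword", keyword_path))
print(path_matches("//Keyword", keyword_path))
```

Under these assumptions the first two calls match and the third does not, because the comparison is case sensitive.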
Configuring the element path used for splitting
The xml.cfg includes a number of special elements that control various aspects of XML indexing.
The document element can be given a single absolute element path which is then used to split the XML document at index time.
The docurl element can be given a single absolute element path to an element (within the split record) that holds the value of the URL that should be assigned to the document. Note: this should be a unique value. If duplicate docurls are encountered, the indexer will remove the duplicates.
For example, a sample xml.cfg might look like:
PADRE XML Mapping Version: 2
# Define which element to split documents on
document,/events/event
# Define which element from a document should be used as a URL
docurl,//eventUrl
# Indexed fields
title,1,,//eventTitle
description,1,,//longDescription
# Non-indexed fields
datestart,0,,//date_start
dateend,0,,//date_end
d,0,,//eventDateSubmitted
This will split the document on the /events/event element path and assign the URL sourced from the eventUrl field contained within each record.
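To make the effect of the document and docurl settings concrete, the following Python sketch imitates the split using the standard library's xml.etree.ElementTree. It is an illustration only, not how the Funnelback indexer works internally. The sample XML, the example.com URLs, and the split_records function are invented for this example. The sketch also assumes that the first record seen is kept when two records share a URL, which the documentation does not specify.

```python
import xml.etree.ElementTree as ET

# Invented sample data: three events, two of which share the same eventUrl.
SAMPLE_XML = """<events>
  <event>
    <eventUrl>http://example.com/events/1</eventUrl>
    <eventTitle>Open day</eventTitle>
  </event>
  <event>
    <eventUrl>http://example.com/events/2</eventUrl>
    <eventTitle>Graduation</eventTitle>
  </event>
  <event>
    <eventUrl>http://example.com/events/1</eventUrl>
    <eventTitle>Open day (duplicate)</eventTitle>
  </event>
</events>"""


def split_records(xml_text, document_path="/events/event", docurl_tag="eventUrl"):
    """Split xml_text on an absolute element path and key each record by URL.

    Hypothetical helper: returns a dict mapping URL -> serialised record.
    """
    root = ET.fromstring(xml_text)

    # The first segment of the absolute path must be the root element itself.
    root_name, _, rest = document_path.lstrip("/").partition("/")
    if root.tag != root_name:
        return {}

    # The remaining segments are resolved relative to the root element.
    candidates = root.findall(rest) if rest else [root]

    records = {}
    for record in candidates:
        # Equivalent of docurl,//eventUrl: look anywhere inside the record.
        url_element = record.find(".//" + docurl_tag)
        if url_element is None or not url_element.text:
            continue
        url = url_element.text.strip()
        # Assumption: keep the first record for a URL and drop later ones.
        records.setdefault(url, ET.tostring(record, encoding="unicode"))
    return records


documents = split_records(SAMPLE_XML)
for url in documents:
    print(url)
```

With the sample data, two documents remain after the split, because the third event reuses the URL of the first.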
See: https://docs.funnelback.com/more/extra/xml_cfg.html#example-1-multiple-records-per-xml-file for the full example.