Troubleshooting XML

XML file is not split

  • Check the Document parameter defined in xml.cfg. It needs to be an absolute XPath to the element that contains each item.
    • Check that the XPath is valid for the document, e.g. items/item works, whereas items/item/ (with a trailing slash) fails.
  • Check index.log to ensure that the file is detected as XML. If it is not, the file may be missing its XML declaration line. Try adding the -forcexml indexer option.
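
A quick way to confirm whether a file starts with an XML declaration is sketched below. The function name has_xml_declaration is hypothetical, and the check is only a heuristic: it looks for "<?xml" followed by whitespace within the first 64 bytes of the file.

```shell
# Heuristic sketch: succeeds if the file appears to begin with an XML declaration.
# has_xml_declaration is an illustrative helper name, not a Funnelback tool.
has_xml_declaration() {
    # Inspect only the first 64 bytes; the declaration must come first in the file.
    head -c 64 "$1" | grep -q '<?xml[[:space:]]'
}

# Usage (XMLFILE.xml is a placeholder):
#   has_xml_declaration XMLFILE.xml && echo "declaration found" || echo "declaration missing"
```

If the declaration is reported as missing, adding it to the file or using the -forcexml option mentioned above are the two routes to try.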

Search results match XML records that don't appear to contain the content

Indexing XML fields that contain HTML content

  • You may need to set the '+' class in xml.cfg, which indicates the fields that contain XML/HTML-encoded data (see http://docs.funnelback.com/xml_cfg.html).
  • If you have mapped everything you wish to include, you can map a non-existent XPath to the '-' field, e.g. -,,,/non_existent/xpath

Index is not detecting all the XML items

  • This is most likely caused by a chamber that is too small. Try increasing the chamber size using the -chamb indexer option, which should cause more records to be detected. Keep increasing the value until all the items are detected (you may need 2-3 times the size of the XML file).
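
As a starting point for the 2-3 times rule of thumb above, the sketch below prints two candidate values based on the file size. It assumes that -chamb takes a size in megabytes, which should be confirmed against the indexer documentation for your version; the function name suggest_chamber_mb is hypothetical.

```shell
# Sketch: print 2x and 3x the XML file size, rounded up to whole megabytes.
# ASSUMPTION: the -chamb option is expressed in MB; verify before relying on it.
suggest_chamber_mb() {
    bytes=$(wc -c < "$1" | tr -d ' ')
    # Round the file size up to the next whole megabyte.
    mb=$(( (bytes + 1048575) / 1048576 ))
    echo "$(( mb * 2 )) $(( mb * 3 ))"
}

# Usage (XMLFILE.xml is a placeholder):
#   suggest_chamber_mb XMLFILE.xml
```

The two numbers are only candidates to try in turn. The check that matters is still whether all items show up in the index log.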

Counting items in an XML file

  • You can use xmllint to count the number of items in an XML file that match an XPath. To see how many items should be extracted, run the following commands. Replace XMLFILE.xml with the XML file you wish to inspect, and replace Document XPATH with the Document value defined in your xml.cfg.

    user@server> xmllint --shell XMLFILE.xml
    / > xpath count(Document XPATH)
    Object is a number : 10096
    / > exit
  • Compare the number counted with the number of items that padre discovers in the collection's index.log (v13.2 or earlier) or Step-Index.log (v14+). Look at the individual numbered lines for each item extracted from the XML rather than the summary at the bottom of the file, as the collection might include multiple XML files.
  • grep can also be used to find the number of items in an XML file:

    If the Document value in xml.cfg is /feed/entry, the following command counts the items that should be indexed. Note that the pattern also matches any element whose name merely begins with entry.

        grep -oh '<entry' XMLFILE.xml | wc -l
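
A slightly stricter variant of the grep approach is sketched below. It only counts opening tags where the element name is followed by whitespace, "/", ">" or the end of the line, so an element such as <entry_id> is not counted as an <entry>. The function name count_elements is hypothetical, and like any grep-based count it ignores document structure, so it is a rough cross-check rather than a replacement for the xmllint count.

```shell
# Sketch: count opening tags for a given element name in an XML file.
# count_elements is an illustrative helper name, not a Funnelback tool.
count_elements() {
    # $1 = element name (e.g. entry), $2 = XML file
    # Match "<name" only when followed by whitespace, "/", ">" or end of line.
    grep -oE "<$1([[:space:]/>]|\$)" "$2" | wc -l | tr -d ' '
}

# Usage (XMLFILE.xml is a placeholder):
#   count_elements entry XMLFILE.xml
```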