Update failures

An update can fail for many reasons. This article gives high-level guidance on some common failures and how to debug them.

The first step is to check the collection’s update log and see where and why the update failed. Look for error lines. Common errors include:

Failed to access seed page: Funnelback was unable to access the seed page, so the whole update failed (as there is nothing to crawl). Look at the offline crawl logs and the URL error log for more information. Possible causes include a timeout, or an expired password if you are crawling with authentication.

Failed changeover conditions: After the index is built, it is compared with the previous index. If the index has shrunk below a threshold then the update will fail. This can occur if one of the sites was down when the crawl occurred, if there were excessive timeouts, or if the site has shrunk (e.g. because it has been redeveloped or part of it archived). If a shrink in size is expected you can run an advanced update and swap the views.

Failures during filtering: Occasionally the filtering process crashes, causing an update to fail. The offline filter log may provide further information on the cause.

Swap views – No such file or directory. Is <offline folder> being used by another program?: Under Windows, swap views renames the offline and live folders, so if something (e.g. Explorer in a remote desktop session) has a lock on a folder when the rename is attempted then the update will fail. Solution: make sure all folders and files in the data folders are closed, then select the collection, run an advanced update and swap the views.

Lock file exists: The update could not start because a lock file was preventing it. This could be because another update on the collection was running, or because a previous update crashed and left the lock files in place.

The lock can be cleared from the administration interface by selecting the collection then clicking on the clear locks link that should be showing on the update tab.

The update locks can also be removed by running the following command from a command prompt:

$SEARCH_HOME\bin\mediator.pl ClearLocks collection=<COLLECTION_ID>

Where <COLLECTION_ID> is the id of the collection that is locked from updating.

Failures during indexing: Have a look at the offline index log for more details.

General information

Each time a collection is updated, an update log containing status information about each stage of the update process is created. These update logs are named update-<collection>.log and can be viewed in the log viewer.

The update log (update-<collection>.log) contains several lines relating to the completion status of each stage in the update process. Depending on the collection type, the update stages include:

  • Crawl pages from the Web or extract content from a database or copy files from a filesystem (crawl.log).
  • Filter binary documents such as Word and PDF (filter.log).
  • Eliminate duplicate documents (duplicates.log).
  • Index the documents (index.log).
  • Swap the live and offline indexes.

If any stage fails, the update log will show a non-zero exit code for that stage and a message stating that the stage has failed.

Sometimes it is possible to immediately diagnose the problem from the error message in the update log. Otherwise it will be necessary to examine the detailed log file of the stage that failed. These detailed logs can be viewed by clicking the links under the "Offline log files" folder in the log viewer.
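As a rough illustration of that first pass over the update log, the sketch below scans log text for lines that look like failures. The keywords it searches for ("error", "failed", "non-zero exit") are assumptions about what a failure line might contain, not a documented log format, and the function name is hypothetical. Adjust the pattern to match what your own update logs show.

```python
import re

# Assumed failure markers; the real wording in an update log may differ.
FAILURE_PATTERN = re.compile(r"error|failed|non-zero exit", re.IGNORECASE)


def find_failure_lines(log_text):
    """Return (line_number, line) pairs for lines that look like failures."""
    matches = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if FAILURE_PATTERN.search(line):
            matches.append((number, line.strip()))
    return matches


if __name__ == "__main__":
    # Made-up sample text for illustration only, not real log output.
    sample = "stage one finished\nstage two FAILED\nall done\n"
    for number, line in find_failure_lines(sample):
        print(f"{number}: {line}")
```

The line numbers returned make it easier to jump to the same place in the log viewer, and from there to the detailed log for the stage that failed.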

Crawler halted early or failed to start

If the update log indicates a crawl failure then examine the crawl.log. Common problems include:

Licence key failure: Check that a valid licence key is installed and that the search machine has been configured with a fully qualified domain name. Sometimes adding an entry containing the full hostname to the system hosts file will correct this problem.

Java not installed correctly: Check that Java is installed on the search machine and that the "java" setting in SEARCH_HOME/conf/executables.cfg is set correctly.

Crawler can't access the seed page: The crawler may be blocked from the seed page as a result of network authentication, network problems, or an incorrect network protocol (e.g. specifying http instead of https).

Can't stop or start a collection update (stale lock file): If a server loses power or is otherwise interrupted during an update, a lock file indicating that the update is still in progress may be left behind. This prevents new updates from starting, and attempts to stop the update fail with the message "The collection 'x' is not currently being updated".

Such stale lock files must be removed manually by running the following command from a terminal: $SEARCH_HOME/bin/mediator.pl ClearLocks collection=COLLECTION_NAME (where COLLECTION_NAME is the collection that is locked from updating).

Note: Before removing lock files, ensure that the update has been terminated by checking the server's process list for any perl processes running the update.pl script.

The default $SEARCH_HOME on Windows is C:\funnelback, and on Unix systems it is /opt/funnelback.
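The check-then-clear sequence described above can be sketched as two small helpers. This is a minimal sketch under stated assumptions: the process list is supplied by the caller as lines of text (for example, captured from the operating system's process listing), a running update is recognised only by the string update.pl appearing in a line, and both function names are hypothetical. The code builds the ClearLocks command quoted in this article but does not run it.

```python
import os


def find_update_processes(process_lines):
    """Return process-list lines that mention the update.pl script."""
    return [line for line in process_lines if "update.pl" in line]


def clear_locks_command(search_home, collection):
    """Build the ClearLocks command from this article as an argument list."""
    mediator = os.path.join(search_home, "bin", "mediator.pl")
    return [mediator, "ClearLocks", f"collection={collection}"]


def safe_to_clear_locks(process_lines):
    """Locks should only be cleared when no update.pl process is running."""
    return not find_update_processes(process_lines)
```

If `safe_to_clear_locks` returns False, terminate or wait for the listed processes before clearing the locks. Otherwise the argument list from `clear_locks_command` can be run from a terminal or passed to a process launcher.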

Debugging failed updates

The first thing to do is look at the collection's update log and see where and why the update failed. Look for error lines. 

Once you've found where the error occurred, you can investigate further by looking at the offline log files for the part of the update that failed.

If the update failed during the gather phase, the following logs may hold further information:

  • Web collections:
    • crawl.log
    • crawl.log.X
    • url_errors.log
    • crawler.inline_filter.log
  • Filecopy collections:
    • Filecopier.log
  • TRIM collections:
    • trim.log

If the update failed during the index phase, the following logs may hold further information:

  • index.log
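For quick reference, the lists above can be captured as a small lookup table. The log names are transcribed from this article. The keys ("web", "filecopy", "trim"), the phase names and the function name are hypothetical labels chosen for this sketch.

```python
# Logs to check after a gather failure, keyed by an assumed collection label.
GATHER_LOGS = {
    "web": [
        "crawl.log",
        "crawl.log.X",
        "url_errors.log",
        "crawler.inline_filter.log",
    ],
    "filecopy": ["Filecopier.log"],
    "trim": ["trim.log"],
}

# Logs to check after an index failure, regardless of collection type.
INDEX_LOGS = ["index.log"]


def logs_for_failure(phase, collection_type=None):
    """Return the offline logs worth checking for a failed phase."""
    if phase == "gather":
        return list(GATHER_LOGS.get(collection_type, []))
    if phase == "index":
        return list(INDEX_LOGS)
    # Other phases (such as swap) have no dedicated log listed here.
    return []
```

For example, `logs_for_failure("gather", "trim")` returns `["trim.log"]`.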

If the update failed during the swap phase then one of the following is probably true:

  1. Changeover conditions are not met: the new index is much smaller than the previous index (below changeover_percent). If this is expected, forcing swap views via an advanced update will resolve the issue.
  2. The live or offline folder is locked by the OS (a common problem on Windows). Under Windows the swap views process renames the live and offline folders, so if the OS has a lock on either of them the update will fail. Ensure whatever is locking the folder is terminated, then force the swap views.
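The changeover check in the first item can be sketched as a simple percentage comparison. This is only an illustration. It assumes that index size is measured as a document count and that the new index must reach at least changeover_percent percent of the previous one. The function name and the example value of 50 are hypothetical, and the real check may measure size differently.

```python
def passes_changeover(previous_count, new_count, changeover_percent):
    """Return True if the new index is large enough to replace the old one.

    Assumption: size is a document count, and the new index must be at
    least changeover_percent percent of the previous index.
    """
    if previous_count <= 0:
        # No previous index to compare against, so nothing can have shrunk.
        return True
    ratio_percent = 100.0 * new_count / previous_count
    return ratio_percent >= changeover_percent


if __name__ == "__main__":
    # A drop from 1000 to 400 documents is 40% of the old size.
    print(passes_changeover(1000, 400, 50))
    # A drop from 1000 to 800 documents is 80% of the old size.
    print(passes_changeover(1000, 800, 50))
```

Under these assumptions the first call fails the check and the second passes. A failed check after a known site redevelopment is the case where forcing swap views is appropriate.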