Confluence


Background

This article describes how a web collection can be used to create an index of Atlassian Confluence.

Note

  • This method does not support document-level security.
  • If the crawler authenticates, the search results reflect whatever the crawl user has access to.
  • A user who does not have access to a returned result will get an access denied message when attempting to open the page in Confluence.

Method

  1. Create a web collection for the Confluence index

    When creating the web collection, use the following settings.

    In all examples below, replace <CONFLUENCE HOST> with the base URL of your Confluence site, without a trailing slash (e.g. https://confluence.mydomain.com)

    Start URL

    <CONFLUENCE HOST>/dashboard.action
    

    Include content from

    <CONFLUENCE HOST>/
    

    Exclude content from

    Add the following additional exclude patterns:

    /display/KB/
    /admin
    /display/status/
    /label/
    com.atlassian.plugins
    &navigatingVersions=true
    output=rss
    login.action
    forgotuserpassword.action
    copypage.action
    changes.action
    createblogpost.action
    createpage.action
    diffpages.action
    diffpagesbyversion.action
    editattachment.action
    editinword
    editmyprofile.action
    editmyprofilepicture.action
    editpage.action
    exportspacehtml.action
    exportspacexml.action
    exportword
    followuser.action
    flyingpdf.action
    showgliffyeditor.action
    listpages-dirview.action
    logout.action
    osd.action
    pdfpageexport
    preview.action
    recentlyupdated.action
    replyToComment
    revertpagebacktoversion.action
    showCommentArea
    showComments
    uploadimport.action
    viewfollow.action
    viewinfo.action
    viewmydrafts.action
    viewmylabels.action
    viewmyprofile.action
    viewmysettings.action
    viewnotifications.action
    viewpagesrc.action
    viewpagestorage.action
    createrssfeed.action
    viewpreviousversions.action
    space-bookmarks.action
    addfavourite.action
    removefavourite.action
    viewrecentblogposts.action
    dashboard.action
    ?version=
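
The effect of the include and exclude settings can be sketched in a few lines of Python. This is an illustration only: it assumes that patterns are matched as plain substrings of the URL and that a URL is crawled when it matches an include pattern and no exclude pattern. The function name should_crawl, the sample host and the shortened pattern list are hypothetical and are not part of the crawler itself.

```python
# Sketch of include/exclude filtering, ASSUMING plain substring matching.
# should_crawl, HOST and the shortened EXCLUDE list are illustrative only.

HOST = "https://confluence.mydomain.com"

INCLUDE = [HOST + "/"]
EXCLUDE = ["/label/", "login.action", "editpage.action", "?version=", "output=rss"]


def should_crawl(url, include_patterns, exclude_patterns):
    """Return True if url matches an include pattern and no exclude pattern."""
    included = any(pattern in url for pattern in include_patterns)
    excluded = any(pattern in url for pattern in exclude_patterns)
    return included and not excluded


if __name__ == "__main__":
    for candidate in (
        HOST + "/display/DOC/Home",
        HOST + "/pages/editpage.action?pageId=1",
        HOST + "/label/DOC/howto",
    ):
        print(candidate, should_crawl(candidate, INCLUDE, EXCLUDE))
```

Under these assumptions an ordinary page URL is kept, while edit views, label listings and historical versions are skipped, which keeps duplicate and non-content pages out of the index.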
    
  2. Configure authentication (if required)

    If a login is required to access Confluence content, a crawl user must be set up so that the Confluence pages can be accessed and indexed. The user must have read-only access to every space that should be included in the search results.

    Once the user has been created, create a form_interaction.cfg file for the web collection.

    form_interaction.cfg

    <CONFLUENCE HOST>/login.action 3 parameters:[os_username=<CRAWL USER>&os_password=<CRAWL PASSWORD>]
    

    Note: the crawl user and crawl password must be URL encoded.
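
    One way to produce the URL-encoded values is with Python's standard library. The sketch below is a convenience, not part of the product: the helper name form_interaction_line and the sample credentials are hypothetical, and it simply percent-encodes both values and prints a line in the format shown above.

```python
from urllib.parse import quote


def form_interaction_line(host, user, password):
    """Build a form_interaction.cfg line with percent-encoded credentials.

    host is the Confluence base URL without a trailing slash.
    """
    # safe="" makes quote() encode every reserved character, including
    # "/", "&" and "=", which would otherwise break the parameter list.
    user_enc = quote(user, safe="")
    password_enc = quote(password, safe="")
    return (
        f"{host}/login.action 3 "
        f"parameters:[os_username={user_enc}&os_password={password_enc}]"
    )


if __name__ == "__main__":
    # Hypothetical credentials, for illustration only.
    print(form_interaction_line(
        "https://confluence.mydomain.com", "crawl user", "p@ss&word=1"
    ))
```

    With the sample values, the space in the user name becomes %20 and the characters @, & and = in the password become %40, %26 and %3D.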

    If authentication is configured and some search users do not have access to everything in the search results, ensure that cache controller access is disabled on the Confluence web collection and on any meta collection that includes its results. Otherwise users could read cached copies of pages they are not permitted to view in Confluence. To do this, set ui_cache_disabled=true in the collection.cfg of each of these collections.
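
    When several collections need the same change, the edit can be scripted. The following is a minimal sketch that assumes collection.cfg is a plain text file of key=value lines; the helper name set_option and the sample file content are hypothetical, and locating and writing the files is left to the reader.

```python
def set_option(cfg_text, key, value):
    """Return cfg_text with key set to value, adding the line if missing."""
    new_line = f"{key}={value}"
    result = []
    replaced = False
    for line in cfg_text.splitlines():
        # Compare only the part before the first "=" with the wanted key.
        if line.split("=", 1)[0].strip() == key:
            if not replaced:
                result.append(new_line)
                replaced = True
            # Any further duplicate definitions of the key are dropped.
        else:
            result.append(line)
    if not replaced:
        result.append(new_line)
    return "\n".join(result) + "\n"


if __name__ == "__main__":
    # Sample content for illustration only.
    sample = "collection_type=web\nui_cache_disabled=false\n"
    print(set_option(sample, "ui_cache_disabled", "true"), end="")
```

    The function works on the file content as a string, so it can be tried on a copy of each collection.cfg before the real file is overwritten.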
