Confluence
Background
This article describes how a web collection can be used to create an index of Atlassian Confluence.
Note
- This method does not support document level security.
- If the crawler authenticates, the results returned will match what the crawl user has access to.
- A user who does not have access to a returned result will get an access denied message when attempting to open the page in Confluence.
Method
Create a web collection for the Confluence index
When creating the web collection, use the following settings. In all examples below, replace <CONFLUENCE HOST> with your Confluence host name, without a trailing slash (e.g. https://confluence.mydomain.com).
Start URL
<CONFLUENCE HOST>/dashboard.action
Include content from
<CONFLUENCE HOST>/
Exclude content from
Add the following additional exclude patterns:
/display/KB/ /admin /display/status/ /label/ com.atlassian.plugins &navigatingVersions=true output=rss login.action forgotuserpassword.action copypage.action changes.action createblogpost.action createpage.action diffpages.action diffpagesbyversion.action editattachment.action editinword editmyprofile.action editmyprofilepicture.action editpage.action exportspacehtml.action exportspacexml.action exportword followuser.action flyingpdf.action showgliffyeditor.action listpages-dirview.action logout.action osd.action pdfpageexport preview.action recentlyupdated.action replyToComment revertpagebacktoversion.action showCommentArea showComments uploadimport.action viewfollow.action viewinfo.action viewmydrafts.action viewmylabels.action viewmyprofile.action viewmysettings.action viewnotifications.action viewpagesrc.action viewpagestorage.action createrssfeed.action viewpreviousversions.action space-bookmarks.action addfavourite.action removefavourite.action viewrecentblogposts.action dashboard.action ?version=
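To check which Confluence URLs the settings above would let through, the following Python sketch applies the include and exclude rules to sample URLs. It is an illustration only: it assumes each pattern is matched as a plain substring of the full URL (check your search platform's documentation for the exact matching rules), uses only a subset of the exclude list, and uses the hypothetical host https://confluence.mydomain.com.

```python
# Hypothetical host; substitute your own <CONFLUENCE HOST>.
CONFLUENCE_HOST = "https://confluence.mydomain.com"

# "Include content from" setting.
INCLUDE_PATTERN = CONFLUENCE_HOST + "/"

# A subset of the exclude patterns listed above.
EXCLUDE_PATTERNS = [
    "/display/KB/",
    "/admin",
    "/label/",
    "login.action",
    "editpage.action",
    "viewpagesrc.action",
    "?version=",
]


def would_crawl(url: str) -> bool:
    """Return True if url matches the include pattern and none of the exclude patterns.

    Assumption: patterns are plain substring matches against the whole URL.
    """
    if INCLUDE_PATTERN not in url:
        return False
    return not any(pattern in url for pattern in EXCLUDE_PATTERNS)


if __name__ == "__main__":
    for url in [
        CONFLUENCE_HOST + "/display/DOCS/Home",
        CONFLUENCE_HOST + "/pages/editpage.action?pageId=123",
        CONFLUENCE_HOST + "/display/KB/Some+Article",
    ]:
        print(url, would_crawl(url))
```

Ordinary page views such as /display/DOCS/Home are kept, while edit screens, excluded spaces and version history are dropped, which is the intent of the long exclude list.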
Configure authentication (if required)
If a login is required to access Confluence content, a crawl user must be set up so that the Confluence pages can be accessed and indexed. The user must have read-only access to every space that should be included in the search results.
Once the user has been created, create a form_interaction.cfg file for the web collection.
form_interaction.cfg
<CONFLUENCE HOST>/login.action 3 parameters:[os_username=<CRAWL USER>&os_password=<CRAWL PASSWORD>]
Note: the crawl user and crawl password must be URL-encoded (for example, @ becomes %40 and & becomes %26).
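Because an unencoded & or = in a password would break the parameters:[...] list, it can help to generate the login line rather than type it by hand. The Python sketch below does this with the standard library's urllib.parse.quote. The function name and the sample credentials are hypothetical, and the 3 is copied verbatim from the example line above rather than derived.

```python
from urllib.parse import quote


def form_interaction_line(host: str, user: str, password: str) -> str:
    """Build the Confluence login line for form_interaction.cfg (hypothetical helper)."""
    # safe="" makes quote() percent-encode every reserved character, including
    # "/", "&" and "=", which would otherwise be read as list separators.
    encoded_user = quote(user, safe="")
    encoded_password = quote(password, safe="")
    # Field layout copied from the example form_interaction.cfg line.
    return (
        f"{host}/login.action 3 "
        f"parameters:[os_username={encoded_user}&os_password={encoded_password}]"
    )


if __name__ == "__main__":
    print(form_interaction_line("https://confluence.mydomain.com", "crawl.user", "p@ss&word=1"))
```

For the sample credentials this produces os_password=p%40ss%26word%3D1, so the & and = inside the password can no longer be mistaken for parameter separators.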
If authentication is configured and users of the search may not have access to everything in the search results, disable cache controller access on the Confluence web collection and on any meta collection that includes its results. Otherwise, a user could read the crawl user's cached copy of a page they are not permitted to open in Confluence. To do this, set
ui_cache_disabled=true
in the collection.cfg file
for each of these collections.
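If several collections need the same change, a small script can apply it consistently. The Python sketch below assumes collection.cfg is a plain key=value file with one setting per line; the helper names are hypothetical, the location of each collection.cfg depends on your installation, and editing through the administration interface may be preferable where available.

```python
from pathlib import Path


def set_cfg_option(text: str, key: str, value: str) -> str:
    """Return key=value config text with key set to value, replacing any existing setting."""
    lines = text.splitlines()
    new_line = f"{key}={value}"
    replaced = False
    for i, line in enumerate(lines):
        # Compare only the part before the first "=" so other keys are untouched.
        if line.split("=", 1)[0].strip() == key:
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


def disable_cache(cfg_path: Path) -> None:
    """Set ui_cache_disabled=true in the given collection.cfg file (hypothetical helper)."""
    cfg_path.write_text(set_cfg_option(cfg_path.read_text(), "ui_cache_disabled", "true"))
```

Call disable_cache once for the Confluence web collection's collection.cfg and once for each meta collection that includes it.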