Managing DNN Search and Search Crawlers

Overview

DNN Search is one of the main core components of DNN and is used extensively to help users locate a specific file, page, user, or piece of content. DNN Search is built on Lucene, an open-source search engine library. This article covers the basics of DNN Search and its crawlers, along with troubleshooting methods for when Search does not function as intended.

 


Introduction

What Are Search Crawlers?

Search Crawlers are essentially scheduled tasks that run periodically to “crawl” through the site’s files, content, and users, creating index files from the items each crawler finds.

To understand Search Crawlers, we will need to take a look at each crawler and its role in DNN Search.  

You can access the Search Crawlers by going into Settings > Scheduler > Scheduler Tab.

 

2020-06-02_1405.png

 

Search: File Crawler

Microsoft Office and PDF documents are indexed by the File Crawler, which uses standard IFilters to index Office, PDF, and other document types.

Note that PDF indexing requires the installation of the Adobe PDF IFilter on the server where DNN is installed.

The IFilters enable a feature called “deep indexing,” where the crawler reads the content inside a document and surfaces it in the search results.

For PDFs, deep indexing is handled by the Adobe PDF IFilter, which allows users to easily search for text within PDF documents.

 

Search: Site Crawler

This crawler focuses on indexing content residing in modules. It indexes all HTML modules and select third-party modules.

 

Search: URL Crawler

The URL Crawler relies on parsing HTML pages and links. While the Site Crawler handles most module content, the URL Crawler still plays an important role in indexing content that is not part of a module.

  • This includes content that is part of your site skin or navigation, as well as content located on other sites that you want to federate into your search results.

 



 

How Do Crawlers Work with DNN Search?

Once a crawler has created its index files, DNN Search reads those files to display the relevant results to the user. Think of Lucene as a file-based database that DNN Search queries using the keywords of the search.

  • The generated index database files are stored in the SiteRoot/App_Data/Search folder.
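As a quick health check, you can inspect the Search folder to confirm that the crawlers are actually producing index files. A minimal Python sketch (the site-root path in the example is an assumption; adjust it to your install):

```python
import os

def summarize_index_folder(search_dir):
    """Return (file_count, total_bytes) for a Lucene index folder."""
    files = [f for f in os.listdir(search_dir)
             if os.path.isfile(os.path.join(search_dir, f))]
    total = sum(os.path.getsize(os.path.join(search_dir, f)) for f in files)
    return len(files), total

# Example (hypothetical path -- adjust to your site root):
# count, size = summarize_index_folder(r"C:\inetpub\wwwroot\SiteRoot\App_Data\Search")
```

If the folder is empty (or the file count never changes after a crawl), the crawler is likely not running or not completing.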

Importantly, for the search crawler to recognize that a specific piece of content should be searchable, the module containing that content must implement the ISearchable interface or inherit from the ModuleSearchBase class.
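For module developers, a minimal sketch of the ModuleSearchBase approach looks like the following. The class name and field values are illustrative only; the scheduler calls GetModifiedSearchDocuments during a crawl to collect items changed since the previous run.

```csharp
using System;
using System.Collections.Generic;
using DotNetNuke.Entities.Modules;
using DotNetNuke.Services.Search.Entities;

// Illustrative sketch: a module controller that inherits ModuleSearchBase
// so the crawler can pick up the module's content.
public class MyModuleController : ModuleSearchBase
{
    public override IList<SearchDocument> GetModifiedSearchDocuments(
        ModuleInfo moduleInfo, DateTime beginDateUtc)
    {
        // Return one SearchDocument per item changed since beginDateUtc.
        return new List<SearchDocument>
        {
            new SearchDocument
            {
                UniqueKey = "my-item-1",        // hypothetical item key
                Title = "Sample item",
                Body = "Text that should be searchable",
                ModifiedTimeUtc = DateTime.UtcNow,
                PortalId = moduleInfo.PortalID,
                ModuleId = moduleInfo.ModuleID
            }
        };
    }
}
```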

 



 

Description

Running the Search Crawler

The search crawlers can be run through Settings > Scheduler > Scheduler tab by editing Search: File Crawler, Search: Site Crawler, or Search: URL Crawler.

Important things to note are:

  1. The frequency of the search schedulers should be adjusted based on how much content you update in your environment. The higher the frequency, the more often the search crawler will crawl your content. Keep in mind that each crawl consumes server resources, so it is recommended to keep the frequency as low as your content-update pattern allows.
  2. Defining a server name is important, as the search index cannot run unless the defined server is valid. You can confirm the available servers in Settings > Servers.

    Note: If you are in a web farm (an environment with more than one server), you will need to designate one server in the farm to run the crawler, to avoid file locks.

2020-06-02_1443.png

 

 

Performing a Search Reindex

A Search Reindex is a procedure that rebuilds the indexes so that any entries that may have been missed are indexed again. A reindex deletes existing content from the Index Store and then reindexes everything. Reindexing runs as part of the search crawler scheduled tasks.

  • To perform an immediate reindex, run the search crawler manually from the scheduler.

Compacting the index reclaims space from deleted items in the Index Store. Compacting is recommended only when the Index Store contains many 'Deleted Documents'. Note that compacting may temporarily require up to twice the size of the current Index Store while it runs.

Reindexing is CPU intensive, so it is best to run it during low-traffic hours.

  • A Reindex can be performed in Settings > Site Settings > Search > Reindex Content.

 

Troubleshooting DNN Search

Given the complexity of DNN Search, issues are bound to arise. To troubleshoot DNN Search efficiently and effectively, formulate a plan of action and work from the information available.

 

Scenarios

“I cannot find any results in the DNN Search.”

  • You would most likely need to request more information, but from this we know the search is returning no results at all, which could indicate a loss of index data or that the indexer is not running.

 

“I am receiving errors such as write.lock has been timed out.”

  • When you receive errors such as write.lock has been timed out, it usually means that two or more processes are trying to access the same resource (the index files). This can be caused by antivirus software or by not explicitly defining a server name (in the server field) in the Search scheduler settings. The write.lock file is a physical file in the Search folder; when it is present, the search crawler is currently indexing.
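A fresh write.lock simply means a crawl is in progress; one that lingers long after the scheduler finished may indicate a crashed crawl or a file held open by another process. A small Python sketch of that check (the one-hour threshold is an assumption, not a DNN default):

```python
import os
import time

def stale_write_lock(search_dir, max_age_seconds=3600):
    """Return True if write.lock exists and has not been touched recently.

    A recent write.lock normally just means indexing is in progress; one
    untouched for a long time may point to a crashed crawl or a lock held
    by another process (e.g. antivirus).
    """
    lock_path = os.path.join(search_dir, "write.lock")
    if not os.path.exists(lock_path):
        return False
    age = time.time() - os.path.getmtime(lock_path)
    return age > max_age_seconds
```

If the lock looks stale, confirm the scheduler is not mid-crawl before taking action such as recycling the application pool.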

 

“I can see some results are returning, but not all of them.”

  • This could mean the index file is corrupted or that the search scheduler has stopped indexing. The fix may be as simple as re-enabling the search scheduler, but if all else fails, a full reindex will be required.

 

You can troubleshoot further by opening the index files in the Search folder with an application called Luke, a Lucene index inspection tool. Luke lets you look inside the physical index files and check for discrepancies in the indexed data.

 



 

Common Usage

Performing a Full Reindex

A Full Reindex is a reindex with an extra step: before performing the reindex, completely delete the search index files, then execute the reindex. This wipes out all the search indexes, clearing any corruption.

This should be a last-resort fix for search index issues and never the first solution, as many search issues can be resolved without deleting site files.

If you are unable to delete the write.lock file due to file-lock errors, recycle the application pool; that should be enough to release the lock.

  • The search index files can be found in the SiteRoot/App_Data/Search folder.
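The delete-then-reindex preparation can be scripted. A minimal Python sketch, assuming the site (or at least the scheduler) is stopped so nothing holds the files:

```python
import os

def clear_search_index(search_dir):
    """Delete all index files in the Search folder (full-reindex prep).

    Run only while the application pool is stopped, so no crawl holds
    the files. Returns the names of the files removed.
    """
    removed = []
    for name in os.listdir(search_dir):
        path = os.path.join(search_dir, name)
        if os.path.isfile(path):
            os.remove(path)
            removed.append(name)
    return removed

# Example (hypothetical path -- adjust to your site root):
# clear_search_index(r"C:\inetpub\wwwroot\SiteRoot\App_Data\Search")
```

After clearing the folder, trigger the reindex from Settings > Site Settings > Search > Reindex Content or run the crawlers manually from the scheduler.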

 

