The facet Parameter
If set to true, this parameter enables facet counts in the query response. If set to false, or if left blank or missing, faceting is disabled. None of the other parameters listed below will have any effect unless this parameter is set to true. The default value is blank (false).

The facet.query Parameter
This parameter allows you to specify an arbitrary query in the Lucene default syntax to generate a facet count.
By default, Solr's faceting feature automatically determines the unique terms for a field and returns a count for each of those terms. Using facet.query, you can override this default behavior and select exactly which terms or expressions you would like to see counted. In a typical implementation of faceting, you will specify a number of facet.query parameters. This parameter can be particularly useful for numeric-range-based facets or prefix-based facets.
You can set the facet.query parameter multiple times to indicate that multiple queries should be used as separate facet constraints.
To use facet queries in a syntax other than the default syntax, prefix the facet query with the name of the query notation. For example, to use the hypothetical myfunc query parser, you could set the facet.query parameter like so:
facet.query={!myfunc}name~fred
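For example, against the techproducts sample collection used later in this post, several facet.query parameters can be combined into one request to produce numeric-range facet counts (the price field and the ranges here are illustrative; spaces must be URL-encoded when the request is actually sent):

```
http://localhost:8983/solr/techproducts/select?q=*:*&facet=true
    &facet.query=price:[0 TO 100]
    &facet.query=price:[100 TO *]
```

Each facet.query then appears as a separate entry in the facet_queries section of the response.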

Field-Value Faceting Parameters
Several parameters can be used to trigger faceting based on the indexed terms in a field.
When using these parameters, it is important to remember that "term" is a very specific concept in Lucene: it relates to the literal field/value pairs that are indexed after any analysis occurs. For text fields that include stemming, lowercasing, or word splitting, the resulting terms may not be what you expect.
If you want Solr to perform both analysis (for searching) and faceting on the full literal strings, use the copyField directive in your Schema to create two versions of the field: one Text and one String.
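A minimal sketch of that copyField pattern in schema.xml (the field names and types here are illustrative, not from a specific shipped schema):

```xml
<field name="title"     type="text_general" indexed="true" stored="true"/>
<field name="title_str" type="string"       indexed="true" stored="false"/>
<!-- copy the raw value so faceting sees the full literal string -->
<copyField source="title" dest="title_str"/>
```

Queries then analyze title as usual, while faceting on title_str returns the unanalyzed literal strings.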

The facet.field Parameter
The facet.field parameter identifies a field that should be treated as a facet. Solr iterates over each term in the field and generates a facet count using that term as the constraint. This parameter can be specified multiple times in a query to select multiple facet fields.
The facet.prefix Parameter
The facet.prefix parameter limits the terms on which to facet to those starting with the given string prefix. This does not limit the query in any way, only the facets that would be returned in response to the query.
This parameter can be specified on a per-field basis with the syntax f.<fieldname>.facet.prefix.
The facet.contains Parameter
The facet.contains parameter limits the terms on which to facet to those containing the given substring. This does not limit the query in any way, only the facets that would be returned in response to the query.
This parameter can be specified on a per-field basis with the syntax f.<fieldname>.facet.contains.
The facet.contains.ignoreCase Parameter
If facet.contains is used, the facet.contains.ignoreCase parameter causes case to be ignored when matching the given substring against candidate facet terms.
This parameter can be specified on a per-field basis with the syntax f.<fieldname>.facet.contains.ignoreCase.
The facet.sort Parameter
This parameter determines the ordering of the facet field constraints.
There are two options for this parameter.
count
Sort the constraints by count (highest count first).
index
Return the constraints sorted in their index order (lexicographic by indexed term). For terms in the ASCII range, this will be alphabetically sorted.
The default is count if facet.limit is greater than 0, otherwise, the default is index.
This parameter can be specified on a per-field basis with the syntax f.<fieldname>.facet.sort.
The facet.limit Parameter
This parameter specifies the maximum number of constraint counts (essentially, the number of facets for a field) that should be returned for the facet fields. A negative value means that Solr will return an unlimited number of constraint counts.
The default value is 100.
This parameter can be specified on a per-field basis to apply a distinct limit to each field with the syntax f.<fieldname>.facet.limit.
The facet.offset Parameter
The facet.offset parameter indicates an offset into the list of constraints to allow paging.
The default value is 0.
This parameter can be specified on a per-field basis with the syntax f.<fieldname>.facet.offset.
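Taken together, facet.sort, facet.limit, and facet.offset allow paging through a field's facet constraints. A sketch of such a request (the cat field is from the techproducts example used later in this post):

```
http://localhost:8983/solr/techproducts/select?q=*:*&facet=true&facet.field=cat
    &facet.sort=index&facet.limit=10&facet.offset=10
```

This returns the second page of ten cat constraints in index order.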
The facet.mincount Parameter
The facet.mincount parameter specifies the minimum counts required for a facet field to be included in the response. If a field's counts are below the minimum, the field's facet is not returned.
The default value is 0.
This parameter can be specified on a per-field basis with the syntax f.<fieldname>.facet.mincount.
The facet.missing Parameter
If set to true, this parameter indicates that, in addition to the Term-based constraints of a facet field, a count of all results that match the query but which have no facet value for the field should be computed and returned in the response.
The default value is false.
This parameter can be specified on a per-field basis with the syntax f.<fieldname>.facet.missing.
The facet.method Parameter
The facet.method parameter selects the type of algorithm or method Solr should use when faceting a field.
The following methods are available.
enum
Enumerates all terms in a field, calculating the set intersection of documents that match the term with documents that match the query.
This method is recommended for faceting multi-valued fields that have only a few distinct values. The average number of values per document does not matter.
For example, faceting on a field with U.S. states such as Alabama, Alaska, …, Wyoming would lead to fifty cached filters which would be used over and over again. The filterCache should be large enough to hold all the cached filters.
fc
Calculates facet counts by iterating over documents that match the query and summing the terms that appear in each document.
This is currently implemented using an UnInvertedField cache if the field either is multi-valued or is tokenized (according to FieldType.isTokened()). Each document is looked up in the cache to see what terms/values it contains, and a tally is incremented for each value.
This method is excellent for situations where the number of indexed values for the field is high, but the number of values per document is low. For multi-valued fields, a hybrid approach is used that uses term filters from the filterCache for terms that match many documents. The letters fc stand for field cache.
fcs
Per-segment field faceting for single-valued string fields. Enable with facet.method=fcs and control the number of threads used with the threads local parameter. This parameter allows faceting to be faster in the presence of rapid index changes.
The default value is fc (except for fields using the BoolField field type and when facet.exists=true is requested) since it tends to use less memory and is faster when a field has many unique terms in the index.
This parameter can be specified on a per-field basis with the syntax f.<fieldname>.facet.method.
The facet.enum.cache.minDf Parameter
This parameter indicates the minimum document frequency (the number of documents matching a term) for which the filterCache should be used when determining the constraint count for that term. This is only used with the facet.method=enum method of faceting.
A value greater than zero decreases the filterCache's memory usage, but increases the time required for the query to be processed. If you are faceting on a field with a very large number of terms, and you wish to decrease memory usage, try setting this parameter to a value between 25 and 50, and run a few tests. Then, optimize the parameter setting as necessary.
The default value is 0, causing the filterCache to be used for all terms in the field.
This parameter can be specified on a per-field basis with the syntax f.<fieldname>.facet.enum.cache.minDf.
The facet.exists Parameter
To cap facet counts at 1, specify facet.exists=true. It can be used with facet.method=enum or when facet.method is omitted. It can be used only on non-trie fields (such as strings). It may speed up facet counting on large indices and/or high-cardinality facet values.
This parameter can be specified on a per-field basis with the syntax f.<fieldname>.facet.exists, or via a local parameter: facet.field={!facet.method=enum facet.exists=true}size.
The facet.excludeTerms Parameter
If you want to remove terms from facet counts but keep them in the index, the facet.excludeTerms parameter allows you to do that.
Over-Request Parameters
In some situations, the accuracy in selecting the "top" constraints returned for a facet in a distributed Solr query can be improved by "over-requesting" the number of desired constraints (i.e., facet.limit) from each of the individual shards. In these situations, each shard is by default asked for the top 10 + (1.5 * facet.limit) constraints.
In some situations, depending on how your docs are partitioned across your shards, and what facet.limit value you used, you may find it advantageous to increase or decrease the amount of over-requesting Solr does. This can be achieved by setting the facet.overrequest.count (defaults to 10) and facet.overrequest.ratio (defaults to 1.5) parameters.
The facet.threads Parameter
This parameter causes the underlying fields used in faceting to be loaded in parallel, using the number of threads specified. Specify it as facet.threads=N, where N is the maximum number of threads to use. Omitting this parameter or specifying a thread count of 0 will not spawn any threads, and only the main request thread will be used. Specifying a negative number of threads will create up to Integer.MAX_VALUE threads.
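Since all of these faceting options travel as plain HTTP query parameters, a faceted request can be assembled in any language. A minimal sketch in Python using only the standard library (the host, collection, and field names are assumptions, matching the techproducts example used later in this post):

```python
from urllib.parse import urlencode

# Repeated keys (e.g. two facet.field entries) are legal in Solr URLs,
# so the parameters are kept as a list of (key, value) pairs.
params = [
    ("q", "*:*"),
    ("facet", "true"),
    ("facet.field", "cat"),
    ("facet.field", "manu"),
    ("facet.limit", "10"),
    ("facet.mincount", "1"),
    ("f.manu.facet.sort", "index"),  # per-field override for manu only
]

url = "http://localhost:8983/solr/techproducts/select?" + urlencode(params)
print(url)
```

Sending that URL (e.g. with urllib.request) against a running Solr instance returns facet counts for both fields in one response.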

Amit  posted in Apache Solr

Post updated on:  Jun 4, 2025 8:15:54 PM

Follow the steps below to start and configure Solr quickly on your machine.


1. Download Solr from the Solr download page. For Windows, you can download the zip file of the latest version; for Unix/Linux, download the tgz file. For example, I clicked on solr-8.6.2.zip, which takes you to a mirror page where all files are referenced, and downloaded the zip from there.

2. Once downloaded, unzip the file. Make sure you already have Java 8 installed on your machine.

3. Now you can start the Solr server. Solr comes with an embedded Jetty server, so we can start it directly. Solr indexes all data in a particular collection, which is also called a core. Solr provides some collections that are already created for us; you can see them in the folder solr-8.6.2\example.
Whenever we restart Solr, we need to pass the name of the collection.

4. Go to the folder solr-8.6.2\bin and run the command below to start Solr with the techproducts collection:
  •  solr -e techproducts
 You can see in the logs that it indexes some of the XML files already present, for demo/practice purposes.

 
 
 5. Once Solr has started, hit the URL below to see the Solr Admin UI page:
 http://localhost:8983/solr/#/

 
 6. If the Admin UI page loads, Solr is running and has started successfully.
 
 7. Now click on the Core Selector on the left side; you will see techproducts in the dropdown. Select techproducts.
 
 8. You can now see all the configuration for the techproducts collection.

 
 9. Click on the Query tab in the left panel and then click the "Execute Query" button.
 
 10. You can see the results for the data indexed in Solr. You can also try this query in a separate browser tab to see the search results:
 http://localhost:8983/solr/techproducts/select?q=*%3A*
 
 
 11. To add a new item to Solr, add new content to an XML file under solr-8.6.2\example\exampledocs, or create a new XML file with your new content.
 I created a new file, myfile.xml, and added my data to it.
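A minimal sketch of what such a file can look like (the id and name values here are illustrative; the name value matches the mymenu search in step 13, and the fields follow the techproducts example schema):

```xml
<add>
  <doc>
    <!-- illustrative document; adjust the fields to your schema -->
    <field name="id">myfile-001</field>
    <field name="name">mymenu</field>
  </doc>
</add>
```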

    


   
12. Now index this new data into Solr with the command below. Go to the folder example/exampledocs, where post.jar is already present, and run:
  •  java -Dc=techproducts -jar post.jar myfile.xml






 13. Now search for the new content which you just indexed: http://localhost:8983/solr/techproducts/select?q=mymenu



 14. You can shut down Solr. Go to the folder solr-8.6.2\bin and run the command below:
  •  solr stop -all
 

Stay tuned for more updates and more exercises with steps and examples.


Happy learning!!




Amit  posted in Apache Solr

Post updated on:  Oct 11, 2020 11:50:32 PM

Everyone is working to decrease response time, irrespective of the application developed and the technology used. Search is a core functionality in eCommerce as well as in other applications. Keeping response time and scalability a high priority, the Apache Lucene project introduced the Solr search engine.


Benefits

Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features are: powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing distributed search. Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Tomcat. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language. Solr's powerful external configuration allows it to be tailored to almost any type of application without Java coding, and it has an extensive plug-in architecture for when more advanced customization is required.

Lucene uses 3 to 5 times less memory when dealing with the terms dictionary, so it consumes less RAM and provides fast search.
Faceted searching.
Lightweight application.
 
 
Basic Work Flow
                        
Schema.xml:  
   This file is responsible for declaring all fields of the data feed, with their types and their scope for search. It first declares the data types that can be used for fields, then defines the field types and fields of documents, which enables more intelligent processing. Dynamic fields enable on-the-fly addition of new fields, and the copyField functionality allows indexing a single field multiple ways, or combining multiple fields into a single searchable field. Explicit types eliminate the need for guessing the types of fields. We can also provide a customized data type; in that case we need to create the corresponding class and point to it in the declaration of the data type.
It makes it possible to specify that a field is a String, int, float, or other primitive, or a custom type.

Schema.xml also supports dynamic field definitions. If a field name is not found, dynamicFields will be used if the name matches any of the patterns; e.g., name="*_i" will match any field ending in _i (like myid_i or z_i). Longer patterns are matched first; if matching patterns of equal size are found, the first one appearing in the schema is used.
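A sketch of such dynamic field declarations in schema.xml (the type names here are illustrative):

```xml
<dynamicField name="*_i" type="int"    indexed="true" stored="true"/>
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
```

With these in place, a document carrying a myid_i field indexes as an int without any explicit myid_i declaration.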
 
Solrconfig.xml:   On the configuration side, the solrconfig.xml file specifies how Solr should handle indexing, highlighting, faceting, search, and other requests, as well as attributes specifying how caching should be handled and how Lucene should manage the index. The configuration can depend on the schema, but the schema never depends on the configuration. Caches are used to hold field values that are quickly accessible by document id; the fieldValueCache is created by default even if not configured here. Responses are returned in XML/JSON format according to the corresponding user parameters.

 
Detailed Description

Solr Faceted Search Implementation:


Faceted search gives users the facility to drill into search results down to the bottom level: from category to sub-category and beyond. Each facet displayed also shows the number of hits within the search that match that category. Users can then "drill down" by applying specific constraints to the search results. Faceted search is also called faceted browsing, faceted navigation, guided navigation, and sometimes parametric search.
It's relatively simple to get faceting information from Solr, as there are few prerequisites. Faceting commands are added to any normal Solr query request, and the faceting counts come back in the same query response. Solr offers the following types of faceting, all of which can be requested with no prior configuration:
Field faceting - retrieve the counts for all terms, or just the top terms in any given field. The field must be indexed.
Query faceting - return the number of documents in the current search results that also match the given query.
Date faceting - return the number of documents that fall within certain date ranges.
 

Caching

Solr caches are associated with an Index Searcher, a particular 'view' of the index that doesn't change. So as long as that Index Searcher is being used, any items in the cache will be valid and available for reuse. Caching in Solr is unlike ordinary caches in that Solr cached objects will not expire after a certain period of time; rather, cached objects will be valid as long as the Index Searcher is valid.
The current Index Searcher serves requests, and when a new searcher is opened, the new one is auto-warmed while the current one is still serving external requests. When the new one is ready, it is registered as the current searcher and handles any new search requests. The old searcher is closed once all requests it was servicing finish. The current searcher is used as the source of auto-warming: when a new searcher is opened, its caches may be prepopulated or "autowarmed" using data from the caches in the old searcher.
 
Required attributes in caching.
 
Available  SolrCache Class for implementation:
solr.LRUCache
solr.FastLRUCache
solr.LFUCache
 
initialSize
The initial capacity (number of entries) of the cache.
 
autowarmCount
It lets the new searcher serve data carried over from the old cache instead of recomputing it, and it defines how many entries are carried over. When a new searcher is opened, configurable searches are run against it in order to warm it up and avoid slow first hits; during warming, the current searcher handles live requests. When a new searcher is opened, its caches may be prepopulated or "autowarmed" with cached objects from the caches in the old searcher, and autowarmCount is the number of cached items that will be regenerated in the new searcher. You will probably want to base the autowarmCount setting on how long it takes to autowarm: consider the trade-off between time-to-autowarm and how warm (i.e., what autowarmCount) you want the cache to be. The autowarm parameter is set for the caches in solrconfig.xml.
 
This is cache warming in the background. For example, a cache entry in solrconfig.xml (the element name, here filterCache, identifies which cache is being configured):
 
<filterCache
      class="solr.LRUCache"
      size="512"
      initialSize="512"
      autowarmCount="0"/>
 
The cache configuration can be viewed on a static page of a running Solr server.
The URL for the static page is http://localhost:8983/solr/admin/stats.jsp
 
 
The most recently accessed items in the caches of the current searcher are re-populated in the new searcher, enabling high cache hit rates across index/searcher changes. This is autowarming in the background.



 
Query Creation
Queries hitting the Solr engine go over an HTTP interface with configurable response formats (XML/XSLT, JSON, Python, Ruby, PHP, Velocity, binary). Sorting can be done by any number of fields, and by complex functions of numeric fields.
Highlighted context snippets can be used.
Faceted searching based on unique field values, explicit queries, date ranges, and numeric ranges is a key feature of Solr. Spell checking and auto-suggestions are also provided for user queries; to enable spell checking, solrconfig.xml must be modified.
Queries can be created with simple Java code, or one can use the SolrJ API (apache-solr-solrj-1.4.1.jar). The built-in SolrQuery class is responsible for creating queries, and its various methods can be used to append different parameters to a query. The same API can be used for parsing response data, or one can write one's own code for parsing/handling responses.
Query Example:-
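As an illustration, here is a hedged sketch that assembles such a query by hand in Python using only the standard library (the collection name techproducts and the field names are assumptions, not from the original post):

```python
from urllib.parse import urlencode

# Each entry corresponds to one of the basic Solr query parameters.
params = [
    ("q", "ipod"),            # the searched content
    ("start", "0"),           # offset of the first result (pagination)
    ("rows", "10"),           # number of results per page
    ("fq", "inStock:true"),   # filter condition applied to the search
    ("fl", "id,name,price"),  # fields to return
    ("sort", "price asc"),    # sort order
    ("facet", "true"),        # enable faceting
    ("wt", "json"),           # response format
]

query_string = urlencode(params)
print("http://localhost:8983/solr/techproducts/select?" + query_string)
```

Pasting the printed URL into a browser against a running Solr instance returns the matching documents in JSON.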
 
Basic Parameters of Solr Query:
  1. q: the basic parameter of a Solr query; it contains the search content.
  2. start: the offset of the first result to return; used to paginate results.
  3. rows: the number of results to return per page; also used for pagination.
  4. fq: filter query; contains conditions applied to the search query without affecting scoring.
  5. fl: specifies the set of fields to return, limiting the amount of information in the response.
  6. sort: requests the response sorted asc/desc on a particular field.
  7. facet: defines whether facet search is true/false for the particular call.
  8. wt: defines the format of the Solr response (e.g., json or xml).
 

 

Amit  posted in Apache Solr

Post updated on:  Oct 1, 2020 1:51:34 AM

The Apache HTTP web server is open source. In this tutorial we will go through the steps to install it on Ubuntu Server.
These steps were tested on Ubuntu 16.04. I have also listed some basic Linux commands which you should be aware of.

1.  Since this is Ubuntu, you need to update the package lists before any kind of installation:
      - sudo apt update

2.  Install Apache server by running the below command.
   - sudo apt install apache2

3. Once this is installed, try the URL below to check whether the web server is serving the default page without any issue. You can type your IP address instead of localhost.
  - localhost:80/


 

4. If the default Apache page loads in your browser, the Apache web server is installed successfully. It is serving the default page, created during installation, from the location below.
 - cd /var/www/html

 5. You can open the index.html file present here and update it; the changes will be reflected at localhost:80/
    Open and Edit file
   - vim index.html
    Type "i" to enter insert mode.
    Press ESC and then type ":wq!" then press Enter to save the changes and exit vim.


6. Now we will learn how to create a placeholder for hosting any new site. Suppose your site name is myexample:
 sudo mkdir /var/www/myexample
 cd /var/www/myexample
 nano index.html
   Add any content you want to this file; I added the content below.



7. Now let's configure the new site in the default location of Apache.
    cd /etc/apache2/sites-available/

8. Create a config for our domain:
    sudo nano myexample.conf
    Set the admin email address: ServerAdmin email@myexample.com
    Set the server name: ServerName myexample.com
    Set DocumentRoot to the folder which you had already created
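Put together, the resulting myexample.conf looks roughly like this (a sketch using the names from the steps above; adjust the paths to your setup):

```
<VirtualHost *:80>
    ServerAdmin email@myexample.com
    ServerName myexample.com
    DocumentRoot /var/www/myexample
</VirtualHost>
```

On Ubuntu, enable the site with sudo a2ensite myexample.conf before reloading Apache.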




9. Once all is done, reload Apache:
- sudo service apache2 reload


Now hit myexample.com; it will open the page with the content you put in the HTML file. (When testing locally, you may need to map myexample.com to your server's IP in /etc/hosts so the name resolves.)
     
     Happy learning!!








Amit  posted in Technical

Post updated on:  Sep 5, 2020 10:02:42 PM
