RangeFacetHandler for Date Ranges

Oct 16, 2012 at 2:00 AM

I'm struggling to get a facet for date ranges to work. I've posted on the original Java BoboBrowse group page, just in case it's a core issue, and not related to the .Net port...

I've tried both Field and NumericField with no success. I am adding a sixth facet to an existing, working index that uses BoboBrowse.Net for facet support. So far, it hasn't taken much time to learn how to apply BBN, and it's a great effort by the author.

I've gotten two other fields (NumericField) to work with BBN. Not sure why a date range is really that much different.

It looks like:

new RangeFacetHandler(SearchFields.ModifyDate, new PredefinedTermListFactory<DateTime>(ModifiedDate.CUSTOM_DATE_FORMAT), ModifiedDate.Ranges.Select(i => i.Name).ToList()), 

where

public const string CUSTOM_DATE_FORMAT = "yyyyMMdd";

and the field is defined as a regular field, not NumericField. It is indexed as:

                document.Add(CreateOrSetValue(fields, SearchFields.ModifyDate, modify.ToString(ModifiedDate.CUSTOM_DATE_FORMAT, CultureInfo.InvariantCulture), Field.Store.YES,
                                              Field.Index.NOT_ANALYZED));

and my predefined ranges are computed on the fly, based on the current date:

        public static IEnumerable<ModifiedDate> Ranges
        {
            get
            {
                var today = DateTime.Today.ToString(CUSTOM_DATE_FORMAT, CultureInfo.InvariantCulture);
                yield return Create(today, today, "Today");
                var yesterday = FromDays(1);
                yield return Create(yesterday, yesterday, "Yesterday");
                yield return Create(yesterday, FromDays(7), "Last Week");
                yield return Create(yesterday, FromDays(30), "Last Month");
                yield return Create(yesterday, FromDays(365), "Last Year");
            }
        }

Any ideas? I've read the Javadoc for NumericField, and it suggests that you get better performance when using NumericFields to represent date ranges. I tried that as well, without success.

 

So far, the facet group returned after the query completes has been empty for that facet.

Coordinator
Oct 18, 2012 at 10:28 AM

Hello,

Right now we are indexing dates with the following code:

           

 document.Add(new Field(name, DateTools.DateToString(value, DateTools.Resolution.MINUTE), Field.Store.YES, Field.Index.NOT_ANALYZED));

This way Bobo should be able to build an auto-range facet. Also, for the facet values we are using the following format string:

FormatString = "MM/dd/yyyy"

And you can use * for the start or end value to designate an open interval.
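A minimal sketch of building such open-ended range strings, using only the .NET base library. The class and method names (OpenRangeSketch, Bound, Range) are hypothetical. It also assumes the bounds are written in the same "yyyyMMdd" text that day-resolution indexing stores, as the range helpers later in this thread do; the post above does not say which bound format Bobo expects, so verify this against your index.

```csharp
using System;
using System.Globalization;

public static class OpenRangeSketch
{
    // Assumption: range bounds use the same "yyyyMMdd" text as the indexed terms.
    private const string TermFormat = "yyyyMMdd";

    // Formats a bound, or returns "*" when the bound is missing (open end).
    private static string Bound(DateTime? date)
    {
        return date.HasValue
            ? date.Value.ToString(TermFormat, CultureInfo.InvariantCulture)
            : "*";
    }

    // Builds a range string such as "[* TO 20121018]".
    public static string Range(DateTime? from, DateTime? to)
    {
        return string.Format(CultureInfo.InvariantCulture, "[{0} TO {1}]", Bound(from), Bound(to));
    }
}
```

For example, OpenRangeSketch.Range(null, DateTime.Today) would describe "everything up to today".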

If it doesn't work with explicit ranges, I would try to build an auto-range facet and see what Bobo generates in this case.

Let me know if it still doesn't work.

 

Regards,

Alexey

Oct 18, 2012 at 9:14 PM

Thanks for the help, Alexey... Following your steps, I was able to get auto ranges to generate.

This is how I set up my field:

 

document.Add(CreateOrSetValue(fields, SearchFields.ModifyDate, DateTools.DateToString(modify, DateTools.Resolution.DAY), Field.Store.YES, Field.Index.NOT_ANALYZED));

 

And this is how I created my facet handler:

 

new RangeFacetHandler(SearchFields.ModifyDate, new PredefinedTermListFactory<DateTime>("MM/dd/yyyy"), true)

 

I got three facets produced for my data, but I don't think they are very useful compared to ranges such as "Last Touched Yesterday", "Last Touched Last Week", "Last Touched This Month", etc.

This is how I attempted to set up my facet handler for explicit ranges:

 

new RangeFacetHandler(SearchFields.ModifyDate, new PredefinedTermListFactory<DateTime>("MM/dd/yyyy"), ModifiedDate.Ranges.Select(i => i.Name).ToList())

 

Where ModifiedDate is:

    public class ModifiedDate
    {
        public static IEnumerable<ModifiedDate> Ranges
        {
            get
            {
                var today = DateTools.DateToString(DateTime.Today, DateTools.Resolution.DAY);
                yield return Create(today, today, "Today");
                var yesterday = FromDays(1);
                yield return Create(yesterday, yesterday, "Yesterday");
                yield return Create(yesterday, FromDays(7), "Last Week");
                yield return Create(yesterday, FromDays(30), "Last Month");
                yield return Create(yesterday, FromDays(365), "Last Year");
            }
        }

        private static ModifiedDate Create(string from, string to, string displayName)
        {
            return new ModifiedDate(string.Format(Format, from, to), displayName);
        }

        private static string FromDays(int fromDays)
        {
            return DateTools.DateToString(DateTime.Today.Subtract(TimeSpan.FromDays(fromDays)), DateTools.Resolution.DAY);
        }

        private static string Format
        {
            get { return "[{0} TO {1}]"; }
        }

        public string Name { get; set; }
        public string DisplayName { get; set; }

        public ModifiedDate(string name, string displayName)
        {
            Name = name;
            DisplayName = displayName;
        }
    }
  

But I'm not getting generated ranges from this list. I think perhaps it's because they aren't truly facets, in the sense that they are not mutually exclusive (Last Month subsumes Last Week, which subsumes Yesterday, etc.). Maybe I don't even want to use facets in my search UI for this field and should just create a custom UI (albeit without the Scented UI benefit) that has these explicit ranges baked in...

 

Rob

Oct 18, 2012 at 9:44 PM

Figured it out. The ranges needed to run from the low date TO the high date, not high TO low. When I constructed the ModifiedDate class above, I conceptualized the ranges in the opposite order from what Bobo was expecting... It should be:

yield return Create(FromDays(365), yesterday, "Last Year");
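One way to guard against this mistake is to let the helper order the bounds itself. The sketch below is only an illustration using the .NET base library; RangeOrderSketch, Term, Range, and LastDays are hypothetical names, and the "yyyyMMdd" format is an assumption meant to mirror the day-resolution terms used above. Because that format sorts the same way as a string and as a date, the older date is always the lower bound.

```csharp
using System;
using System.Globalization;

public static class RangeOrderSketch
{
    // Assumption: "yyyyMMdd" mirrors what day-resolution indexing stores for this field.
    private const string TermFormat = "yyyyMMdd";

    public static string Term(DateTime date)
    {
        return date.ToString(TermFormat, CultureInfo.InvariantCulture);
    }

    // Always emits "[older TO newer]", whichever order the caller passes.
    public static string Range(DateTime a, DateTime b)
    {
        DateTime low = a <= b ? a : b;
        DateTime high = a <= b ? b : a;
        return string.Format(CultureInfo.InvariantCulture, "[{0} TO {1}]", Term(low), Term(high));
    }

    // The last N days ending yesterday, relative to a caller-supplied date.
    public static string LastDays(DateTime today, int days)
    {
        return Range(today.AddDays(-1), today.AddDays(-days));
    }
}
```

Passing the reference date in, rather than reading DateTime.Today inside the helper, also makes the ranges easy to unit test.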

 

<face palm>

Oct 18, 2012 at 9:46 PM

Alexey,

 

Do you know what BoboBrowse generally does for non-exclusive (overlapping range) facets?

Thanks

Coordinator
Oct 19, 2012 at 1:05 PM

Rob,

It should work with overlapping ranges too; just try providing such ranges manually. If it doesn't work, this is clearly a bug. Let me know if this is the case.
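To illustrate what counting over overlapping ranges means, here is a small standalone sketch. It is not Bobo's implementation; OverlapSketch and Count are hypothetical names, and the terms are assumed to be "yyyyMMdd" strings compared ordinally. Each term is counted in every range that contains it, so one document can contribute to both "Yesterday" and "Last Week".

```csharp
using System;
using System.Collections.Generic;

public static class OverlapSketch
{
    // Counts how many terms fall inside each inclusive [low, high] range.
    // Terms and bounds are compared as ordinal strings, e.g. "20121017".
    public static Dictionary<string, int> Count(
        IEnumerable<string> terms,
        IDictionary<string, Tuple<string, string>> ranges)
    {
        var counts = new Dictionary<string, int>();
        foreach (var range in ranges)
        {
            counts[range.Key] = 0;
        }

        foreach (var term in terms)
        {
            foreach (var range in ranges)
            {
                bool inside = string.CompareOrdinal(term, range.Value.Item1) >= 0
                           && string.CompareOrdinal(term, range.Value.Item2) <= 0;
                if (inside)
                {
                    // A term may increment several overlapping ranges.
                    counts[range.Key]++;
                }
            }
        }
        return counts;
    }
}
```

If the counts Bobo returns for manually supplied overlapping ranges differ from a brute-force count like this one, that would be a concrete case to report.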

Regards,

Alexey

Nov 14, 2014 at 7:37 AM
Edited Nov 14, 2014 at 7:55 AM
roblcecil wrote:
Following your steps I was able to get auto ranges to generate. This is how I set up my field:
document.Add(CreateOrSetValue(fields, SearchFields.ModifyDate, DateTools.DateToString(modify, DateTools.Resolution.DAY), Field.Store.YES, Field.Index.NOT_ANALYZED));
And this is how I created my facet handler:
new RangeFacetHandler(SearchFields.ModifyDate, new PredefinedTermListFactory<DateTime>("MM/dd/yyyy"), true)
[...]
I'm trying to do exactly as you wrote, that is:
doc.Add(new Field("date", DateTools.DateToString(date, DateTools.Resolution.DAY), Field.Store.YES, Field.Index.NOT_ANALYZED));
and
FacetHandler dateFacetHandler = new RangeFacetHandler("date", new PredefinedTermListFactory<DateTime>("MM/dd/yyyy"), true);
but the result obtained for a simple index is:

Facets for date:
where the format "MM/dd/yyyy" is not applied.
Both the project and the library are compiled against .NET Framework 4.0.
Could you help me?

Thanks,
Diego

PS: is this project still alive?
Coordinator
Nov 17, 2014 at 1:59 PM
Not sure why the format string is not applied.

The project is semi-alive. We use it intensively in our core product, but we haven't done any major development on the library itself in a while.
Nov 17, 2014 at 2:22 PM
@Shcherbachev

The project would probably have a lot more activity (including contributions) if it were up on GitHub and available for download on NuGet. More downloads == more activity. I created a package script for the BoboBrowse.Net port to 3.0.3, and its author has recently merged the pull request. However, I am unable to use that copy because it has some problems.

Unfortunately, I am stuck with this version and, therefore, stuck with the version of Lucene.NET that it is based on.
Coordinator
Nov 19, 2014 at 4:40 PM
@NightOwl888 Maybe. But we got so used to Mercurial and won't switch to Git just for that. So keeping everything as is for now.
Nov 19, 2014 at 7:21 PM
Actually, you don't need to switch from Mercurial to use GitHub. In fact, you don't even have to move your main repo from CodePlex. You can create a Git mirror of your Mercurial repo on GitHub. That way contributors can take their pick of Mercurial or Git to submit their changes - it looks pretty simple to merge a pull request from GitHub back to Mercurial as well.

It would be nice to get zhengchuen's version of BoboBrowse.Net put into a branch here so a pre-release based on 3.0.3 could be published on NuGet and hopefully the kinks can be worked out by the community. I added a build script, but I could also help you complete the task of publishing it on NuGet. Perhaps some parallel development can also be accomplished to get it working with the new 4.8.0 version of Lucene.Net, which sounds like it should have a pre-release coming soon.

Given that the next version of Lucene.Net is on GitHub, and even Microsoft has favored it over CodePlex for the recent open-sourcing of the core .NET platform, stating that they did it to increase community involvement, it looks like GitHub is becoming the most important platform for open-source development.
Nov 21, 2014 at 1:21 PM
Edited Nov 21, 2014 at 1:21 PM
@NightOwl888
How do you know that a pre-release of version 4.8.0 of Lucene.Net will be available soon? As you can see from https://issues.apache.org/jira/browse/LUCENENET/?selectedTab=com.atlassian.jira.jira-projects-plugin:roadmap-panel, the next release of Lucene.Net is 3.6, and it is not that close. However, it is my understanding that, starting with the next release of Lucene.Net, proper faceted search will be implemented. Is that correct?
Coordinator
Nov 21, 2014 at 1:59 PM
I am following their mailing list. They are working on 4.8.0 and skipping 3.6 altogether.
Nov 21, 2014 at 7:56 PM
@dtosato

Here is a copy of the email that was sent out on the Lucene.Net mailing list on 2014/11/14:
Hi all,

Just a quick recap of the last couple of weeks.

We have started the process of working towards Lucene.NET 4.8, which will conform to Java Lucene 4.8.0 (with a quick patch later to bring it up to speed with 4.8.1).

The changes from the current Lucene.NET version (which is 3.0.3) are huge.
The project structure is different, the capabilities and features are new.
It's practically a different project.

As such, we are trying to revamp our process as well, to be more agile and trying to heal the community and project while at it. We have a private thread going on to think of ways we can do that, one of them is to reinstate the PMC, and more is coming.

The new sources are here for you to experiment with:
https://github.com/apache/lucene.net

The Core project is almost entirely ported including all tests, and we have most of the tests green. We have more sub-projects (like Facet, Queries, Analysis, Codecs and more) in the middle of the porting process and should have them out of the door soon as well.

Feel free to try it all out - we would love to have more hands on deck for testing and fixing the last remaining tests!

We are also working on a CI pipeline, on a TeamCity instance provided by CodeBetter and JetBrains here:
teamcity.codebetter.com/project.html?projectId=LuceneNet&tab=projectOverview

We are very interested in hearing from you, and hope to see interest and participation grow as we move ahead. We will be sharing more as more happens!

Cheers,

Itamar Syn-Hershko

The Simple Faceted Search has been a part of Lucene.Net (in the contrib namespace) for some time. However, I was unsuccessful in finding a way to make the facets act like filters the way I want. With BoboBrowse.Net, this is really straightforward. I am not sure, but I think that BoboBrowse has more functionality than SimpleFacetedSearch.
Nov 25, 2014 at 9:55 AM
Edited Nov 25, 2014 at 9:58 AM
NightOwl888 wrote:
The Simple Faceted Search has been a part of Lucene.Net (in the contrib namespace) for some time. However, I was unsuccessful in finding a way to make the facets act like filters the way I want. With BoboBrowse.Net, this is really straightforward. I am not sure, but I think that BoboBrowse has more functionality than SimpleFacetedSearch.
Yes, that is correct: Bobo is much more powerful than SimpleFacetedSearch.
Using ILMerge, I built a 'merged' DLL that combines Bobo and Lucene 3.0.3 into a single assembly, but it does not work. In fact, I get errors like this:
Error 23 Argument 1: cannot convert from 'Lucene.Net.Index.IndexReader [<myPath1>\Lucene.Net.3.0.3\lib\NET40\Lucene.Net.dll]' to 'Lucene.Net.Index.IndexReader [<myPath2>BoboBrowse.Merge.Net.dll]'.
So what exactly does it mean that the current version of BoboBrowse.Net is compatible with Lucene.Net 3.0.3?
Coordinator
Nov 25, 2014 at 10:11 AM
Which version of BoboBrowse.Net did you merge with Lucene 3.0.3? If you merged the version published here on CodePlex, then it is not yet expected to work with 3.0.3. Check zhengchuen's version for 3.0.3 compatibility instead.
Nov 25, 2014 at 12:56 PM
Edited Nov 25, 2014 at 2:31 PM
Shcherbachev wrote:
Which version of BoboBrowse.Net did you merge with Lucene 3.0.3? If you merged the version published here on CodePlex, then it is not yet expected to work with 3.0.3. Check zhengchuen's version for 3.0.3 compatibility instead.
The version published here on Codeplex, but the Lucene.Net 3.0.3 IndexReader cannot be used by bobo.
Nov 25, 2014 at 4:44 PM
dtosato wrote:
The version published here on Codeplex. I saw zhengchuen's version, but it has some problems as you know.
Actually, zhengchuen has made some progress on that front. He now has it working correctly for my use case - that is for a filtering faceted search - with this patch.

However, I added the unit tests from the CodePlex version here, and 2 of them are failing (because of the same IndexOutOfRange exception in InternalBrowseHitCollector). I took a stab at fixing it by reverting to the version of that class here, but it has a lot of dependencies that don't exist there, which in turn have dependencies that also don't exist there. As far as I can tell, it does not have all of the features that this one does, but without knowing which Java version of BoboBrowse it is supposed to be based on, I can't be sure whether these are missing features or features that are now part of Lucene and no longer need to be implemented in Bobo. One thing I am certain of is that it doesn't resemble this version very much and doesn't seem to resemble the Java versions very much either.

That is why I am hoping to get some cooperation from Shcherbachev to see if we can get a pre-release up on NuGet. Faceted search is now working. With enough attention, the other features can probably be fixed/fully implemented.

Getting the Lucene.Net 3.0.3 version on NuGet is not much work now that the build script is working. It can most likely be done in a single afternoon.

The primary thing that is holding this up is getting it into a single repo so the build process can label that repo each time a version is made. I would prefer to keep it all right here with the existing documentation, issues, and discussions. Shcherbachev doesn't want to switch from Mercurial, which is fine - we just need to get a mirror of this repo going on GitHub so the changes that are being made in Git can be brought here without too much trouble.

@Shcherbachev

If you don't have the time to commit to this, just give me write access to this repo. I will back up the repo, set up a GitHub mirror, move the existing code to a new branch, rebase zhengchuen's version onto the default branch, and set up the MyGet, NuGet, and SymbolSource accounts so new versions can be built, versioned, and labeled with a button click and pushed to NuGet with another button click, and then document the release procedure. I could do all this without your help with a different repo, but I think it would be much better (and less confusing) for the community if everything were right here.

Although that sounds like a lot, the most time-consuming part will probably be writing the release procedure document. If there are no objections, I think I will just make the version number 3.0.3.[build number], since this version supports Lucene.Net 3.0.3, semantic versioning doesn't really apply because it is a port, and I have no idea what version of BoboBrowse it is supposed to be based on.

The only steps that you need to do are:
  1. Give me permission to write to this repo (NightOwl888)
  2. Set up a free MyGet account and send me your UserName there so I can set up your permissions to build and push the NuGet package
If you would prefer to own the NuGet and MyGet feeds, you could instead do:
  1. Give me permission to write to this repo (NightOwl888)
  2. Set up a free MyGet account with your user name
  3. Set up a free NuGet account with your user name
  4. Set up a free SymbolSource account with your user name
  5. Copy the API key from the MyGet account and NuGet account and register them with the SymbolSource account. Note that both of them can be "NuGet" type.
  6. Create a BoboBrowse_Net public feed in your NuGet account.
  7. Copy the API key from the NuGet account and add it to the NuGet.org package source of BoboBrowse_Net.
  8. In MyGet, give me permission to the feed (NightOwl888) with the minimum level of "Can manage users and all packages for this feed".
  9. In MyGet, add a build source to this repo.
  10. Notify me when complete.
Either way, it will probably take you less than an hour.
Nov 26, 2014 at 9:43 AM
NightOwl888 wrote:
Actually, zhengchuen has made some progress on that front. He now has it working correctly for my use case - that is for a filtering faceted search - with this patch.
Cool! I am trying out the port. I will report the bugs I found on GitHub.