Fuzzy search and definitions?

David Haslam
For those front-end apps that have a fuzzy search feature, is this based on
https://en.wikipedia.org/wiki/Levenshtein_distance 
or on something quite different?
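For reference, the Levenshtein distance linked above is the minimum number of single-character insertions, deletions and substitutions needed to turn one word into another. Below is a minimal Python sketch of the standard dynamic-programming algorithm. It is illustrative only and is not taken from SWORD, JSword or any front-end.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b`."""
    # Only two rows of the dynamic-programming table are kept in memory.
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep) ca
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(levenshtein("melchesidec", "melchisedec"))
```

The first call prints 3 (the textbook example) and the second prints 2, since the misspelling differs from "Melchisedec" only by two swapped vowels treated as two substitutions.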

Any chance of building this into SWORD & JSword, or do we already have it?

I've never seen this mentioned before in the nine years I've been a CrossWire volunteer.

Even the word "fuzzy" occurs on only three pages in our developers' wiki.

Best regards,

David

Re: Fuzzy search and definitions?

David Haslam
Lucene already provides for "fuzzy" searches.

Try searching the KJV for: melchesidec~

The tilde at the end tells the Lucene search engine to do a fuzzy search.

The above example will find Melchizedek in Gen 14:18 and Psalm 110:4, and the 9 instances of Melchisedec in Hebrews, though it also gives Malchielites in Numbers 26:45!
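To see why `melchesidec~` also pulls in differently spelled names, here is a rough Python model of a tilde query over a word list. It assumes the classic (pre-4.0) Lucene FuzzyQuery rule with no required prefix: a term is accepted when 1 - distance / min(query length, term length) is greater than a minimum similarity, 0.5 by default. The Lucene build a particular front-end links against may use a different rule, and the word list here is only illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def lucene_style_similarity(query: str, term: str) -> float:
    # Assumed classic-Lucene formula: 1 - distance / length of the shorter word.
    shorter = min(len(query), len(term))
    if shorter == 0:
        return 1.0 if query == term else 0.0
    return 1.0 - levenshtein(query, term) / shorter


def fuzzy_matches(query: str, vocabulary, min_similarity: float = 0.5):
    """Return the words a `query~` search would accept under the assumed rule."""
    q = query.lower()
    return sorted({w for w in vocabulary
                   if lucene_style_similarity(q, w.lower()) > min_similarity})


if __name__ == "__main__":
    words = ["Melchisedec", "Melchizedek", "bread", "wine"]
    print(fuzzy_matches("melchesidec", words))
```

Both spellings of the name pass (distances 2 and 4 against an 11-letter query), while short unrelated words fall far below the threshold. Because the threshold scales with word length, a longer but quite different word can also slip through, which would be consistent with the stray Malchielites hit.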

HT to DM Smith for an old post on this list.

David

Re: Fuzzy search and definitions?

David Haslam
Assuming a front-end has a switch for fuzzy search...

A neater approach would be for the front-end to append the tilde in the background when fuzzy search is ON, while still displaying the search key without the tilde in the results header.

This would spare the average user from having to know this "under the hood" detail of Lucene.
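One way a front-end could do this, sketched in Python: append `~` to each plain term rather than to the whole key, so that multi-word queries such as `tilgath AND paliser` also become fuzzy. The function name, the operator list and the skip rules below are hypothetical choices for illustration, not Xiphos, SWORD or JSword code. Operators, parentheses, quoted phrases, field searches such as `lemma:G2316`, wildcard terms and terms that already end in `~` are passed through unchanged.

```python
import re

# Hypothetical helper; the rules are assumptions, not any front-end's API.
OPERATORS = {"AND", "OR", "NOT", "&&", "||"}
# A token is a quoted phrase, a parenthesis, or a run of other non-space text.
TOKEN_RE = re.compile(r'"[^"]*"|\(|\)|[^\s()"]+')


def add_fuzzy_tildes(user_query: str) -> str:
    """Return the query to send to Lucene; the caller keeps `user_query`
    unchanged for display in the results header."""
    out = []
    for tok in TOKEN_RE.findall(user_query):
        passthrough = (
            tok in OPERATORS
            or tok in ("(", ")")
            or tok.startswith('"')         # exact phrase
            or ":" in tok                  # field search, e.g. lemma:G2316
            or tok.endswith("~")           # user already asked for fuzzy
            or "*" in tok or "?" in tok    # wildcard terms
        )
        out.append(tok if passthrough else tok + "~")
    return " ".join(out)


if __name__ == "__main__":
    shown = "tilgath AND paliser"
    print(shown, "->", add_fuzzy_tildes(shown))
```

This prints `tilgath AND paliser -> tilgath~ AND paliser~`. Tokens are re-joined with single spaces, so `(loved one) AND God` is sent as `( loved~ one~ ) AND God~`, which Lucene parses the same way.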

David

Re: Fuzzy search and definitions?

David Haslam
Meanwhile, perhaps the Xiphos developer could add a note about the fuzzy search tilde to the popup dialog text for Lucene search syntax?

Currently this reads:

Syntax overview for optimized "lucene" searches
Search for verses that contain...

loved one
        "loved" or "one"
        This is the same as searching for loved OR one

"loved one"
        The phrase "loved one"

love*
        A word starting with "love"
        (love OR loves OR loved OR etc...)

loved AND one
        The word "loved" and the word "one"
        && can be used in place of AND

+loved one
        Verses that must contain "loved" and may contain "one"

loved NOT one
        "loved" but not "one"

(loved one) AND God
        "loved" or "one" and "God"

lemma:G2316
        Search for the Strong's Greek ("G") word number 2316.
        Also, select Strong's display on the Attribute Search tab.

For complete details, search the web for "lucene search syntax".

Best regards,

David

Re: Fuzzy search and definitions?

David Haslam
In reply to this post by David Haslam
Doing it "under the hood" isn't such a good idea after all.

What if you want to fuzzy match two or more words?

Try this, for example: tilgath~ AND paliser~

David

Re: Fuzzy search and definitions?

Donna Whisnant
For my King James Pure Bible Search (KJPBS) software (http://www.purebiblesearch.com/), I use the SoundEx algorithm for fuzzy search. In my software, though, it doesn't really have to do "searching" per se: it already knows every word in the text in all of its forms, so the "search" is really nothing more than a filter on the words in the drop list. In effect, it does real-time live searching of the text as you type.

If you enable SoundEx in the configuration options of KJPBS and enter "btr", for example, you'll see words such as "better", "butter", "bitter" and "betray" in the drop list.

I also accept wildcards such as "*" and "?", even without SoundEx enabled.
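For comparison, here is a minimal Python sketch of the standard American Soundex code used as a word-list filter in the way described above. KJPBS's own implementation is not shown in this thread, so this is an assumption about the general technique rather than Donna's code; `soundex_filter` is a hypothetical name.

```python
# Standard American Soundex digit groups; vowels, y, h and w get no digit.
CODES = {ch: digit
         for digit, letters in (("1", "bfpv"), ("2", "cgjkqsxz"), ("3", "dt"),
                                ("4", "l"), ("5", "mn"), ("6", "r"))
         for ch in letters}


def soundex(word: str) -> str:
    """Four-character Soundex code, e.g. 'Robert' -> 'R163'."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    prev = CODES.get(first, "")
    for c in letters[1:]:
        code = CODES.get(c, "")
        if code and code != prev:  # adjacent letters with the same digit count once
            result += code
        if c not in "hw":          # h and w do not separate identical digits
            prev = code            # vowels reset prev, so repeats are coded again
    return (result + "000")[:4]


def soundex_filter(query: str, words):
    """Keep the words whose Soundex code equals the query's (hypothetical helper)."""
    target = soundex(query)
    return [w for w in words if soundex(w) == target]


if __name__ == "__main__":
    print(soundex_filter("btr", ["better", "butter", "bitter", "betray", "bread"]))
```

This prints `['better', 'butter', 'bitter', 'betray']`: all four share the code B360 with "btr", while "bread" codes as B630.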

Regards,
Donna

Re: Fuzzy search and definitions?

David Haslam
In reply to this post by David Haslam
By a process of trial and error, I have found that words within an edit distance of 3 are found, while those at a distance of 4 or more are not.

Try tilgath~ AND palonesar~

and play with the search key in Xiphos advanced search with Lucene selected.
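One possible explanation, offered as an assumption rather than a verified fact about SWORD: if the engine applies the classic (pre-4.0) Lucene FuzzyQuery rule, accepting a term when 1 - d / min(query length, term length) > 0.5, then the number of edits tolerated depends on word length instead of being a fixed 3. SWORD's search uses CLucene, a C++ port of an older Lucene release, but I have not checked which rule its fuzzy query applies; Lucene 4.0 and later instead cap fuzzy matches at 2 edits. The hypothetical helper below computes the limit under the assumed rule.

```python
def max_fuzzy_edits(query_len: int, term_len: int,
                    min_similarity: float = 0.5) -> int:
    """Largest edit distance d still accepted under the assumed rule
    1 - d / min(query_len, term_len) > min_similarity."""
    shorter = min(query_len, term_len)
    edits = 0
    # Apply the same acceptance test the engine would, one edit at a time.
    while shorter > 0 and 1.0 - (edits + 1) / shorter > min_similarity:
        edits += 1
    return edits


if __name__ == "__main__":
    for n in (5, 7, 8, 9, 11):
        print(n, max_fuzzy_edits(n, n))
```

Under this assumption a seven-letter key such as "tilgath" tolerates at most 3 edits, which fits the trial-and-error result, while a nine-letter key such as "palonesar" compared with a longer name would tolerate 4.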

David

Re: Fuzzy search and definitions?

Troy A. Griffitts
You might not be getting hits if all four words aren't in the same verse. Try prefixing your search term with:

prox:

That should span verses.


(Replying to David Haslam's post of May 2, 2017, 1:20:36 PM MST.)



sword-devel mailing list: [hidden email]
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
