[Bug 18969] New: Elasticsearch - _all field is deprecated - should use copy_to to prepare for ES6

bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=18969

            Bug ID: 18969
           Summary: Elasticsearch - _all field is deprecated - should use
                    copy_to to prepare for ES6
 Change sponsored?: ---
           Product: Koha
           Version: master
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: P3
         Component: Searching
          Assignee: [hidden email]
          Reporter: [hidden email]
        QA Contact: [hidden email]

Reindexing against a clean Elasticsearch install produces this warning:
[DEPRECATION] 299 Elasticsearch-5.4.1-2cfe0df "field [include_in_all] is
deprecated, as [_all] is deprecated, and will be disallowed in 6.0, use
[copy_to] instead."
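
A minimal sketch of the change the warning asks for, assuming the index mapping is available as a plain dictionary of field definitions. The helper name `replace_include_in_all` is made up for illustration, the catch-all name `_all_fields` is the one discussed later in this thread, and nested `properties` and multi-fields are ignored for brevity:

```python
import copy


def replace_include_in_all(properties, catch_all="_all_fields"):
    """Return a copy of `properties` that uses copy_to instead of include_in_all."""
    migrated = copy.deepcopy(properties)
    for name, field in migrated.items():
        # Assumption: a missing include_in_all is treated as true (the ES 5 default).
        included = field.pop("include_in_all", True)
        if included and name != catch_all:
            field["copy_to"] = catch_all
    # Declare the copy_to target explicitly as a text field.
    migrated.setdefault(catch_all, {"type": "text"})
    return migrated


old = {
    "title": {"type": "text", "include_in_all": True},
    "itemnumber": {"type": "integer", "include_in_all": False},
}
new = replace_include_in_all(old)
```

Here `new["title"]` carries `copy_to` and no longer mentions `include_in_all`, while `old` is left untouched.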

--
You are receiving this mail because:
You are watching all bug changes.
_______________________________________________
Koha-bugs mailing list
[hidden email]
http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-bugs
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/

[Bug 18969] Elasticsearch - _all field is deprecated - should use copy_to to prepare for ES6

bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=18969

Jonathan Druart <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         QA Contact|[hidden email]              |
          Component|Searching                   |Searching - Elasticsearch
                 CC|                            |[hidden email]

[Bug 18969] Elasticsearch - _all field is deprecated - should use copy_to to prepare for ES6

bugzilla-daemon
In reply to this post by bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=18969

Frank Hansen <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

[Bug 18969] Elasticsearch - _all field is deprecated - should use copy_to to prepare for ES6

bugzilla-daemon
In reply to this post by bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=18969

[hidden email] <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |20196


Referenced Bugs:

https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=20196
[Bug 20196] [Omnibus] Prepare Koha to ElasticSearch6 - ES6

[Bug 18969] Elasticsearch - _all field is deprecated - should use copy_to to prepare for ES6

bugzilla-daemon
In reply to this post by bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=18969

--- Comment #1 from Alex Arnaud <[hidden email]> ---
Created attachment 71607
  -->
https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=71607&action=edit
Bug 18969: ES6 - remove use of field [include_in_all] (WIP for biblios only)

Test plan:

  1) apply this patch,
  2) update your Elasticsearch server to version 6 (6.2?),
  3) reinstall the ICU plugin,
  4) reindex your authorities and biblios,
  5) check that there are no errors in
    /var/log/elasticsearch/elasticsearch.log,
  6) try a search on biblios,
  7) check that facets work
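
Step 5 can be partly automated. A minimal sketch, assuming the log lines start with the usual `[timestamp][LEVEL][logger]` prefix; the function name `suspicious_lines` and the sample lines (including the logger names) are made up for illustration:

```python
import re

# Matches the level column of a log line such as "[...][WARN ][some.logger]".
LEVEL_RE = re.compile(r"\]\[(?:ERROR|WARN)\s*\]")


def suspicious_lines(lines):
    """Return log lines that are errors/warnings or mention include_in_all."""
    return [
        line.rstrip("\n")
        for line in lines
        if LEVEL_RE.search(line) or "include_in_all" in line
    ]


# Invented sample input, not real Elasticsearch output.
sample = [
    "[2018-02-14T10:00:00,000][INFO ][example.Node] started\n",
    "[2018-02-14T10:00:05,000][WARN ][example.Mapper] field [include_in_all] is deprecated\n",
]
flagged = suspicious_lines(sample)
```

Against a real server the same function could be fed an open file handle for /var/log/elasticsearch/elasticsearch.log instead of the sample list.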

[Bug 18969] Elasticsearch - _all field is deprecated - should use copy_to to prepare for ES6

bugzilla-daemon
In reply to this post by bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=18969

Alex Arnaud <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #71607|0                           |1
        is obsolete|                            |

--- Comment #2 from Alex Arnaud <[hidden email]> ---
Created attachment 71618
  -->
https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=71618&action=edit
Bug 18969: ES6 - replace use of field include_in_all by copy_to

Test plan:

  1) apply this patch,
  2) update your Elasticsearch server to version 6 (6.2?),
  3) reinstall the ICU plugin,
  4) reindex your authorities and biblios,
  5) check that there are no errors in
    /var/log/elasticsearch/elasticsearch.log,
  6) try a search on biblios,
  7) check that facets work,
  8) try a search on authorities

[Bug 18969] Elasticsearch - _all field is deprecated - should use copy_to to prepare for ES6

bugzilla-daemon
In reply to this post by bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=18969

Alex Arnaud <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]
             Status|NEW                         |Needs Signoff

[Bug 18969] Elasticsearch - _all field is deprecated - should use copy_to to prepare for ES6

bugzilla-daemon
In reply to this post by bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=18969

Alex Arnaud <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #71618|0                           |1
        is obsolete|                            |

--- Comment #3 from Alex Arnaud <[hidden email]> ---
Created attachment 71619
  -->
https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=71619&action=edit
Bug 18969: ES6 - replace use of field include_in_all by copy_to

Test plan:

  1) apply this patch,
  2) update your Elasticsearch server to version 6 (6.2?),
  3) reinstall the ICU plugin,
  4) reindex your authorities and biblios,
  5) check that there are no errors in
    /var/log/elasticsearch/elasticsearch.log,
  6) try a search on biblios,
  7) check that facets work,
  8) try a search on authorities

[Bug 18969] Elasticsearch - _all field is deprecated - should use copy_to to prepare for ES6

bugzilla-daemon
In reply to this post by bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=18969

Tomás Cohen Arazi <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

--- Comment #4 from Tomás Cohen Arazi <[hidden email]> ---
(In reply to Alex Arnaud from comment #3)

> Created attachment 71619 [details] [review]
> Bug 18969: ES6 - replace use of field include_in_all by copy_to
>
> Test plan:
>
>   1) apply this patch,
>   2) update your Elasticsearch server to version 6 (6.2?),
>   3) reinstall the ICU plugin,
>   4) reindex your authorities and biblios,
>   5) check that there are no errors in
>     /var/log/elasticsearch/elasticsearch.log,
>   6) try a search on biblios,
>   7) check that facets work,
>   8) try a search on authorities

Is this backwards compatible?

[Bug 18969] Elasticsearch - _all field is deprecated - should use copy_to to prepare for ES6

bugzilla-daemon
In reply to this post by bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=18969

[hidden email] <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

[Bug 18969] Elasticsearch - _all field is deprecated - should use copy_to to prepare for ES6

bugzilla-daemon
In reply to this post by bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=18969

--- Comment #5 from Alex Arnaud <[hidden email]> ---
(In reply to Tomás Cohen Arazi from comment #4)

> Is this backwards compatible?

Yep. I tested this patch with ES 5 (5.6.7):
  - Biblio search works (and so do facets)
  - Authority search works

[Bug 18969] Elasticsearch - _all field is deprecated - should use copy_to to prepare for ES6

bugzilla-daemon
In reply to this post by bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=18969

Nick Clemens <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|[hidden email]   |[hidden email]

[Bug 18969] Elasticsearch - _all field is deprecated - should use copy_to to prepare for ES6

bugzilla-daemon
In reply to this post by bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=18969

Nick Clemens <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|Needs Signoff               |Signed Off

[Bug 18969] Elasticsearch - _all field is deprecated - should use copy_to to prepare for ES6

bugzilla-daemon
In reply to this post by bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=18969

Nick Clemens <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #71619|0                           |1
        is obsolete|                            |

--- Comment #6 from Nick Clemens <[hidden email]> ---
Created attachment 72654
  -->
https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=72654&action=edit
Bug 18969: ES6 - replace use of field include_in_all by copy_to

Test plan:

  1) apply this patch,
  2) update your Elasticsearch server to version 6 (6.2?),
  3) reinstall the ICU plugin,
  4) reindex your authorities and biblios,
  5) check that there are no errors in
    /var/log/elasticsearch/elasticsearch.log,
  6) try a search on biblios,
  7) check that facets work,
  8) try a search on authorities

Signed-off-by: Nick Clemens <[hidden email]>

[Bug 18969] Elasticsearch - _all field is deprecated - should use copy_to to prepare for ES6

bugzilla-daemon
In reply to this post by bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=18969

Bouzid <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #72654|0                           |1
        is obsolete|                            |

--- Comment #7 from Bouzid <[hidden email]> ---
Created attachment 73498
  -->
https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=73498&action=edit
Bug 18969: ES6 - replace use of field include_in_all by copy_to

Test plan:

  1) apply this patch,
  2) update your Elasticsearch server to version 6 (6.2?),
  3) reinstall the ICU plugin,
  4) reindex your authorities and biblios,
  5) check that there are no errors in
    /var/log/elasticsearch/elasticsearch.log,
  6) try a search on biblios,
  7) check that facets work,
  8) try a search on authorities

Signed-off-by: Nick Clemens <[hidden email]>
Signed-off-by: Bouzid Fergani <[hidden email]>

[Bug 18969] Elasticsearch - _all field is deprecated - should use copy_to to prepare for ES6

bugzilla-daemon
In reply to this post by bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=18969

Bouzid <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

[Bug 18969] Elasticsearch - _all field is deprecated - should use copy_to to prepare for ES6

bugzilla-daemon
In reply to this post by bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=18969

David Gustafsson <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

--- Comment #8 from David Gustafsson <[hidden email]> ---
I might be mistaken, and I will look it up to be certain, but I think that when
using the query string query in Elasticsearch, the default_field parameter is
set to "*.*" by default, so all fields are searched anyway. There is no need to
copy all fields to yet another field; doing so will also hurt search
performance without improving accuracy, and increase the size of the index.
Also, for field boosts to work (title:(title search)^2), each field must be
queried individually. A better solution would be to just remove "_all", and
even better to produce a relevant list of fields for the "fields" parameter
(excluding sortable and facetable fields, for example). I will start working on
a patch taking this approach, and will probably have a proposal ready quite
soon.
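
A rough sketch of the proposal above, assuming each search field is described by a small dictionary. The keys `name`, `searchable` and `boost`, the helper names and the sample field names are hypothetical and not taken from Koha's code; only `query_string`, `query`, `fields` and the `^` boost suffix are Elasticsearch names:

```python
def searchable_fields(field_configs):
    """Build the "fields" list, skipping fields that are not meant for searching."""
    fields = []
    for config in field_configs:
        if not config.get("searchable", True):
            continue  # e.g. fields that exist only for sorting or facets
        boost = config.get("boost")
        # "name^2" is the query_string syntax for a per-field boost.
        fields.append(f"{config['name']}^{boost}" if boost else config["name"])
    return fields


def keyword_query(user_query, field_configs):
    """Query body that searches every listed field instead of a catch-all."""
    return {
        "query": {
            "query_string": {
                "query": user_query,
                "fields": searchable_fields(field_configs),
            }
        }
    }


configs = [
    {"name": "title", "boost": 2},
    {"name": "author"},
    {"name": "title__sort", "searchable": False},
]
body = keyword_query("dune", configs)
```

With this input the sort-only field is left out and only `title` carries a boost.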

[Bug 18969] Elasticsearch - _all field is deprecated - should use copy_to to prepare for ES6

bugzilla-daemon
In reply to this post by bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=18969

--- Comment #9 from Nick Clemens <[hidden email]> ---
I think you are correct, ES will default to *.*. However, I think having an
'_all_fields' field and searching that directly gives us a little more
flexibility.

Currently fields like 'nonpublicnote' are indexed and searchable, even if
hidden from the opac. Using the all field would allow us to keep those fields
in the full record stored in ES and search them from the staff side while
removing them from the opac side.

I believe for boosting we could still search:
"fields" : ["_all_fields", "title.*^5"],

Thoughts, David?

[Bug 18969] Elasticsearch - _all field is deprecated - should use copy_to to prepare for ES6

bugzilla-daemon
In reply to this post by bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=18969

--- Comment #10 from David Gustafsson <[hidden email]> ---
Hmm.. I'm not really sure I understand. Would nonpublicnote not be searchable
as nonpublicnote:"search string", regardless of whether the search comes from
the opac or not? One way of protecting it would be, for example, to have a
different "fields" list for the opac and the staff client, and to filter out
appearances of blacklisted fields (nonpublicnote for example) in the query
string with some regexp in the opac, but not in the staff client?
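
A sketch of that idea, under loud assumptions: the function names and the interface labels "opac" and "staff" are invented here, and the regular expression is a crude filter rather than a real query string parser, so it would also touch a blacklisted name that appears inside a quoted phrase:

```python
import re


def fields_for(interface, all_fields, blacklist):
    """Staff searches see every field; opac searches skip blacklisted ones."""
    if interface == "opac":
        return [f for f in all_fields if f not in blacklist]
    return list(all_fields)


def strip_blacklisted(query, blacklist):
    """Drop `field:` prefixes for blacklisted fields from a query string."""
    if not blacklist:
        return query
    names = "|".join(re.escape(name) for name in blacklist)
    # The lookbehind avoids matching inside longer names such as "mynonpublicnote".
    pattern = re.compile(rf"(?<![\w.])(?:{names})\s*:\s*")
    return pattern.sub("", query)


cleaned = strip_blacklisted('title:dune AND nonpublicnote:"damaged"', {"nonpublicnote"})
```

After stripping, the term falls back to whatever "fields" list the opac query uses, which no longer includes the blacklisted field.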

[Bug 18969] Elasticsearch - _all field is deprecated - should use copy_to to prepare for ES6

bugzilla-daemon
In reply to this post by bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=18969

--- Comment #11 from Nick Clemens <[hidden email]> ---
(In reply to David Gustafsson from comment #10)
> Hmm.. I'm not really sure I understand. Would not nonpublicnote be
> searchable as nonpublicnote:"search string", regardless if coming from opac
> or not? One way of protecting it would be for example to have a different
> "fields"-list for opac and staff client, and filter out appearances of
> black-listed fields (nonpublicnote for example) in query string with some
> regexp in opac, but not staff client?

Yes, we would need to implement something more along those lines to truly hide
it, but having an 'all' field would allow for simple keyword searching of
everything but the blacklisted fields. This is all theoretical currently, I am
just looking towards possibilities.

[Bug 18969] Elasticsearch - _all field is deprecated - should use copy_to to prepare for ES6

bugzilla-daemon
In reply to this post by bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=18969

--- Comment #12 from David Gustafsson <[hidden email]> ---
Ok, so you mean we would have opac-searchable fields in "_all_fields", and then
add additional fields in "fields" for staff client searches? Would it not be
better to set "fields" to a list of fields, excluding blacklisted fields, in
opac, and all searchable fields in staff client?

[Bug 18969] Elasticsearch - _all field is deprecated - should use copy_to to prepare for ES6

bugzilla-daemon
In reply to this post by bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=18969

--- Comment #13 from Nick Clemens <[hidden email]> ---
Probably something like that.

'_all_fields' could be the default for staff
'_all_opac_fields' could be the default for opac, then we catch the blacklisted
fields as you said

Really though, I think this is all a separate bug :-) Mostly I wanted to say
that we should keep the copy_to and that it is not a detriment, i.e. we should
move this patch forward.
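
As a sketch of the two catch-all idea, assuming a set of field names hidden from the opac. The helper names are hypothetical; `copy_to` itself accepts a list of target fields:

```python
def copy_to_targets(field_name, opac_hidden):
    """Catch-all fields that a source field should be copied into."""
    targets = ["_all_fields"]  # would be searched by default from the staff client
    if field_name not in opac_hidden:
        targets.append("_all_opac_fields")  # would be searched by default from the opac
    return targets


def field_mapping(field_name, opac_hidden, field_type="text"):
    """Mapping fragment for one field, including its copy_to targets."""
    return {"type": field_type, "copy_to": copy_to_targets(field_name, opac_hidden)}


hidden = {"nonpublicnote"}
title = field_mapping("title", hidden)
note = field_mapping("nonpublicnote", hidden)
```

Here `title` is copied into both catch-all fields, while `note` only reaches `_all_fields`.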

[Bug 18969] Elasticsearch - _all field is deprecated - should use copy_to to prepare for ES6

bugzilla-daemon
In reply to this post by bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=18969

--- Comment #14 from David Gustafsson <[hidden email]> ---
Ok! I actually think that would be a bad idea, mainly for the following
reasons:

1) Elasticsearch uses a ranking function called Okapi BM25 (it used to be Term
Frequency/Inverse Document Frequency (TF/IDF), which is similar but simpler to
understand). Two of the parameters Okapi BM25 uses to calculate the relevancy
score (per field) are average field length and inverse document frequency
(IDF). If you put all values in one field, average field length and inverse
document frequency will be averaged out over all fields, effectively crippling
the algorithm and rendering it unable to calculate relevancy properly.

2) You will also not be able to use per field boosting, unless you add boosted
fields to "fields" as well, but then you might as well skip the "_all_*" fields
and pass along the full list of fields instead.

3) The index will be about 3x as big, increasing memory usage. This might not
be a huge issue in general, but it could be for us, for example, as we have
several million biblios and already quite a large index.

4) To utilize the full power of Elasticsearch one would want to be able to use
different analyzers/normalizers and other useful mapping settings on a
per-field basis, and nice query string query options like "quote_field_suffix".
With everything in one field, all data will be indexed using the same mapping
settings, and features like quote_field_suffix will not work.
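
Point 1 can be made concrete with a small calculation. The sketch below uses a common formulation of BM25 with the usual defaults k1 = 1.2 and b = 0.75; the document counts and field lengths are invented for illustration, and real Elasticsearch scores may differ in detail:

```python
import math


def bm25(tf, field_len, avg_field_len, doc_count, docs_with_term, k1=1.2, b=0.75):
    """BM25 score of one term in one field."""
    idf = math.log(1 + (doc_count - docs_with_term + 0.5) / (docs_with_term + 0.5))
    length_norm = 1 - b + b * field_len / avg_field_len
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)


# "dune" occurs once in a 3-word title and is rare among titles.
title_score = bm25(tf=1, field_len=3, avg_field_len=6,
                   doc_count=1000, docs_with_term=5)

# Copied into a 300-word catch-all field, the same single occurrence sits in a
# much longer field, and the word is found in more documents overall.
catch_all_score = bm25(tf=1, field_len=300, avg_field_len=250,
                       doc_count=1000, docs_with_term=40)

print(title_score > catch_all_score)  # True
```

With these made-up numbers the title match scores clearly higher than the same match in the catch-all field, which is the signal that gets lost when everything is searched through one combined field.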

I can actually see no benefits in using "_all_*" fields, and no real downside
to instead generating a proper "fields" list containing all searchable fields.
I began working on a patch today (one of the reasons was that we need per-field
boosting), and it's actually not a very complicated change. It might not be
ready tomorrow, but it should be at some time in the beginning of next week.

[Bug 18969] Elasticsearch - _all field is deprecated - should use copy_to to prepare for ES6

bugzilla-daemon
In reply to this post by bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=18969

--- Comment #15 from Nick Clemens <[hidden email]> ---
(In reply to David Gustafsson from comment #14)
> Ok! I actually think that would be a bad idea, mainly for the following
> reasons:
>
> 1) Elasticsearch uses ranking function called Okapi BM25...If you put all values in one field, average field length
> and inverse document frequency will averaged out based on all fields,

Ah, okay, I see this in the documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-all-field.html

Built-in or constructed, we pay a relevance price.


> 2) You will also not be able to use per field boosting, unless you add
> boosted fields to "fields" as well, but then you might as well skip the
> "_all_*" fields and pass along the full list of fields instead.

Well, it does seem to work to add only the boosted fields and boost those above
the all field, but again, it is not as exact.

> 3) The index will be about 3x as big, increasing memory usage. This might
> not a huge issue, but could be for us for example as we have several million
> biblios and already quite a large index already.

Agreed, I think we would need to compare with and without the all field to see
the exact impact.


> 4) To utilize the full power of Elasticsearch one would want to be able to
> use different analyzers/normalizers and other useful mapping settings on a
> per field basis, and nice query string query options like
> "quote_field_suffix". With everything in one field, all data will be indexed
> using the same mapping settings, and features like quote_field_suffix will
> not work.

I don't think I actually follow you here - we still specify different analyzers
per field, but we also construct the _all field and use that for keyword
searching only - this is what we currently do. So we can search specific
fields, or use the all field.



> I can actually see no benefits with using "all_*" fields, and no real
> downside by instead generating a proper "fields" containing all searchable
> fields.
The only downside is listing all the fields individually, so there is a small
cost in query construction and query size, but not a terrible one I would think.

>I begun working on a patch today (one of the reasons was that we
> need per field boosting), and it's actually not a very complicated change.
> Might not be ready tomorrow, but at least some time in the beginning of next
> week.

Looking forward to it! :-) - have you seen bug 18316?
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=18316

[Bug 18969] Elasticsearch - _all field is deprecated - should use copy_to to prepare for ES6

bugzilla-daemon
In reply to this post by bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=18969

--- Comment #16 from David Gustafsson <[hidden email]> ---
> I don't think I actually follow you here - we still specify different analyzers
> per field, but we also construct the _all field and use that for keyword
> searching only - this is what we currently do. So we can search specific
> fields, or use the all.

My point was with regards to this line:
https://github.com/Koha-Community/Koha/blob/master/Koha/SearchEngine/Elasticsearch/QueryBuilder.pm#L93
where default_field is set to '_all', and in the patch this is changed to
'_all_fields'. If searching for some terms (without specifying specific fields
in the search string), the default_field will be used, '_all_fields' in this
case, and no field-specific analyzers will be used. If "fields" is instead set
to all relevant fields, the field-specific analyzers etc. will be applied.
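
The two query shapes being compared can be sketched as follows. The function name `build_query` and the sample field names are assumptions for illustration; `query_string`, `default_field` and `fields` are the Elasticsearch parameter names:

```python
def build_query(user_query, fields=None, default_field="_all_fields"):
    """query_string body that targets either a field list or one default field."""
    query_string = {"query": user_query}
    if fields:
        # Each listed field is searched with its own mapping settings.
        query_string["fields"] = list(fields)
    else:
        # Terms without a field prefix are matched against this single field.
        query_string["default_field"] = default_field
    return {"query": {"query_string": query_string}}


single_field = build_query("dune")
per_field = build_query("dune", fields=["title^2", "author", "subject"])
```

`single_field` corresponds to what the patch on this bug does, and `per_field` to the alternative argued for here.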

> The only downside is listing all the fields individually so a small cost in
> construction of queries and query size, but not terrible I would think

Yes, you are correct. The query construction overhead is insignificant, but
executing the query will cost slightly more, since multiple fields are queried
instead of just one. I still don't see using a single field as an option,
though, since it would make most of the nice Elasticsearch free-text search
features impossible to implement.


> Looking forward to it! :-) - have you seen bug 18316?
> https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=18316

Damn, I must have missed that one. The two are still slightly incompatible, so
I would perhaps still have needed to re-implement field boosts (as the new
patch always uses the "fields" parameter with possible boosts, not just for
boosted fields).
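
As a rough illustration of what always using "fields" with optional boosts
could look like, the Python sketch below turns a field-to-boost mapping into
the "field^boost" strings that query_string accepts. The function name, the
field names, and the boost values are hypothetical and are not taken from
either patch.

```python
def fields_with_boosts(boosts):
    """Return a "fields" list; a boost of None means the field is unboosted."""
    fields = []
    for name, boost in boosts.items():
        # Elasticsearch reads "title^2" as: search "title" with boost 2.
        fields.append(name if boost is None else f"{name}^{boost}")
    return fields


# Hypothetical searchable fields and their boosts.
searchable = {"title": 2, "author": 1.5, "subject": None}
fields = fields_with_boosts(searchable)
```

Every searchable field ends up in the list, boosted or not, so the same code
path serves both cases.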

I created a new issue for my suggestions regarding field boosts and removing
the "_all" field: https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=20589


[Bug 18969] Elasticsearch - _all field is deprecated - should use copy_to to prepare for ES6

bugzilla-daemon
In reply to this post by bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=18969

Nick Clemens <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://bugs.koha-community
                   |                            |.org/bugzilla3/show_bug.cgi
                   |                            |?id=20589


[Bug 18969] _all field is deprecated - should use copy_to to prepare for ES6

bugzilla-daemon
In reply to this post by bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=18969

Jonathan Druart <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Elasticsearch - _all field  |_all field is deprecated -
                   |is deprecated - should use  |should use copy_to to
                   |copy_to to prepare for ES6  |prepare for ES6


[Bug 18969] _all field is deprecated - should use copy_to to prepare for ES6

bugzilla-daemon
In reply to this post by bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=18969

Katrin Fischer <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

--- Comment #17 from Katrin Fischer <[hidden email]> ---
What should be the next step here? I am a bit lost in the discussion. Has
consensus been reached, or should we move this out of the queue for more
discussion?


[Bug 18969] _all field is deprecated - should use copy_to to prepare for ES6

bugzilla-daemon
In reply to this post by bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=18969

--- Comment #18 from Alex Arnaud <[hidden email]> ---
(In reply to Katrin Fischer from comment #17)
> What should be the next step here? I am a bit lost in the discussion. Has
> consensus been reached, or should we move this out of the queue for more
> discussion?

If I understand correctly, storing the entire record in all_fields_* would help
us hide some fields like nonpublicnote (Nick, another advantage?) by
creating all_fields and all_fields_opac. But this could be done with a
staff/opac parameter on the ES configuration form, like David did in bug 20589.
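
For readers following along, the all_fields / all_fields_opac idea could be
sketched as a mapping built like this (Python used for illustration only).
"copy_to" is a real Elasticsearch mapping parameter; the helper, the 'title'
field, and the staff-only flag are assumptions, and the mapping in the actual
patch may well differ.

```python
def build_properties(fields):
    """Build mapping properties that copy fields into catch-all fields.

    fields: dict of field name -> True if the field is staff-only.
    Hypothetical helper for illustration only.
    """
    properties = {
        "all_fields": {"type": "text"},
        "all_fields_opac": {"type": "text"},
    }
    for name, staff_only in fields.items():
        # Staff-only fields are kept out of the OPAC catch-all field.
        targets = ["all_fields"] if staff_only else ["all_fields", "all_fields_opac"]
        properties[name] = {"type": "text", "copy_to": targets}
    return properties


properties = build_properties({"title": False, "nonpublicnote": True})
```

A keyword search from the OPAC would then target all_fields_opac, which never
receives the nonpublicnote content.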

I wrote this patch to make Koha compatible with ES 6. But if we remove this
"feature", the result is the same. And that job is also done in bug 20589.


[Bug 18969] _all field is deprecated - should use copy_to to prepare for ES6

bugzilla-daemon
In reply to this post by bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=18969

--- Comment #19 from Katrin Fischer <[hidden email]> ---

> If I understand correctly, storing the entire record in all_fields_* would
> help us hide some fields like nonpublicnote (Nick, another advantage?) by
> creating all_fields and all_fields_opac. But this could be done with a
> staff/opac parameter on the ES configuration form, like David did in bug 20589.
>
> I wrote this patch to make Koha compatible with ES 6. But if we remove this
> "feature", the result is the same. And that job is also done in bug 20589.

Which job does 20589 do? Resolving the ES6 issue?
