[Bug 19893] New: Alternative optimized indexing for Elasticsearch

bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=19893

            Bug ID: 19893
           Summary: Alternative optimized indexing for Elasticsearch
 Change sponsored?: ---
           Product: Koha
           Version: master
          Hardware: All
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P5 - low
         Component: Searching - Elasticsearch
          Assignee: [hidden email]
          Reporter: [hidden email]

At our library, perhaps owing to a larger-than-average number of biblios, a full
re-index takes an unacceptable amount of time to complete (> 24h). We also had an
issue with indexing becoming increasingly slow as new mappings are added.
After some profiling with NYTProf it became clear that most of this overhead is in
Catmandu::Store::ElasticSearch and Catmandu::MARC. After giving it some thought,
the simplest way to resolve this issue seemed to be to replace these
libraries with Koha-specific code, since the functionality they provide is
not that hard to re-implement in a more efficient manner. Due to the complexity
of Catmandu, optimizing these libraries would most likely be more challenging
(and some parts cannot be optimized at all because of limitations
in the architecture of Catmandu/Fix). Main benefits include:

1) Increased indexing performance (about twice as fast overall, six times as
fast when comparing time spent in update_index()), due to more efficient JSON
conversion and fewer Elasticsearch requests.
2) With Catmandu, indexing speed decreases as more mappings are added; with the
alternative algorithm, indexing speed stays more or less constant no matter how
many mappings you add.
3) Negligible indexing start-up time. For example, we have an issue with the
book drop machine, where each return takes a couple of seconds because of the
Catmandu start-up overhead.
4) More transparent code and less complexity compared with Catmandu.
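
To illustrate the "fewer Elasticsearch requests" point, here is a minimal
sketch of grouping documents into bulk request bodies instead of sending one
request per record. It is not taken from the patch: the index name
koha_biblios, the batch size and the helper names are assumptions made for the
example, and older Elasticsearch versions also expect a _type in the action
line.

```python
import json
from typing import Dict, Iterator, List, Tuple


def chunked(docs: Dict[str, dict], size: int) -> Iterator[List[Tuple[str, dict]]]:
    """Yield lists of (doc_id, doc) pairs with at most `size` entries each."""
    batch = []
    for doc_id, doc in docs.items():
        batch.append((doc_id, doc))
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def bulk_body(batch: List[Tuple[str, dict]], index: str) -> str:
    """Serialize one batch as a newline-delimited JSON bulk request body."""
    lines = []
    for doc_id, doc in batch:
        # One action line, followed by the document source on the next line.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    # The bulk format requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical sample data: five tiny documents, sent two per request.
records = {str(n): {"title": f"Record {n}"} for n in range(1, 6)}
bodies = [bulk_body(batch, "koha_biblios") for batch in chunked(records, 2)]
print(len(bodies), "requests instead of", len(records))
```

Each body would be sent as a single request to the bulk endpoint, so the
request count drops roughly by a factor of the batch size.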

With this patch the largest bottleneck is instead MARC::Record::as_xml_record.
Using binary MARC21 as the serialization format would probably be a lot faster,
but I still chose MARCXML because of the binary format's record length
limitation (which could be exceeded by records with many items). Still, I will
probably look into faster MARCXML serialization options in the future to
address this.
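
The length limitation comes from the binary (ISO 2709) record leader, which
stores the total record length in five digits, so a record cannot exceed 99999
bytes. As a rough sketch only, and not something the patch does, a per-record
choice between the two formats could look like this; the function name and the
returned labels are made up for the example.

```python
# Leader positions 00-04 hold the record length as five decimal digits.
MAX_ISO2709_LENGTH = 99999


def choose_serialization(record_length: int) -> str:
    """Return 'iso2709' if the record fits the binary format, else 'marcxml'."""
    if record_length < 0:
        raise ValueError("record_length must not be negative")
    if record_length <= MAX_ISO2709_LENGTH:
        return "iso2709"
    return "marcxml"


# A record with very many items can exceed the limit and needs MARCXML.
print(choose_serialization(5000))
print(choose_serialization(150000))
```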

I also attach profiling results with and without the patch applied.

--
You are receiving this mail because:
You are the assignee for the bug.
You are watching all bug changes.
_______________________________________________
Koha-bugs mailing list
[hidden email]
http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-bugs
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/

--- Comment #1 from David Gustafsson <[hidden email]> ---
Created attachment 70199
  -->
https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=70199&action=edit
master

--- Comment #2 from David Gustafsson <[hidden email]> ---
Created attachment 70200
  -->
https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=70200&action=edit
patched

David Gustafsson <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         QA Contact|                            |[hidden email]

--- Comment #3 from David Gustafsson <[hidden email]> ---
Created attachment 70201
  -->
https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=70201&action=edit
Bug 19893 - Alternative optimized indexing for Elasticsearch

Add alternative optimized indexing for Elasticsearch

How to test:
1) Time a full Elasticsearch re-index by running rebuild_elastic_search.pl
   with the -d flag:
   `koha-shell <instance_name> -c "time rebuild_elastic_search.pl -d"`.
2) Enable the ExperimentalElasticSearchIndexing system preference
   (found under Global System preferences -> Administration -> Search Engine).
3) Time a full re-index again. It should be about twice as fast (for a
   couple of thousand biblios; with fewer biblios the results may be less
   predictable).

David Gustafsson <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #70201|0                           |1
        is obsolete|                            |

--- Comment #4 from David Gustafsson <[hidden email]> ---
Created attachment 70202
  -->
https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=70202&action=edit
Bug 19893 - Alternative optimized indexing for Elasticsearch

Add alternative optimized indexing for Elasticsearch

How to test:
1) Time a full Elasticsearch re-index by running rebuild_elastic_search.pl
   with the -d flag:
   `koha-shell <instance_name> -c "time rebuild_elastic_search.pl -d"`.
2) Enable the ExperimentalElasticSearchIndexing system preference
   (found under Global System preferences -> Administration -> Search Engine).
3) Time a full re-index again. It should be about twice as fast (for a
   couple of thousand biblios; with fewer biblios the results may be less
   predictable).

Sponsored-by: Gothenburg University Library

David Gustafsson <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |Needs Signoff

David Gustafsson <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #70202|0                           |1
        is obsolete|                            |

--- Comment #5 from David Gustafsson <[hidden email]> ---
Created attachment 70230
  -->
https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=70230&action=edit
Bug 19893 - Alternative optimized indexing for Elasticsearch

Add alternative optimized indexing for Elasticsearch

How to test:
1) Time a full Elasticsearch re-index by running rebuild_elastic_search.pl
   with the -d flag:
   `koha-shell <instance_name> -c "time rebuild_elastic_search.pl -d"`.
2) Enable the ExperimentalElasticSearchIndexing system preference
   (found under Global System preferences -> Administration -> Search Engine).
3) Time a full re-index again. It should be about twice as fast (for a
   couple of thousand biblios; with fewer biblios the results may be less
   predictable).

Sponsored-by: Gothenburg University Library

--- Comment #6 from David Gustafsson <[hidden email]> ---
Fixed an instance of hard-coded Elasticsearch mapping document type in
update_mappings().

David Gustafsson <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #70230|0                           |1
        is obsolete|                            |

--- Comment #7 from David Gustafsson <[hidden email]> ---
Created attachment 70325
  -->
https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=70325&action=edit
Bug 19893 - Alternative optimized indexing for Elasticsearch

Add alternative optimized indexing for Elasticsearch

How to test:
1) Time a full Elasticsearch re-index by running rebuild_elastic_search.pl
   with the -d flag:
   `koha-shell <instance_name> -c "time rebuild_elastic_search.pl -d"`.
2) Enable the ExperimentalElasticSearchIndexing system preference
   (found under Global System preferences -> Administration -> Search Engine).
3) Time a full re-index again. It should be about twice as fast (for a
   couple of thousand biblios; with fewer biblios the results may be less
   predictable).

Sponsored-by: Gothenburg University Library

--- Comment #8 from David Gustafsson <[hidden email]> ---
Fixed decoding of marc records returned from Elasticsearch (depending on
setting) in some places where this was missing.

David Gustafsson <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|Needs Signoff               |Failed QA

--- Comment #9 from David Gustafsson <[hidden email]> ---
Just saw that a vim swap file was accidentally committed, marking as failed
until I have fixed this (shortly).

David Gustafsson <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|Failed QA                   |Needs Signoff

--- Comment #10 from David Gustafsson <[hidden email]> ---
Wrong issue, sorry.

Nicolas Legrand <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|Needs Signoff               |In Discussion
                 CC|                            |[hidden email]

--- Comment #11 from Nicolas Legrand <[hidden email]> ---
It breaks my authority indexing (UNIMARC):

substr outside of string at /home/koha/perl5/lib/perl5/MARC/File/XML.pm line 563.
Use of uninitialized value $enc in string eq at /home/koha/perl5/lib/perl5/MARC/File/XML.pm line 565.
Use of uninitialized value $enc in string eq at /home/koha/perl5/lib/perl5/MARC/File/XML.pm line 565.
Use of uninitialized value $enc in string eq at /home/koha/perl5/lib/perl5/MARC/File/XML.pm line 567.
Use of uninitialized value $enc in concatenation (.) or string at /home/koha/perl5/lib/perl5/MARC/File/XML.pm line 570.
Unsupported UNIMARC character encoding [] for XML output for unimarc; 100$a -> 20180313 frey50       at /home/koha/perl5/lib/perl5/MARC/File/XML.pm line 570.

--- Comment #12 from David Gustafsson <[hidden email]> ---
OK, I have not tried it with UNIMARC records myself, but from the error messages
it appears that the Perl MARC library expects subfield 100$a to contain encoding
information, and that it is empty. Right now I am not really sure whether this
is a MARC::File::XML issue, a data issue, or a bug in the patch. It could
perhaps be worked around by adding an option for which serialization format to
use (binary MARC in addition to MARCXML). I opted for MARCXML mainly because of
the length limitation: we have some records with so many items that they will
not fit into binary MARC.

David Gustafsson <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #70325|0                           |1
        is obsolete|                            |

--- Comment #13 from David Gustafsson <[hidden email]> ---
Created attachment 73153
  -->
https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=73153&action=edit
Bug 19893 - Alternative optimized indexing for Elasticsearch

Add alternative optimized indexing for Elasticsearch

How to test:
1) Time a full Elasticsearch re-index by running rebuild_elastic_search.pl
   with the -d flag:
   `koha-shell <instance_name> -c "time rebuild_elastic_search.pl -d"`.
2) Enable the ExperimentalElasticSearchIndexing system preference
   (found under Global System preferences -> Administration -> Search Engine).
3) Time a full re-index again. It should be about twice as fast (for a
   couple of thousand biblios; with fewer biblios the results may be less
   predictable).

Sponsored-by: Gothenburg University Library

--- Comment #14 from David Gustafsson <[hidden email]> ---
Fixed an unrelated issue where rebuild-index crashed if the index did not
already exist; it will now create the index if it does not exist.
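
The create-if-missing behaviour can be sketched as a simple check before
creation. This is only an illustration of the pattern, not the patch's code:
FakeIndices is an in-memory stand-in invented for the example, and a real
client's method names and arguments may differ.

```python
class FakeIndices:
    """In-memory stand-in for an index management API (hypothetical)."""

    def __init__(self):
        self.created = {}

    def exists(self, name):
        return name in self.created

    def create(self, name, settings):
        # Mimic a server that refuses to create an index twice.
        if name in self.created:
            raise RuntimeError(f"index {name} already exists")
        self.created[name] = settings


def ensure_index(indices, name, settings):
    """Create the index only when it is missing; return True if it was created."""
    if indices.exists(name):
        return False
    indices.create(name, settings)
    return True


indices = FakeIndices()
first = ensure_index(indices, "koha_biblios", {"number_of_shards": 1})
second = ensure_index(indices, "koha_biblios", {"number_of_shards": 1})
print(first, second)
```

Calling ensure_index before a rebuild makes the rebuild safe to run whether or
not the index is already there.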

--- Comment #15 from Nicolas Legrand <[hidden email]> ---
Authority indexing is still patchy with ES for UNIMARC.

Nevertheless, 100$a is not an empty field in my test sample. The way it is
filled follows the common rules of most French university libraries (Sudoc:
http://documentation.abes.fr/sudoc/formats/unmb/zones/100.htm), which for this
part... are not compliant with the UNIMARC standard... It has 24 characters
where UNIMARC says it should have 35...
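
The warnings are consistent with a fixed-offset read from a 100$a value that is
shorter than expected. A minimal sketch of a guarded read follows. The offset
(26) and length (2) of the character-set code are assumptions for the example
and are not stated anywhere in this thread; both sample values are
hypothetical.

```python
from typing import Optional

# Assumed layout: a 2-character character-set code at offset 26 of 100$a.
ENCODING_OFFSET = 26
ENCODING_LENGTH = 2


def charset_code(field_100a: Optional[str]) -> Optional[str]:
    """Return the character-set code, or None if 100$a cannot hold one."""
    if field_100a is None:
        return None
    if len(field_100a) < ENCODING_OFFSET + ENCODING_LENGTH:
        return None  # too short: report "unknown" instead of reading past the end
    return field_100a[ENCODING_OFFSET:ENCODING_OFFSET + ENCODING_LENGTH]


# Hypothetical 24-character value, like the short field described above.
short_value = "20180313 frey50".ljust(24)
# Hypothetical 36-character value with "50" at offset 26.
full_value = "20180313d2018    k  y0frey" + "50" + "      ba"

print(charset_code(short_value))
print(charset_code(full_value))
```

A caller could then fall back to a default encoding when None is returned,
rather than failing on the whole record.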

Tomás Cohen Arazi <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

Ere Maijala <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

Ere Maijala <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |20244


Referenced Bugs:

https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=20244
[Bug 20244] Elasticsearch - Indexing improvements

--- Comment #16 from Ere Maijala <[hidden email]> ---
I like this approach quite a bit. A couple of comments:

1.) Incomplete fields should be handled more gracefully. I'm getting a set of
these warnings:
substr outside of string at
/home/ere/kohacommunity/Koha/SearchEngine/Elasticsearch.pm line 356.
This is with the test records from
https://github.com/joubu/koha-misc4dev/tree/master/data/sql/marc21/1611/after_17196

2.) es_id is missing from the indexed records compared to the Catmandu indexing
code. Is this intentional?

3.) I think this, when done, should just replace the Catmandu-based indexing
code. Since the ES support itself is somewhat experimental, it would make sense
to switch once and for all.

4.) Make sure to document the dependency on Search::Elasticsearch. Would it be
possible to use the v6 module? It says it supports ES 5 too. I had to downgrade
it on my system for the patch to work.

5.) Booleans should be indexed as true/false. I'm seeing deprecation notices
for suppress and onloan (this is wrong in the old code too, but could as well
be done correctly here).

6.) The Catmandu version seems to create way more __sort fields, but perhaps
it's a bug in the Catmandu version?

7.) The patch doesn't apply cleanly, so there could be also something I screwed
up while fixing it manually.
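
For point 5, a minimal sketch of converting database-style flags to JSON
booleans before indexing. The set of truthy strings and the helper names are
assumptions for the example; only the field names suppress and onloan come
from the comment above.

```python
import json

BOOLEAN_FIELDS = {"suppress", "onloan"}  # fields named in point 5


def to_bool(value) -> bool:
    """Interpret common database-style truthy values as a boolean."""
    return str(value).strip().lower() in {"1", "true", "yes"}


def normalize_booleans(doc: dict) -> dict:
    """Return a copy of doc with the boolean fields converted to True/False."""
    return {
        key: to_bool(value) if key in BOOLEAN_FIELDS else value
        for key, value in doc.items()
    }


doc = {"title": "Example", "suppress": "0", "onloan": 1}
print(json.dumps(normalize_booleans(doc)))
# prints {"title": "Example", "suppress": false, "onloan": true}
```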

--- Comment #17 from David Gustafsson <[hidden email]> ---
Thanks

> 1.) Incomplete fields should be handled more gracefully...

I agree. I have ignored this since it only results in a warning, but it could
probably be done in a better way. Some mapping ranges in the mappings YAML are
off by one (mostly the ff* fields, I think), which might aggravate the issue.

> 2.) es_id is missing from the indexed records compared to the Catmandu indexing code. Is this intentional?

I think this is something Catmandu adds, and I think I just left it out since
it doesn't seem to be needed for anything.

> 3.) I think this, when done, should just replace the Catmandu-based indexing code. Since the ES support itself is somewhat experimental, it would make sense to switch once and for all.

That would be wonderful. I would actually love to get rid of all
Catmandu dependencies altogether. Indexing is probably the heaviest part; the
rest should be quite trivial to replace with Search::Elasticsearch (which
Catmandu uses internally).

> 4.) Make sure to document the dependency on Search::Elasticsearch. Would it be possible to use the v6 module? It says it supports ES 5 too. I had to downgrade it on my system for the patch to work.

I did not document it since Catmandu depends on it, but it will need to be done
if Catmandu is no longer a dependency. To use v6 you just have to change
client => "5_0::Direct" to client => "6_0::Direct". If I'm not mistaken,
we are running ES 6 with the "5_0::Direct" line (and it works, but is probably
not optimal). The reason I went for 5.0 was that it was the version Koha was
using at the time of the initial version of the patch (I think).

> 5.) Booleans should be indexed as true/false. I'm seeing deprecation notices for suppress and onloan (this is wrong in the old code too, but could as well be done correctly here).

I agree. This is a problem in Koha master as well I think (but there is a
bugzilla issue that takes care of it). It would be easy to fix so will make
sure to do so.

> 6.) The Catmandu version seems to create way more __sort fields, but perhaps it's a bug in the Catmandu version?

Yes, I think this is a bug in Koha master, but I will look into it.

> 7.) The patch doesn't apply cleanly, so there could be also something I screwed up while fixing it manually.

I think you got it right; I have run into most of the above myself, though it
is a bit strange that you had to downgrade to 5.0 to get it to work. I can
verify tomorrow that it really is 6.x we are running.
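
Regarding point 1, a minimal sketch of extracting a fixed-position range so
that an incomplete field is skipped instead of producing a warning. The
function name and the zero-based start convention are assumptions for the
example; they are not taken from the patch or the mappings YAML.

```python
from typing import Optional


def extract_range(value: str, start: int, length: int) -> Optional[str]:
    """Return value[start:start + length], or None if the field is too short.

    `start` is zero-based, so a range written as positions 7-9 is
    start=7, length=3. Mixing this up with one-based numbering is a
    typical source of off-by-one errors.
    """
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    if len(value) < start + length:
        return None  # incomplete field: skip it rather than read past the end
    return value[start:start + length]


print(extract_range("abcdefghij", 7, 3))  # complete field
print(extract_range("abcdefgh", 7, 3))    # field is two characters too short
```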

David Gustafsson <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|[hidden email]              |[hidden email]

--- Comment #18 from David Gustafsson <[hidden email]> ---
About the version, I was wrong; it seems we are running 5.4. I would assume it
would be as easy as changing the version string to get it working with 6
(but I am not sure how 6 handles _all fields and other deprecated things used
by Koha). We run a quite heavily patched version of Koha where those issues
have been fixed.

--- Comment #19 from Ere Maijala <[hidden email]> ---
It's probably better to address the issues with ES 6 separately, so let's just
concentrate on getting the indexing done right, right? :)


[Bug 19893] Alternative optimized indexing for Elasticsearch

bugzilla-daemon
In reply to this post by bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=19893

--- Comment #20 from Ere Maijala <[hidden email]> ---
Actually, now that 20073 is in master, the "include_in_all" issue with ES 6
should be resolved.


[Bug 19893] Alternative optimized indexing for Elasticsearch

bugzilla-daemon
In reply to this post by bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=19893

David Gustafsson <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #73153|0                           |1
        is obsolete|                            |

--- Comment #21 from David Gustafsson <[hidden email]> ---
Created attachment 75109
  -->
https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=75109&action=edit
Bug 19893 - Alternative optimized indexing for Elasticsearch

Add alternative optimized indexing for Elasticsearch

How to test:
1) Time a full Elasticsearch re-index by running rebuild_elastic_search.pl
   with the -d flag:
   `koha-shell <instance_name> -c "time rebuild_elastic_search.pl -d"`.
2) Enable the ExperimentalElasticSearchIndexing system preference
   (found under Global System preferences -> Administration -> Search Engine).
3) Time a full re-index again. It should be about twice as fast (for a
   couple of thousand biblios; with fewer biblios the results may be more
   unpredictable).

Sponsored-by: Gothenburg University Library
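
The timing comparison in the test plan can be scripted. The following is only
a rough sketch: the helper names elapsed_seconds and speedup are hypothetical
and not part of Koha or of the patch. It assumes a POSIX shell with date and
awk available. The koha-shell calls are left as comments because they need a
real Koha instance.

```shell
#!/bin/sh
# Hypothetical helper: run a command, discard its output, and print the
# wall-clock time it took in whole seconds.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Hypothetical helper: print baseline/experimental as a ratio with two
# decimals, or "n/a" if the experimental time is zero.
speedup() {
    awk -v old="$1" -v cur="$2" \
        'BEGIN { if (cur > 0) printf "%.2f\n", old / cur; else print "n/a" }'
}

# Intended usage (not executed here, requires a Koha instance):
#   base=$(elapsed_seconds koha-shell <instance_name> -c "rebuild_elastic_search.pl -d")
#   ... enable ExperimentalElasticSearchIndexing in the staff interface ...
#   fast=$(elapsed_seconds koha-shell <instance_name> -c "rebuild_elastic_search.pl -d")
#   speedup "$base" "$fast"
```

A ratio near 2 would match the "about twice as fast" expectation in step 3.
With only a few biblios the whole-second resolution of date +%s is too coarse
to give a meaningful ratio.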


[Bug 19893] Alternative optimized indexing for Elasticsearch

bugzilla-daemon
In reply to this post by bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=19893

--- Comment #22 from David Gustafsson <[hidden email]> ---
Rebased against master, have not had time to address the other issues, but will
do so as soon as I can.


[Bug 19893] Alternative optimized indexing for Elasticsearch

bugzilla-daemon
In reply to this post by bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=19893

David Gustafsson <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #75109|0                           |1
        is obsolete|                            |

--- Comment #23 from David Gustafsson <[hidden email]> ---
Created attachment 75271
  -->
https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=75271&action=edit
Bug 19893 - Alternative optimized indexing for Elasticsearch

Add alternative optimized indexing for Elasticsearch

How to test:
1) Time a full Elasticsearch re-index by running rebuild_elastic_search.pl
   with the -d flag:
   `koha-shell <instance_name> -c "time rebuild_elastic_search.pl -d"`.
2) Enable the ExperimentalElasticSearchIndexing system preference
   (found under Global System preferences -> Administration -> Search Engine).
3) Time a full re-index again. It should be about twice as fast (for a
   couple of thousand biblios; with fewer biblios the results may be more
   unpredictable).

Sponsored-by: Gothenburg University Library
