[Bug 25273] New: Elasticsearch Authority matching is returning too many results

bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=25273

            Bug ID: 25273
           Summary: Elasticsearch Authority matching is returning too many
                    results
 Change sponsored?: ---
           Product: Koha
           Version: master
          Hardware: All
                OS: All
            Status: NEW
          Severity: major
          Priority: P5 - low
         Component: MARC Authority data support
          Assignee: [hidden email]
          Reporter: [hidden email]
        QA Contact: [hidden email]

Bug 24269 attempted to improve the authority matching process in ES, but it
does not accomplish the full extent of what is needed.

Match-heading is used as the matching field for records, and I added
C4::Heading to generate the correct form including subfields. However, the
subfields are still indexed normally too. This is problematic because a
heading like:
$aCats$vFiction$zVermont

is indexed into match-heading as:
["Cats","Cats genresubdiv Fiction geosubdiv Vermont","Vermont"]

Thus, if a record has a heading of:
Vermont

it returns the above record.

We need to index for the linker into a field that only gets the correct search
form for the heading and no other data. I propose to do this by retrieving the
fields set to be copied from Administration->Authority types and indexing them
into a field that can be used only for matching.
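
The false match described above can be illustrated with a minimal sketch. It
is written in Python rather than Koha's Perl, and the `matches` helper is a
hypothetical stand-in for an exact lookup against a multi-valued field; it is
not how Elasticsearch or the linker is actually implemented.

```python
def matches(indexed_values, search_form):
    """Return True if any indexed value equals the search form exactly."""
    return any(value == search_form for value in indexed_values)


# What match-heading currently holds for $aCats$vFiction$zVermont.
current = ["Cats", "Cats genresubdiv Fiction geosubdiv Vermont", "Vermont"]
# What the proposal would store: the search form only.
proposed = ["Cats genresubdiv Fiction geosubdiv Vermont"]

print(matches(current, "Vermont"))   # True: false match on a bare subfield
print(matches(proposed, "Vermont"))  # False
print(matches(proposed, "Cats genresubdiv Fiction geosubdiv Vermont"))  # True
```

With only the full search form stored, a bare "Vermont" heading no longer hits
the subdivided record, while the exact heading still does.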

--
You are receiving this mail because:
You are the assignee for the bug.
You are watching all bug changes.
_______________________________________________
Koha-bugs mailing list
[hidden email]
https://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-bugs
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/

Andrew Fuerste-Henry <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]


--- Comment #1 from Nick Clemens <[hidden email]> ---
Created attachment 103801
  -->
https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=103801&action=edit
Bug 25273: WIP

This patch moves the code for indexing the match-heading field into its own
special section, used only for authorities.

Rather than allowing the user to map this field, we create it ourselves and
add it to the indexed documents.

Currently, it doesn't work. I think the issue is that match-heading is not
being added to the index, so it is not searchable.


Nick Clemens <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |Needs Signoff


--- Comment #2 from Nick Clemens <[hidden email]> ---
Created attachment 103854
  -->
https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=103854&action=edit
Bug 25273: Make match-heading rely on authority type configuration

The match-heading field is a special field used only by the linker; it is not
accessible to staff or patrons via the interface. This field stores the
constructed 'search form' used for matching bib headings to authority fields.

In bug 24269 I attempted to use the mappings defined in the interface and also
inject the search term. This did not work: too many subfields were indexed on
their own, leading to false matches.

In this bug we remove the mappings for this field and create it ourselves
during the indexing process. The C4::Heading module is still used to generate
the correct form; however, the mappings are set based on the authority types in
the system. This gives the user the ability to add new types, but prevents
mapping changes from breaking linker functionality.
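
As a rough illustration of the approach (in Python, not the Perl of the actual
patch), the indexing step can be thought of as: look up the heading tag for the
record's authority type, build the search form from that field alone, and
store only that value. Everything named below is an assumption made for the
sketch: the `AUTH_TYPE_HEADING_TAG` sample, the dict-based record shape, and
the helper names. The subdivision labels are limited to the ones that appear
in examples in this thread; the real list lives in C4::Heading.

```python
# Subdivision labels as seen in this thread's examples (not exhaustive).
SUBDIVISION_LABELS = {"v": "genresubdiv", "x": "generalsubdiv", "z": "geosubdiv"}

# Hypothetical sample of authority types: type code -> heading tag.
AUTH_TYPE_HEADING_TAG = {"TOPIC_TERM": "150", "GEOGR_NAME": "151"}


def search_form(subfields):
    """Build a search form from (code, value) pairs, keeping record order."""
    parts = []
    for code, value in subfields:
        if code in SUBDIVISION_LABELS:
            parts.append(SUBDIVISION_LABELS[code])
        parts.append(value)
    return " ".join(parts)


def match_heading(record, authtypecode):
    """Return the match-heading values for one authority record.

    `record` is a plain dict of tag -> list of (code, value) pairs.
    """
    tag = AUTH_TYPE_HEADING_TAG.get(authtypecode)
    subfields = record.get(tag, [])
    return [search_form(subfields)] if subfields else []


record = {"150": [("a", "Waterworks"), ("x", "Costs")]}
print(match_heading(record, "TOPIC_TERM"))  # ['Waterworks generalsubdiv Costs']
```

Because the field is derived from the authority type rather than from
user-editable mappings, a mapping change cannot add stray values to it.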

To test:
 1 - Start from a sample database
 2 - Download via Z39.50 2 authorities, one of which is a narrower heading of
     the other, e.g.:
    Waterworks
    Waterworks - Costs
 3 - Place a heading for the broader term in a record
 4 - Make sure linker is set to default
 5 - Attempt to link the records
 6 - Linking fails
 7 - Apply patch
 8 - Refresh index settings (if using a custom file, remove 'match-heading')
 9 - Reindex ES
10 - Try to link again
11 - It succeeds!


Nick Clemens <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #103801|0                           |1
        is obsolete|                            |


Nick Clemens <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email],
                   |                            |[hidden email],
                   |                            |[hidden email],
                   |                            |[hidden email],
                   |                            |mkstephens@lancasterseminary.edu
           Assignee|[hidden email]              |[hidden email]


Nick Clemens <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Depends on|                            |24269


Referenced Bugs:

https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=24269
[Bug 24269] Authority matching in Elasticsearch is broken when authority has
subdivisions

Michal Denar <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]


Victor Grousset/tuxayo <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

--- Comment #3 from Victor Grousset/tuxayo <[hidden email]> ---
>  4 - Make sure linker is set to default

That's the LinkerModule syspref, right?


--- Comment #4 from Victor Grousset/tuxayo <[hidden email]> ---
I might have done a step wrong; it seems I'm not getting a correctly linked
authority in the end.

Note that my ES setup was only the two steps listed here:
https://wiki.koha-community.org/wiki/User:Victor_Grousset_-_tuxayo:Setup_Koha_development_environment_(koha-testing-docker)#Use_Elasticsearch

I don't know if that's enough.
Also, this was on ES 5 (the default in koha-testing-docker).

>  2 - Download via Z39.50 2 authorities, one of which is a narrower heading of the other, e.g.:

Same auths as the example.

>  3 - Place a heading for the broader term in a record

syspref BiblioAddsAuthorities => allow

Go to a record; edit record; go to 650
Replace the existing 650$a by "Waterworks"
and 650$x by "Costs"

> 4 - Make sure linker is set to default

C4::Linker::Default

>  5 - Attempt to link the records

misc/link_bibs_to_authorities.pl

>  6 - Linking fails

In the record's page the link is:

http://172.30.0.6:8081/cgi-bin/koha/catalogue/search.pl?q=su:%22%20Waterworks%22

That's wrong, right? (And expected at this step.)

>  7 - Apply patch

And after that: restart_all

>  8 - Refresh index settings (if using a custom file, remove 'match-heading')
>  9 - Reindex ES

misc/search_tools/rebuild_elasticsearch.pl -v -d -r


Victor Grousset/tuxayo <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #103854|0                           |1
        is obsolete|                            |

--- Comment #5 from Victor Grousset/tuxayo <[hidden email]> ---
Created attachment 104363
  -->
https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=104363&action=edit
Bug 25273: Make match-heading rely on authority type configuration

The match-heading field is a special field used only by the linker; it is not
accessible to staff or patrons via the interface. This field stores the
constructed 'search form' used for matching bib headings to authority fields.

In bug 24269 I attempted to use the mappings defined in the interface and also
inject the search term. This did not work: too many subfields were indexed on
their own, leading to false matches.

In this bug we remove the mappings for this field and create it ourselves
during the indexing process. The C4::Heading module is still used to generate
the correct form; however, the mappings are set based on the authority types in
the system. This gives the user the ability to add new types, but prevents
mapping changes from breaking linker functionality.

To test:
 1 - Start from a sample database with Elasticsearch working
 2 - Download via Z39.50 2 authorities, one of which is a narrower heading of
     the other, e.g.:
    Waterworks
    Waterworks - Costs
 3 - Place a heading for the broader term in a record. e.g. Waterworks
       In 650$a, without the cataloguing authority plugin. We don't want
       the link created now.
       You need syspref BiblioAddsAuthorities => allow
 4 - Make sure linker is set to default
 5 - Attempt to link the records
       misc/link_bibs_to_authorities.pl
 6 - Linking fails
 7 - Apply patch
 8 - refresh index settings (if using a custom file, remove 'match-heading')
       You can reset mappings in the UI or run this:
       misc/search_tools/rebuild_elasticsearch.pl -v -d -r
 9 - Reindex ES
10 - Try to link again
11 - It succeeds!
12 - Run the tests
     prove t/db_dependent/Koha/SearchEngine/Elasticsearch.t

Signed-off-by: Victor Grousset/tuxayo <[hidden email]>


Victor Grousset/tuxayo <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|Needs Signoff               |Signed Off

--- Comment #6 from Victor Grousset/tuxayo <[hidden email]> ---
Changes to the test plan:


 1 - Start from a sample database

 1 - Start from a sample database with Elasticsearch working

--

 3 - Place a heading for the broader term in a record

 3 - Place a heading for the broader term in a record. e.g. Waterworks
       In 650$a, without the cataloguing authority plugin. We don't want
       the link created now.
       You need syspref BiblioAddsAuthorities => allow

--

 5 - Attempt to link the records

 5 - Attempt to link the records
       misc/link_bibs_to_authorities.pl

--

 8 - refresh index settings (if using a custom file, remove 'match-heading')

 8 - refresh index settings (if using a custom file, remove 'match-heading')
       You can reset mappings in the UI or run this:
       misc/search_tools/rebuild_elasticsearch.pl -v -d -r

--

Added step 12
12 - Run the tests
     prove t/db_dependent/Koha/SearchEngine/Elasticsearch.t


Frédéric Demians <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]


Nick Clemens <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #104363|0                           |1
        is obsolete|                            |

--- Comment #7 from Nick Clemens <[hidden email]> ---
Created attachment 105435
  -->
https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=105435&action=edit
Bug 25273: Make match-heading rely on authority type configuration

The match-heading field is a special field used only by the linker; it is not
accessible to staff or patrons via the interface. This field stores the
constructed 'search form' used for matching bib headings to authority fields.

In bug 24269 I attempted to use the mappings defined in the interface and also
inject the search term. This did not work: too many subfields were indexed on
their own, leading to false matches.

In this bug we remove the mappings for this field and create it ourselves
during the indexing process. The C4::Heading module is still used to generate
the correct form; however, the mappings are set based on the authority types in
the system. This gives the user the ability to add new types, but prevents
mapping changes from breaking linker functionality.

To test:
 1 - Start from a sample database with Elasticsearch working
 2 - Download via Z39.50 2 authorities, one of which is a narrower heading of
     the other, e.g.:
    Waterworks
    Waterworks - Costs
 3 - Place a heading for the broader term in a record. e.g. Waterworks
       In 650$a, without the cataloguing authority plugin. We don't want
       the link created now.
       You need syspref BiblioAddsAuthorities => allow
 4 - Make sure linker is set to default
 5 - Attempt to link the records
       misc/link_bibs_to_authorities.pl
 6 - Linking fails
 7 - Apply patch
 8 - refresh index settings (if using a custom file, remove 'match-heading')
       You can reset mappings in the UI or run this:
       misc/search_tools/rebuild_elasticsearch.pl -v -d -r
 9 - Reindex ES
10 - Try to link again
11 - It succeeds!
12 - Run the tests
     prove t/db_dependent/Koha/SearchEngine/Elasticsearch.t

Signed-off-by: Victor Grousset/tuxayo <[hidden email]>


Alex Arnaud <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

--- Comment #8 from Alex Arnaud <[hidden email]> ---
The functional test works perfectly with the default (hardcoded) authority
types.

What about custom authority types?

Wouldn't it be better to add all Match-Heading fields (those hardcoded in
C4::Heading::Marc21... for the MARC21 part) to mappings.yaml?

Without this patch applied:
 - I just added the following mapping for authorities:
     Match-Heading => 150(abgvxyz)

 - Imported an authority: $aWaterworks

 - Imported another authority: $aWaterworks $xCosts

 - Added a 650$aWaterworks in a biblio

 - Added a 650$aWaterworks $xCosts in the same biblio

 - Reindexed and ran misc/link_bibs_to_authorities.pl

=> 1st 650 is linked to the authority with the broader term. 2nd 650 is linked
to the other authority.


Alex Arnaud <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|Signed Off                  |Failed QA


Alex Arnaud <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|Failed QA                   |In Discussion


Alex Arnaud <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         QA Contact|[hidden email]              |[hidden email]


--- Comment #9 from Nick Clemens <[hidden email]> ---
(In reply to Alex Arnaud from comment #8)
> What about custom authority types?

It should work as before. All we do here is form the field in the index the
same way as we generate the search_form that we will search for later.

> Wouldn't it be better to add all Match-Heading (those hardcoded in
> C4::Heading::Marc21... for marc21 part) in the mappings.yaml?

I don't see a benefit here, as the search_form is hardcoded when linking. We
should ensure that we generate the data in the index the same way that we will
search for it.

Authority searching is a different thing from biblio searching; in this case,
anyway, it is used strictly to match the two records. The user has no control
over the search_form, so allowing control over the indexed form can only lead
to confusion.


> Without this patch applied:
>  - I just added the following mapping for authorities:
>      Match-Heading => 150(abgvxyz)

This may work in this simpler case; however, specifying the fields generates
them in a fixed order, whereas the order of subfields in an authority can
differ, and different orders have different meanings.

Also consider:
$aScience$vFiction
$aScience fiction

which will match each other under the current code if all subfields are mapped.
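
Both points can be sketched in a few lines of Python. This is only a model of
the argument, not Koha code: `mapped_value` assumes a mapping such as
150(abgvxyz) emits subfield values in mapping order and joins them with
spaces, `search_form` uses only the subdivision labels seen in this thread,
and case-insensitive comparison (`casefold`) stands in for whatever
normalisation the index applies.

```python
SUBDIVISION_LABELS = {"v": "genresubdiv", "x": "generalsubdiv", "z": "geosubdiv"}


def search_form(subfields):
    """Build a search form from (code, value) pairs, keeping record order."""
    parts = []
    for code, value in subfields:
        if code in SUBDIVISION_LABELS:
            parts.append(SUBDIVISION_LABELS[code])
        parts.append(value)
    return " ".join(parts)


def mapped_value(subfields, mapping="abgvxyz"):
    """Model a subfield mapping: values are emitted in mapping order."""
    kept = [sf for sf in subfields if sf[0] in mapping]
    ordered = sorted(kept, key=lambda sf: mapping.index(sf[0]))
    return " ".join(value for _, value in ordered)


# 1) Subdivision order is lost by the mapping but kept by the search form.
history_then_place = [("a", "Cats"), ("x", "History"), ("z", "Vermont")]
place_then_history = [("a", "Cats"), ("z", "Vermont"), ("x", "History")]
print(mapped_value(history_then_place) == mapped_value(place_then_history))  # True
print(search_form(history_then_place) == search_form(place_then_history))    # False

# 2) $aScience$vFiction collides with $aScience fiction under the mapping.
subdivided = [("a", "Science"), ("v", "Fiction")]
single = [("a", "Science fiction")]
print(mapped_value(subdivided).casefold() == mapped_value(single).casefold())  # True
print(search_form(subdivided).casefold() == search_form(single).casefold())    # False
```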


Julian Maurice <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

--- Comment #10 from Julian Maurice <[hidden email]> ---
Hi Nick,

(In reply to Nick Clemens from comment #9)
> This may work in this simpler case, however, specifying the fields generates
> them in a fixed order, and authorities order can differ and has different
> meanings

Can you provide a test plan with a complex case that cannot be solved using
configuration only?


--- Comment #11 from Nick Clemens <[hidden email]> ---
(In reply to Julian Maurice from comment #10)
> Hi Nick,
>
> (In reply to Nick Clemens from comment #9)
> > This may work in this simpler case, however, specifying the fields generates
> > them in a fixed order, and authorities order can differ and has different
> > meanings
>
> Can you provide a test plan with a complex case, which cannot be solved
> using configuration only ?

I don't have a specific example of an authority that doesn't work, but consider
the example Alex provides.

Before the patches, with mapping 150(abgvxyz), the two records are indexed with
match-heading as:
['Waterworks']
['Waterworks Costs','Waterworks generalsubdiv Costs']

After the patch:
['Waterworks']
['Waterworks generalsubdiv Costs']

In both cases, when linking we perform a search for:
'Waterworks generalsubdiv Costs'

The matching worked before only because we were already generating the heading
search form and storing it in the index; the mappings don't affect the terms
used for matching.

Custom authority types still use the hardcoded hashes in
C4/Heading/{marcflavour} to generate the heading search form, so they will only
work if they use a field defined there, with or without the patches.

Adding the user-defined fields only adds the possibility of mismatches; it
doesn't add functionality. There are subdivisions that can be reordered (which
mappings don't handle), and terms like '$aScience$vFiction' and '$aScience
fiction' can end up falsely matching each other.

Since we always use the standardized C4::Heading->search_form when performing
the search, we should also store only that search_form in the index.
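
The before/after comparison can be checked with a small Python model. The
record labels `auth1` and `auth2` and the `find` helper are hypothetical; the
indexed values are the ones listed above, and exact membership stands in for
the linker's search.

```python
# match-heading values per authority record, as listed in the comment above.
before = {
    "auth1": ["Waterworks"],
    "auth2": ["Waterworks Costs", "Waterworks generalsubdiv Costs"],
}
after = {
    "auth1": ["Waterworks"],
    "auth2": ["Waterworks generalsubdiv Costs"],
}


def find(index, term):
    """Return the records whose match-heading contains the term exactly."""
    return sorted(rec for rec, values in index.items() if term in values)


# The term the linker searches for hits the same record either way.
print(find(before, "Waterworks generalsubdiv Costs"))  # ['auth2']
print(find(after, "Waterworks generalsubdiv Costs"))   # ['auth2']

# The mapped value is never searched for by the linker; it only adds a
# target that some other heading could hit by accident.
print(find(before, "Waterworks Costs"))  # ['auth2']
print(find(after, "Waterworks Costs"))   # []
```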


Nick Clemens <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|In Discussion               |Signed Off


[Bug 25273] Elasticsearch Authority matching is returning too many results

bugzilla-daemon
In reply to this post by bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=25273

Julian Maurice <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|Signed Off                  |Failed QA

--- Comment #12 from Julian Maurice <[hidden email]> ---
Hi Nick,

Thanks for the explanation, it is much clearer to me now. The change makes
sense and the patch works as expected.

However I would like to see some changes in the patch before validating it:

1)

-    ModZebra( $authid, 'specialUpdate', 'authorityserver', $record );
+    ModZebra( $authid, 'specialUpdate', 'authorityserver', { record => $record, authtypecode => $authtypecode } );

This change is confusing. In ModZebra we now have a $record variable which is a
hash that contains a 'record' key. Even with this simple patch I had to ask
myself several times "what's this $record variable I'm looking at? The hash
or the MARC::Record?".
Look at this line for instance:

+        $record = $record->{record};

At some point in the subroutine, $record was a hashref, now it's a MARC::Record.
This is the kind of thing that makes code hard to read, and makes it easier for
bugs to appear.

I also think there is no need to pass the authtypecode to ModZebra, since it
can be obtained from the MARC::Record.
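
If the hashref argument were kept, one way to avoid the ambiguity would be to
unpack it once into distinctly named variables instead of reassigning $record.
The sketch below only illustrates that naming idea. index_authority is a
hypothetical stand-in for ModZebra, a plain hash stands in for a MARC::Record,
and 'TOPIC_TERM' is just a sample authority type code. None of it is Koha code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for ModZebra's argument handling; not Koha code.
sub index_authority {
    my ( $authid, $params ) = @_;

    # Unpack once: $params is always the hashref and $marc_record is always
    # the record, so neither name changes meaning inside the subroutine.
    my $marc_record  = $params->{record};
    my $authtypecode = $params->{authtypecode};

    return "authid=$authid type=$authtypecode title=$marc_record->{title}";
}

# A plain hash stands in for a MARC::Record in this sketch.
my $marc_record = { title => 'Waterworks' };
print index_authority( 42, { record => $marc_record, authtypecode => 'TOPIC_TERM' } ), "\n";
```

Dropping the authtypecode parameter altogether, as suggested above, would
remove the need for the hashref in the first place.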

2)

-            unless ($record) {
+            if ($record) {
+                $indexer->update_index_background( [$biblionumber], [$record] );
+            } else {
                 $record = GetMarcBiblio({
                     biblionumber => $biblionumber,
                     embed_items  => 1 });
+                $indexer->update_index_background( [$biblionumber], [{ record => $record }] );
             }
-            my $records = [$record];
-            $indexer->update_index_background( [$biblionumber], [$record] );

I think it was easier to read before: unless there is a record, fetch it; in
all cases call update_index_background.
Now it's: if there is a record, call update_index_background, otherwise fetch
the record and call update_index_background.
This change was not needed, so why? :)
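
The shape of the earlier version can be sketched in isolation as follows.
fetch_record and update_index are hypothetical stubs standing in for
GetMarcBiblio and $indexer->update_index_background, and reindex_biblio is an
invented wrapper, so this shows only the control flow and not Koha's behaviour.
Note that it tests definedness with //= where the original used unless
($record).

```perl
use strict;
use warnings;

my @indexed;    # what the stub indexer received, in call order

# Hypothetical stub standing in for GetMarcBiblio.
sub fetch_record {
    my ($biblionumber) = @_;
    return { biblionumber => $biblionumber, fetched => 1 };
}

# Hypothetical stub standing in for $indexer->update_index_background.
sub update_index {
    my ( $ids, $records ) = @_;
    push @indexed, [ $ids, $records ];
    return;
}

sub reindex_biblio {
    my ( $biblionumber, $record ) = @_;

    # Fetch only when the caller did not supply a record...
    $record //= fetch_record($biblionumber);

    # ...then index through a single call, whichever path was taken.
    update_index( [$biblionumber], [$record] );
    return $record;
}

reindex_biblio( 1, { biblionumber => 1 } );    # record supplied by the caller
reindex_biblio(2);                             # record fetched on demand
```

Both calls end up in the same update_index call site, so the argument shape
passed to the indexer cannot diverge between the two paths.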

3)

-        my $id     = $record->id // $record->authid;
+        my $id     = $record->id;

Again, this change is not needed, but this time it causes a bug.
Try this: misc/search_tools/rebuild_elasticsearch.pl -a -ai X (replace X with
an existing authid)
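
What the removed fallback does can be illustrated with a small sketch. It
assumes, as the original line suggests, that some record objects populate only
authid and leave id undefined. SketchRecord and record_id are hypothetical
names, not Koha classes or functions.

```perl
use strict;
use warnings;

# Hypothetical record class for illustration; not one of Koha's.
package SketchRecord {
    sub new    { my ( $class, %args ) = @_; return bless {%args}, $class; }
    sub id     { return $_[0]{id} }
    sub authid { return $_[0]{authid} }
}

# Prefer id, and fall back to authid only when id is undefined.
sub record_id {
    my ($record) = @_;
    return $record->id // $record->authid;
}

my $with_id     = SketchRecord->new( id     => 7 );
my $with_authid = SketchRecord->new( authid => 99 );

printf "%s %s\n", record_id($with_id), record_id($with_authid);
```

Without the // fallback, the second object would yield an undefined id.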


[Bug 25273] Elasticsearch Authority matching is returning too many results

bugzilla-daemon
In reply to this post by bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=25273

Julian Maurice <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         QA Contact|[hidden email]    |[hidden email]


[Bug 25273] Elasticsearch Authority matching is returning too many results

bugzilla-daemon
In reply to this post by bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=25273

Eric Phetteplace <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]
