[Bug 7284] New: Authority matching algorithm improvements

Bugzilla from bugzilla-daemon@bugs.koha-community.org
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=7284

             Bug #: 7284
           Summary: Authority matching algorithm improvements
    Classification: Unclassified
 Change sponsored?: Seeking cosponsors
           Product: Koha
           Version: master
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P5 - low
         Component: MARC Authority data support
        AssignedTo: [hidden email]
        ReportedBy: [hidden email]
         QAContact: [hidden email]


At present, the automatic authority matching for MARC21 is of limited use
because it fails on headings with more than one subfield, doesn't take into
account subfield codes, and considers punctuation significant. An improved
matching algorithm should be able to match the following headings to the
correct authorities (these particular examples are from a local authority
file):
=650  #4$aHistory
=650  #4$aHistory
=650  #4$aHistory$xBibliography (the technique of bibliography as applied to
the study of history)
=650  #4$aHistory$vBibliography (bibliographies about history)
=650  #4$aHistory$vBibliography.
=650  #4$aHistory$zGreek Empire$vBibliography
=650  #4$aHistory$zGreek Empire$vBibliography.
=650  #0$aHistory.
=650  #7$aHistory.$2abc

Those headings should match the following authorities:
=150  #4$aHistory.
=150  #4$aHistory$xBibliography.
=150  #4$aHistory$vBibliography.
=150  #4$aHistory$zGreek Empire$vBibliography.
=150  #0$aHistory.
=150  #7$aHistory.$2abc
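
One way to read the requirements above is as a normalized match key: keep the
subfield codes and the thesaurus information, but ignore trailing punctuation.
The following is a minimal sketch in Python for illustration only (Koha itself
is written in Perl). The names parse_heading, match_key and find_authority are
hypothetical, and the choice of punctuation characters and the use of the
second indicator as part of the key are assumptions drawn from the examples,
not from Koha's code.

```python
import re

# Punctuation stripped from the end of each subfield before comparing.
# Which characters count is an assumption made for this sketch.
TRAILING_PUNCT = " .,;:/"


def parse_heading(line):
    """Split a mnemonic MARC line such as '=650  #4$aHistory$vBibliography.'
    into (tag, indicators, [(subfield code, value), ...])."""
    m = re.match(r"=(\d{3})\s+(\S\S)(\$.*)$", line.strip())
    if m is None:
        raise ValueError("not a mnemonic MARC field: %r" % line)
    tag, indicators, body = m.groups()
    subfields = [(chunk[0], chunk[1:]) for chunk in body.split("$")[1:] if chunk]
    return tag, indicators, subfields


def match_key(line):
    """Build a comparison key that keeps subfield codes (so $x and $v stay
    distinct) but ignores trailing punctuation in each subfield."""
    _tag, indicators, subfields = parse_heading(line)
    normalized = tuple(
        (code, value.rstrip(TRAILING_PUNCT)) for code, value in subfields
    )
    # In the examples above the second indicator identifies the thesaurus.
    return indicators[1], normalized


# The authority headings listed above, indexed by their match key.
authorities = [
    "=150  #4$aHistory.",
    "=150  #4$aHistory$xBibliography.",
    "=150  #4$aHistory$vBibliography.",
    "=150  #4$aHistory$zGreek Empire$vBibliography.",
    "=150  #0$aHistory.",
    "=150  #7$aHistory.$2abc",
]
index = {match_key(a): a for a in authorities}


def find_authority(heading):
    """Return the matching authority line, or None if nothing matches."""
    return index.get(match_key(heading))
```

With this key, a heading with and without a final period resolves to the same
authority, while $xBibliography and $vBibliography remain separate.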

Libraries with examples of problematic headings from other authority files are
respectfully requested to provide them in comments for the purpose of testing.

There are a number of additional changes needed to make the
link_bibs_to_authorities.pl script and the situation where
BiblioAddsAuthorities=allow work properly:
* An option to link headings to the first matching authority, even if there is
more than one match (with some sort of warning about that fact)
* Verbose mode on link_bibs_to_authorities.pl should offer more information.
* link_bibs_to_authorities.pl should be able to process only a portion of the
catalog.
* Allow machine-created authority records to be used for indexing.

Potential future changes that would make these features even more useful:
* A web interface to link_bibs_to_authorities.pl
* An authority record de-duplicator
* An option to correct punctuation when authorizing headings (either via
link_bibs_to_authorities.pl or in the cataloging module)

--
Configure bugmail: http://bugs.koha-community.org/bugzilla3/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching all bug changes.
_______________________________________________
Koha-bugs mailing list
[hidden email]
http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-bugs
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/

Chris Cormack <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

--- Comment #1 from Chris Cormack <[hidden email]> 2011-12-07 20:37:28 UTC ---
Hmm, this sounds like a really useful enhancement.

--- Comment #2 from Ian Walls <[hidden email]> 2011-12-07 21:45:33 UTC ---
I'd recommend an even more robust linker algorithm in the case of multiple
headings... perhaps something to check completeness of the record (more fields
filled in is 'better'), or how many other records link to it (more popular is
'better').  Taking the first result would be easier, but not necessarily always
the best for the cataloger.  Perhaps making this syspref-controlled?
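
The two criteria suggested here could be combined into a single ranking. A
rough sketch in Python, for illustration only: the Candidate record and its
field_count and link_count attributes are hypothetical, and ranking by
popularity first with completeness as the tie-breaker is an arbitrary choice
made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    authid: int
    field_count: int  # how many fields the authority record has filled in
    link_count: int   # how many bibliographic records already link to it


def pick_best(candidates: List[Candidate]) -> Optional[Candidate]:
    """Return the most linked-to candidate; ties go to the fuller record."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: (c.link_count, c.field_count))
```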

[hidden email] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

--- Comment #3 from [hidden email] 2011-12-09 15:08:01 UTC ---
Corporate headings are a big issue for us and this enhancement would be useful.

For testing purposes, examples from our catalog for authorities:
100 1# $aWelty, Eudora$d1909-2001$xCharacters$xPoor
110 1# $aUnited States.$bWork Projects Administration.$bService Division.$bWar
Services Program
110 1# $aConfederate States of America.$bArmy.$bTexas Brigade
150 #7 $aSpectators$zMississippi$zJackson$y1960-1970$2lctgm
150 ## $aAmerican literature$vBibliography$vCatalogs
150 ## $aChoctaw Indians$zMississippi$xAntiquities$vCatalogs

to bibliographic records:
600 10 $$aWelty, Eudora$d1909-2001$xCharacters$xPoor.  
610 1# $aUnited States.$bWork Projects Administration.$bService Division.$bWar
Services Program.
610 10 $aConfederate States of America.$b.Army.$bTexas Brigade.
650 #7 $aSpectators$zMississippi$zJackson$y1960-1970$2lctgm.
650 #0 $aAmerican literature$vBibliography$vCatalogs.
650 #0 $aChoctaw Indians$zMississippi$xAntiquities$xCatalogs.

Jared Camins-Esakov <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
  Change sponsored?|Seeking cosponsors          |Sponsored

Jane Wagner <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

Jared Camins-Esakov <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |5454

--- Comment #4 from Jared Camins-Esakov <[hidden email]> 2012-01-07 22:32:58 UTC ---
Created attachment 7083
  --> http://bugs.koha-community.org/bugzilla3/attachment.cgi?id=7083
Bug 7284: Clean up authority code and add tests

Cleaned up the authorities code by removing unused functions and adding
functions that had not been implemented, and added some unit tests.

--- Comment #5 from Jared Camins-Esakov <[hidden email]> 2012-01-07 22:33:08 UTC ---
Created attachment 7084
  --> http://bugs.koha-community.org/bugzilla3/attachment.cgi?id=7084
Bug 7284: Add heading match option to auth plugin

Added an additional box to the authority finder plugin for "Heading match,"
which consults not just the main entry but also See-from and See-also-from
headings.

--- Comment #6 from Jared Camins-Esakov <[hidden email]> 2012-01-07 22:33:23 UTC ---
Created attachment 7085
  --> http://bugs.koha-community.org/bugzilla3/attachment.cgi?id=7085
Bug 7284: Improvements to authority linker script

This commit improves Koha's authority linker cron job
(misc/link_bibs_to_authorities.pl) to make it more useful.

Added the following options to the script:
--auth-limit        Only process those headings that match the authorities
                    matching the user-specified WHERE clause.
--bib-limit         Only process those bib records that match the
                    user-specified WHERE clause.
--commit            Commit the results to the database after every N records
                    are processed.
--link-report       Display a report of all the headings that were processed.

Converted misc/link_bibs_to_authorities.pl to use POD.

Added a detailed report of headings that linked, did not link, and linked
in a "fuzzy" fashion (the exact semantics of fuzzy are up to the individual
linker modules) during the run.

Implemented new C4::Linker functionality to make it possible to easily add
custom authority linker algorithms. Currently available linker options are:
* Default: retains the current behavior of only creating links when there is
  an exact match to one and only one authority record
* First Match: based on Default, creates a link to the *first* authority
  record that matches a given heading, even if there is more than one
  authority record that matches
* Last Match: based on Default, creates a link to the *last* authority
  record that matches a given heading, even if there is more than one record
  that matches
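
The three linker options differ only in what they do when a heading matches
more than one authority record. A sketch of that idea in Python, for
illustration only: the real modules are Perl code under C4::Linker, the class
and method names below are hypothetical, and flagging a link chosen from
several matches as "fuzzy" is an assumption, since the exact semantics are
left to the individual linker modules.

```python
from typing import List, Optional, Tuple


class DefaultLinker:
    """Link only when exactly one authority record matches."""

    def choose(self, matches: List[int]) -> Tuple[Optional[int], bool]:
        # Returns (authority id or None, fuzzy flag).
        if len(matches) == 1:
            return matches[0], False
        return None, False


class FirstMatchLinker(DefaultLinker):
    """Like Default, but fall back to the first of several matches."""

    def choose(self, matches: List[int]) -> Tuple[Optional[int], bool]:
        authid, fuzzy = super().choose(matches)
        if authid is None and matches:
            return matches[0], True
        return authid, fuzzy


class LastMatchLinker(DefaultLinker):
    """Like Default, but fall back to the last of several matches."""

    def choose(self, matches: List[int]) -> Tuple[Optional[int], bool]:
        authid, fuzzy = super().choose(matches)
        if authid is None and matches:
            return matches[-1], True
        return authid, fuzzy


# Hypothetical registry keyed by the module names used in this patch.
LINKERS = {
    "Default": DefaultLinker,
    "FirstMatch": FirstMatchLinker,
    "LastMatch": LastMatchLinker,
}


def get_linker(name: str) -> DefaultLinker:
    """Instantiate the linker registered under the given name."""
    return LINKERS[name]()
```

Adding another strategy in this layout means writing one more subclass and
registering it, which is the extensibility the patch is aiming for.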

Made the linking functionality use SearchAuthorities in C4::AuthoritiesMarc
rather than SimpleSearch in C4::Search, because at this time C4::Search cannot
handle authority searching. Once C4::Search has been refactored,
SearchAuthorities should be rewritten to simply call into C4::Search. Also
fixed numerous performance issues in SearchAuthorities and the Linker script:
* Correctly destroy ZOOM recordsets in SearchAuthorities when finished. If left
  undestroyed, running time appears to approach O(log n^n)
* Add an optional $skipmetadata flag to SearchAuthorities that can be used to
  avoid additional calls into Zebra when all that is wanted are authority
  records and not statistics about their use

This patch also adds the following sysprefs:
* AutoCreateAuthorities - When this and BiblioAddsAuthorities are both turned
  on, automatically create authority records for headings that don't have
  any authority link when cataloging. When BiblioAddsAuthorities is on and
  AutoCreateAuthorities is turned off, do not automatically generate authority
  records, but allow the user to enter headings that don't match an existing
  authority. When BiblioAddsAuthorities is off, this has no effect.
* LinkerModule - Chooses which linker module to use for matching headings
  (current options are as described above in the section on linker options:
  "Default," "FirstMatch," and "LastMatch")
* LinkerRelink - When turned on, the linker will confirm the links for headings
  that have previously been linked to an authority record when it runs. When
  turned off, any heading with an existing link will be ignored.
* LinkerKeepStale - When turned on, the linker will never *delete* a link to an
  authority record, though, depending on the value of LinkerRelink, it may
  change the link.
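
The interaction between these preferences can be sketched as two small
decision functions. This is one reading of the descriptions above, written in
Python for illustration only; the function names, parameters and return
values are hypothetical and do not come from the patch.

```python
from typing import Optional


def resolve_link(existing: Optional[int], match: Optional[int],
                 relink: bool, keep_stale: bool) -> Optional[int]:
    """Return the authority id a heading should link to after a linker run.

    existing: authority id already stored on the heading, or None
    match:    authority id the linker found this time, or None
    """
    if existing is not None and not relink:
        # LinkerRelink off: headings that already have a link are ignored.
        return existing
    if match is not None:
        # A match was found: create the link, or change it if it differs.
        return match
    if existing is not None and keep_stale:
        # LinkerKeepStale on: never delete a link, even an unconfirmed one.
        return existing
    return None


def unlinked_heading_action(biblio_adds_authorities: bool,
                            auto_create_authorities: bool) -> str:
    """Describe what happens to a heading with no authority link."""
    if not biblio_adds_authorities:
        # AutoCreateAuthorities has no effect in this case.
        return "no effect"
    if auto_create_authorities:
        return "create authority"
    return "allow unlinked heading"
```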

This patch also modifies the authority indexing to remove trailing punctuation
from Match indexes.

--- Comment #7 from Jared Camins-Esakov <[hidden email]> 2012-01-07 22:33:37 UTC ---
Created attachment 7086
  --> http://bugs.koha-community.org/bugzilla3/attachment.cgi?id=7086
Bug 7284: Use the new linking for cataloguing

Replace the old BiblioAddAuthorities subroutines with calls into the new
C4::Linker routines.

--- Comment #8 from Jared Camins-Esakov <[hidden email]> 2012-01-07 22:33:45 UTC ---
Created attachment 7087
  --> http://bugs.koha-community.org/bugzilla3/attachment.cgi?id=7087
Bug 7284: Add UNIMARC handler for C4::Heading

Look in the framework configuration tables to figure out which tags should
be authority controlled for UNIMARC, and add a simple implementation for
C4::Heading::UNIMARC which will work with the GRS-1 indexing used by UNIMARC.

Jared Camins-Esakov <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P5 - low                    |P2

--- Comment #9 from Jared Camins-Esakov <[hidden email]> 2012-01-07 22:36:57 UTC ---
The five patches that make up the improved authority linker are attached to
this bug, and the repository for ongoing work can be found at
https://github.com/jcamins/koha/commits/bug_7284_v2

Please test.

Jared Camins-Esakov <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P2                          |PATCH-Sent
       Patch Status|---                         |Needs Signoff

Jared Camins-Esakov <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |7417

Jared Camins-Esakov <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |7418

Jared Camins-Esakov <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |7419

Frédéric Demians <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Depends on|                            |7421

Frédéric Demians <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #7087|0                           |1
        is obsolete|                            |

--- Comment #10 from Frédéric Demians <[hidden email]> 2012-01-09 11:56:18 UTC ---
Created attachment 7091
  --> http://bugs.koha-community.org/bugzilla3/attachment.cgi?id=7091
Bug 7284: Add UNIMARC handler for C4::Heading (modified)

Modification of Jared's work on UNIMARC support. Corrects
C4::Heading::UNIMARC class loading. Builds the biblio-tag-to-authority-type
data structure at initialization rather than querying the DB.

Jared: since this patch modifies yours, you may have to revert yours and then
apply mine on your devel branch.

Frédéric Demians <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

--- Comment #11 from Frédéric Demians <[hidden email]> 2012-01-09 12:08:01 UTC ---
(In reply to comment #2)
> I'd recommend an even more robust linker algorithm in the case of
> multiple headings... perhaps something to check completeness of the
> record (more fields filled in is 'better'), or how many other records
> link to it (more popular is 'better').  Taking the first result would
> be easier, but not necessarily always the best for the cataloger.
> Perhaps making this syspref-controlled?

There may be more than one heading match because the matching is
done on a global Zebra authority index. We could, for example, search
specifically on Zebra Personal-Name-Heading rather than on Heading-Main.

--- Comment #12 from Jared Camins-Esakov <[hidden email]> 2012-01-11 19:18:59 UTC ---
(In reply to comment #11)
> There may be more than one heading match because the matching is
> done on a global Zebra authority index. We could, for example, search
> specifically on Zebra Personal-Name-Heading rather than on Heading-Main.

The benefit of the architecture I chose for the Linker is that it is easy to
implement a new linker whose matching is as granular as one could wish. From
my point of view, using the more specific indexes greatly increases the
complexity for minimal benefit. Patches adding a C4::Linker::PreciseIndex
linker module are gratefully accepted.

Jared Camins-Esakov <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|PATCH-Sent                  |P3
           Severity|enhancement                 |major


[Bug 7284] Authority matching algorithm improvements

Bugzilla from bugzilla-daemon@bugs.koha-community.org
In reply to this post by Bugzilla from bugzilla-daemon@bugs.koha-community.org
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=7284

Jared Camins-Esakov <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #7083|0                           |1
        is obsolete|                            |
   Attachment #7084|0                           |1
        is obsolete|                            |
   Attachment #7085|0                           |1
        is obsolete|                            |
   Attachment #7086|0                           |1
        is obsolete|                            |
   Attachment #7091|0                           |1
        is obsolete|                            |

--- Comment #13 from Jared Camins-Esakov <[hidden email]> 2012-01-21 15:00:19 UTC ---
Created attachment 7266
  --> http://bugs.koha-community.org/bugzilla3/attachment.cgi?id=7266
Bug 7284: Authority matching improvements

Squashed patch incorporating all previous patches.

1. Cleaned up the authorities code by removing unused functions, adding
functions that had not yet been implemented, and adding some unit tests.

2. Added an additional box to the authority finder plugin for "Heading match,"
which consults not just the main entry but also See-from and See-also-from
headings.

3. Improved Koha's authority linker cron job (misc/link_bibs_to_authorities.pl)
to make it more useful:

Added the following options to the misc/link_bibs_to_authorities.pl script:
--auth-limit        Only process those headings that match the authorities
                    matching the user-specified WHERE clause.
--bib-limit         Only process those bib records that match the
                    user-specified WHERE clause.
--commit            Commit the results to the database after every N records
                    are processed.
--link-report       Display a report of all the headings that were processed.

Converted misc/link_bibs_to_authorities.pl to use POD.

Added a detailed report of headings that linked, did not link, and linked
in a "fuzzy" fashion (the exact semantics of fuzzy are up to the individual
linker modules) during the run.

Implemented new C4::Linker functionality to make it possible to easily add
custom authority linker algorithms. Currently available linker options are:
* Default: retains the current behavior of only creating links when there is
  an exact match to one and only one authority record; if the
  'broader_headings' option is enabled, it will try to link headings to
  authority records for broader headings by removing subfields from the end
  of the heading (NOTE: test the results before enabling broader_headings in
  a production system because its usefulness is very much dependent on
  individual sites' authority files)
* First Match: based on Default, creates a link to the *first* authority
  record that matches a given heading, even if there is more than one
  authority record that matches
* Last Match: based on Default, creates a link to the *last* authority
  record that matches a given heading, even if there is more than one record
  that matches
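
As a rough illustration of how the three linker options differ, here is a
minimal Python sketch. It is not Koha's Perl code: the function name
choose_authority and its arguments are hypothetical, and it assumes the search
has already produced a list of matching authids in result order.

```python
from typing import Optional, Sequence


def choose_authority(matches: Sequence[int], linker: str = "Default") -> Optional[int]:
    """Pick an authid from the authority records that match a heading.

    `matches` holds the matching authids in search-result order. The name
    and signature are illustrative assumptions, not the C4::Linker API.
    """
    if not matches:
        return None  # nothing matched, so there is nothing to link to
    if linker == "Default":
        # Link only when exactly one authority record matches.
        return matches[0] if len(matches) == 1 else None
    if linker == "FirstMatch":
        return matches[0]  # first match wins, even if the match is ambiguous
    if linker == "LastMatch":
        return matches[-1]  # last match wins, even if the match is ambiguous
    raise ValueError(f"unknown linker module: {linker}")
```

With two matching authorities, "Default" declines to link, while "FirstMatch"
and "LastMatch" each pick one end of the result list.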

Made the linking functionality use the SearchAuthorities in C4::AuthoritiesMarc
rather than SimpleSearch in C4::Search. Once C4::Search has been refactored,
SearchAuthorities should be rewritten to simply call into C4::Search. However,
at this time C4::Search cannot handle authority searching. Also fixed numerous
performance issues in SearchAuthorities and the Linker script:
* Correctly destroy ZOOM recordsets in SearchAuthorities when finished. If left
  undestroyed, efficiency appears to approach O(log n^n)
* Add an optional $skipmetadata flag to SearchAuthorities that can be used to
  avoid additional calls into Zebra when all that is wanted are authority
  records and not statistics about their use

This patch also adds the following sysprefs:
* AutoCreateAuthorities - When this and BiblioAddsAuthorities are both turned
  on, automatically create authority records for headings that don't have
  any authority link when cataloging. When BiblioAddsAuthorities is on and
  AutoCreateAuthorities is turned off, do not automatically generate authority
  records, but allow the user to enter headings that don't match an existing
  authority. When BiblioAddsAuthorities is off, this has no effect.
* LinkerModule - Chooses which linker module to use for matching headings
  (current options are as described above in the section on linker options:
  "Default," "FirstMatch," and "LastMatch")
* LinkerOptions - A pipe-separated list of options to set for the authority
  linker
* LinkerRelink - When turned on, the linker will confirm the links for headings
  that have previously been linked to an authority record when it runs. When
  turned off, any heading with an existing link will be ignored.
* LinkerKeepStale - When turned on, the linker will never *delete* a link to an
  authority record, though, depending on the value of LinkerRelink, it may
  change the link.
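
The interaction between LinkerRelink and LinkerKeepStale can be sketched as
follows. This is a Python illustration of the behavior described above, not
the actual implementation; the function updated_link and its parameters are
hypothetical names, and `match` stands for whatever authid the chosen linker
module returned (or None if it found nothing).

```python
from typing import Optional


def updated_link(existing: Optional[int], match: Optional[int],
                 relink: bool, keep_stale: bool) -> Optional[int]:
    """Return the authid a heading should carry after the linker runs.

    existing   -- authid currently stored in the heading, or None
    match      -- authid found by the linker module, or None
    relink     -- value of the LinkerRelink syspref
    keep_stale -- value of the LinkerKeepStale syspref
    """
    if existing is not None and not relink:
        return existing  # already-linked headings are ignored entirely
    if match is not None:
        return match  # new link, or an existing link confirmed/changed
    if existing is not None and keep_stale:
        return existing  # no match, but stale links are never deleted
    return None  # no match: the heading ends up unlinked
```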

This patch also modifies the authority indexing to remove trailing punctuation
from Match indexes.
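
The effect of that indexing change can be approximated in a few lines of
Python. This is only a sketch: the real change lives in the Zebra index
definitions, and the set of characters treated as trailing punctuation here
is an assumption, not taken from the patch.

```python
def strip_trailing_punctuation(heading: str, punctuation: str = ".,;:/ ") -> str:
    """Drop trailing punctuation so "History." and "History" compare equal.

    The default `punctuation` set is an illustrative guess.
    """
    return heading.rstrip(punctuation)


def headings_match(bib_heading: str, authority_heading: str) -> bool:
    """Compare two headings while ignoring trailing punctuation."""
    return (strip_trailing_punctuation(bib_heading)
            == strip_trailing_punctuation(authority_heading))
```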

4. Replace the old BiblioAddAuthorities subroutines with calls into the new
C4::Linker routines.

5. Add a simple implementation for C4::Heading::UNIMARC. (With thanks to F.
Demians, 2011.01.09:) Correct C4::Heading::UNIMARC class loading. Create biblio
tag to authority types data structure at initialization rather than querying
DB.

6. Ran perltidy on all changed code.

Signed-off-by: Jared Camins-Esakov <[hidden email]>


[Bug 7284] Authority matching algorithm improvements

Bugzilla from bugzilla-daemon@bugs.koha-community.org
In reply to this post by Bugzilla from bugzilla-daemon@bugs.koha-community.org
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=7284

Jared Camins-Esakov <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #7266|0                           |1
        is obsolete|                            |

--- Comment #14 from Jared Camins-Esakov <[hidden email]> 2012-01-24 17:09:04 UTC ---
Created attachment 7323
  --> http://bugs.koha-community.org/bugzilla3/attachment.cgi?id=7323
Bug 7284: Authority matching improvements

Signed-off-by: Jared Camins-Esakov <[hidden email]>


[Bug 7284] Authority matching algorithm improvements

Bugzilla from bugzilla-daemon@bugs.koha-community.org
In reply to this post by Bugzilla from bugzilla-daemon@bugs.koha-community.org
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=7284

Jared Camins-Esakov <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |7475


[Bug 7284] Authority matching algorithm improvements

Bugzilla from bugzilla-daemon@bugs.koha-community.org
In reply to this post by Bugzilla from bugzilla-daemon@bugs.koha-community.org
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=7284

Jared Camins-Esakov <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #7323|0                           |1
        is obsolete|                            |

--- Comment #15 from Jared Camins-Esakov <[hidden email]> 2012-02-04 02:09:02 UTC ---
Created attachment 7457
  --> http://bugs.koha-community.org/bugzilla3/attachment.cgi?id=7457
Bug 7284: Authority matching improvements

Signed-off-by: Jared Camins-Esakov <[hidden email]>
Rebased on latest master, 3 February 2012


[Bug 7284] Authority matching algorithm improvements

Bugzilla from bugzilla-daemon@bugs.koha-community.org
In reply to this post by Bugzilla from bugzilla-daemon@bugs.koha-community.org
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=7284

--- Comment #16 from Jared Camins-Esakov <[hidden email]> 2012-02-04 03:25:27 UTC ---
TESTING PLAN

Note: all of these tests require that you have some authority records,
preferably for headings that actually appear in your bibliographic data. At
least one authority record must contain a "see from" reference (remember which
one contains this, as you'll need it for some of the tests).

=== Setting up the patch ===

1. Run installer/data/mysql/atomicupdate/bug_7284_authority_linking_pt1
2. Make sure you install the following files:
etc/zebradb/authorities/etc/bib1.att,
etc/zebradb/marc_defs/marc21/authorities/authority-koha-indexdefs.xml,
etc/zebradb/marc_defs/marc21/authorities/authority-zebra-indexdefs.xsl,
etc/zebradb/marc_defs/marc21/authorities/koha-indexdefs-to-zebra.xsl, and
etc/zebradb/marc_defs/unimarc/authorities/record.abs
3. Run misc/migration_tools/rebuild_zebra.pl -a -r

=== Testing the Heading match in the cataloging plugin ===

1. Create a new record, and open the cataloging plugin for an
authority-controlled field.
2. Search for an authority by entering the "see from" term in the Heading Match
box
3. Confirm that the appropriate heading shows up
4. Search for an authority by entering the preferred heading into the Main
entry or Main entry ($a only) box (i.e., repeat the procedure you usually use
for cataloging, whatever that may be)
5. Confirm that the appropriate heading shows up

=== Testing the cataloging interface ===

1. Turn off BiblioAddsAuthorities
2. Confirm that you cannot enter text directly in an authority-controlled field
3. Confirm that if you search for a heading using the authority control plugin
the heading is inserted (note, however, that this patch does not fix, AND IS
NOT INTENDED TO fix, the bugs in the authority plugin with duplicate subfields;
those are wholly out of scope, and this check is only for regressions)
4. Turn on BiblioAddsAuthorities and AutoCreateAuthorities
5. Confirm that you can enter text directly into an authority-controlled field,
and if you enter a heading that doesn't currently have an authority record, an
authority record stub is automatically created, and the heading you entered is
linked to it
6. Confirm that if you enter a heading with only a subfield $a that *matches*
an existing heading, the authid for that heading is inserted into subfield $9
7. Confirm that if you enter a heading with multiple subfields that *matches*
an existing heading, the authid for that heading is inserted into subfield $9
8. Turn on BiblioAddsAuthorities and turn off AutoCreateAuthorities
9. Confirm that you can enter text directly into an authority-controlled field,
and if you enter a heading that doesn't currently have an authority record, an
authority record stub is *not* created
10. Confirm that if you enter a heading with only a subfield $a that *matches*
an existing heading, the authid for that heading is inserted into subfield $9
11. Confirm that if you enter a heading with multiple subfields that *matches*
an existing heading, the authid for that heading is inserted into subfield $9
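
The behavior this section checks can be summarized in a small Python sketch.
It only restates the expectations from the steps above; the names
CatalogingBehavior and cataloging_behavior are hypothetical and do not exist
in Koha.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogingBehavior:
    free_text_allowed: bool       # can the cataloger type into the field?
    creates_stub_authority: bool  # is a stub created for unmatched headings?


def cataloging_behavior(biblio_adds_authorities: bool,
                        auto_create_authorities: bool) -> CatalogingBehavior:
    """Expected editor behavior for each combination of the two sysprefs."""
    if not biblio_adds_authorities:
        # Headings can only come from the authority plugin, and
        # AutoCreateAuthorities has no effect.
        return CatalogingBehavior(False, False)
    # Free text is allowed; stubs are created only if AutoCreateAuthorities is on.
    return CatalogingBehavior(True, auto_create_authorities)
```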

=== Testing link_bibs_to_authorities.pl ===

1. Set LinkerModule to "Default," turn on LinkerRelink and
BiblioAddsAuthorities, and turn AutoCreateAuthorities and LinkerKeepStale off
2. Edit one bib record so that an authority controlled field that has already
been linked (i.e. has data in $9) has a heading that does not match any
authority record in your database
3. Run misc/link_bibs_to_authorities.pl --link-report --verbose --test (you may
want to pipe the output into less or a file, as the result is quite a lot of
information)
4. Look over the report and confirm that the headings for which you have
authority records are reported as matched and that the heading you modified in
step 2 is reported as "unlinked." Also confirm that no changes were actually
made to the database (to check this, look at the bib record you edited earlier,
and check that the authid in the field you edited hasn't changed)
5. Run misc/link_bibs_to_authorities.pl --link-report --verbose (you may want
to pipe the output into less or a file, as the result is quite a lot of
information)
6. Check that the heading you modified has been unlinked
7. Change the modified heading back to whatever it was, but don't use the
authority control plugin to populate $9
8. Run misc/link_bibs_to_authorities.pl --link-report --verbose
--bib-limit="biblionumber=${BIB}" (replacing ${BIB} with the biblionumber of
the record you've been editing)
9. Confirm that the heading has been linked to the correct authority record
10. Turn LinkerKeepStale on
11. Change that heading to something else
12. Run misc/link_bibs_to_authorities.pl --link-report --verbose
--bib-limit="biblionumber=${BIB}" (replacing ${BIB} with the biblionumber of
the record you've been editing)
13. Confirm that the $9 has not changed
14. Turn LinkerKeepStale off
15. Create two authorities with the same heading
16. Run misc/migration_tools/rebuild_zebra.pl -a -z
17. Enter that heading into the bibliographic record you are working with
18. Run misc/link_bibs_to_authorities.pl --link-report --verbose
--bib-limit="biblionumber=${BIB}" (replacing ${BIB} with the biblionumber of
the record you've been editing)
19. Confirm that the heading has not been linked
20. Change LinkerModule to "FirstMatch"
21. Run misc/link_bibs_to_authorities.pl --link-report --verbose
--bib-limit="biblionumber=${BIB}" (replacing ${BIB} with the biblionumber of
the record you've been editing)
22. Confirm that the heading has been linked to the first authority record it
matches
23. Change LinkerModule to "LastMatch"
24. Run misc/link_bibs_to_authorities.pl --link-report --verbose
--bib-limit="biblionumber=${BIB}" (replacing ${BIB} with the biblionumber of
the record you've been editing)
25. Confirm that the heading has been linked to the second authority record it
matches
26. Run misc/link_bibs_to_authorities.pl --link-report --verbose
--auth-limit="authid=${AUTH}" (replacing ${AUTH} with an authid)
27. Confirm that only that heading is displayed in the report, and only those
bibs with that heading have been changed

=== Conclusion ===

If all those things worked, good news! You're ready to sign off on the patch
for bug 7284.


[Bug 7284] Authority matching algorithm improvements

Bugzilla from bugzilla-daemon@bugs.koha-community.org
In reply to this post by Bugzilla from bugzilla-daemon@bugs.koha-community.org
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=7284

--- Comment #17 from Jared Camins-Esakov <[hidden email]> 2012-02-07 21:38:54 UTC ---
There is one feature that is not documented in the commit message:

Enter "broader_headings" in LinkerOptions. With this option, the linker will
try to match the following heading by dropping one trailing subfield at a time:
=600  10$aCamins-Esakov, Jared$xCoin collections$vCatalogs$vEarly works to 1800.

First: Camins-Esakov, Jared--Coin collections--Catalogs--Early works to 1800
Next: Camins-Esakov, Jared--Coin collections--Catalogs
Next: Camins-Esakov, Jared--Coin collections
Next: Camins-Esakov, Jared (matches! If an earlier attempt had matched, the
linker would not have tried this one)

This is probably relevant only to MARC21 and LCSH, but could potentially be of
great use to libraries that make heavy use of floating subdivisions.
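
The fallback order can be sketched in Python as follows. This is an
illustration under stated assumptions rather than the C4::Linker code: the
function names, the dictionary standing in for the authority index, and the
authid 42 in the usage note are all hypothetical, and stripping the final
period simply mirrors the way the candidates are written above.

```python
from typing import Dict, List, Optional, Set, Tuple

Subfield = Tuple[str, str]  # (subfield code, value), e.g. ("x", "Coin collections")


def parse_linker_options(value: str) -> Set[str]:
    """Split a pipe-separated LinkerOptions value into a set of option names."""
    return {opt.strip() for opt in value.split("|") if opt.strip()}


def candidate_headings(subfields: List[Subfield], broader_headings: bool) -> List[str]:
    """List the headings to try, from most specific to broadest."""
    parts = [value.rstrip(".") for _code, value in subfields]
    if not parts:
        return []
    # Without broader_headings only the full heading is tried.
    shortest = 1 if broader_headings else len(parts)
    return ["--".join(parts[:n]) for n in range(len(parts), shortest - 1, -1)]


def link_heading(subfields: List[Subfield], authority_index: Dict[str, int],
                 linker_options: str) -> Optional[int]:
    """Return the authid of the first candidate found in the index, if any."""
    options = parse_linker_options(linker_options)
    for candidate in candidate_headings(subfields, "broader_headings" in options):
        authid = authority_index.get(candidate)
        if authid is not None:
            return authid  # stop at the first (most specific) match
    return None
```

For the heading above and an index containing only "Camins-Esakov, Jared"
under the made-up authid 42, link_heading returns 42 when LinkerOptions is
"broader_headings" and None when it is empty.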


[Bug 7284] Authority matching algorithm improvements

Bugzilla from bugzilla-daemon@bugs.koha-community.org
In reply to this post by Bugzilla from bugzilla-daemon@bugs.koha-community.org
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=7284

--- Comment #18 from Jared Camins-Esakov <[hidden email]> 2012-02-10 00:29:57 UTC ---
Created attachment 7548
  --> http://bugs.koha-community.org/bugzilla3/attachment.cgi?id=7548
Bug 7284 follow up: Cataloging module relinking

With the first patch for bug 7284, the automatic authority linking will
actually work properly in the cataloging module. As Owen pointed out while
testing the patch, though, longtime users of Koha will not be expecting that.
In keeping with the principles of least surprise and maximum configurability,
this patch makes it possible to disable authority relinking in the cataloging
module only (i.e. leaving it enabled for future runs of
link_bibs_to_authorities.pl).

This patch adds the following syspref:
* CatalogModuleRelink - when turned on, the automatic linker will relink
  headings when a record is saved in the cataloging module, provided
  LinkerRelink is also turned on, even if the headings were manually linked
  to a different authority by the cataloger. When turned off (the default),
  the automatic linker will not relink any headings that have already been
  linked when a record is saved.

Note that though the default behavior matches the current behavior of Koha,
it does not match the intended behavior. Libraries that want the intended
behavior rather than the current behavior will need to adjust the
CatalogModuleRelink syspref.
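
A minimal Python sketch of the resulting decision at save time follows. The
function name is hypothetical, and it assumes BiblioAddsAuthorities is on so
that the linker runs at all when a record is saved in the cataloging module.

```python
def relinks_on_save(already_linked: bool, catalog_module_relink: bool,
                    linker_relink: bool) -> bool:
    """Will the automatic linker touch this heading when the record is saved?"""
    if not already_linked:
        # Assumption: headings without a link are always candidates for linking.
        return True
    # Linked headings are re-examined only when both sysprefs are on.
    return catalog_module_relink and linker_relink
```

With the default CatalogModuleRelink setting (off), a heading that already
carries a link is left alone on save regardless of LinkerRelink.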

Be sure to run the atomicupdate file used by this patch if you are on a dev
system: installer/data/mysql/atomicupdate/bug_7284_authority_linking_pt2

To test this patch:
1.  Run installer/data/mysql/atomicupdate/bug_7284_authority_linking_pt2
2.  Set BiblioAddsAuthorities to "on."
3.  Default setting of CatalogModuleRelink is "off." Leave it like that.
4.  Create a record and link an authority record to an authorized field using
    the authority plugin.
5.  Save the record. Ensure that the heading is linked to the appropriate
    authority.
6.  Open the record. Change the heading manually to something else, leaving
    the link. Save the record.
7.  Ensure that the heading remains linked to that same authority.
8.  Change CatalogModuleRelink to "on."
9.  Open the record. Use the authority plugin to link that heading to the
    same authority record you did earlier.
10. Save the record. Ensure that the heading is linked to the appropriate
    authority.
11. Open the record. Change the heading manually to something else, leaving
    the link. Save the record.
12. Ensure that the heading is no longer linked to the old authority record.
