bulk BiblioAddsAuthorities?

Peter Huerter
Hi All,

I would like BiblioAddsAuthorities to work for bulk imports too.  It has no effect when records are staged in Koha, nor when bulkmarcimport.pl is used.  It does work when one edits and saves a MARC record with the global preference BiblioAddsAuthorities set to "on".

I found this same question posted to koha-dev in 2009 (with no response):
http://koha.1045719.n5.nabble.com/Bulkmarcimport-and-authorities-tp3065080p3065080.html

I also found an IRC conversation from 2008 on this topic (with no resolution):
http://stats.workbuffer.org/irclog/text.pl?channel=koha;date=2008-01-22

What I have been able to gather is that the relevant code is contained in BiblioAddAuthorities in cataloguing/addbiblio.pl.

#
# sub that tries to find authorities linked to the biblio
# the sub :
#   - search in the authority DB for the same authid (in $9 of the biblio)
#   - search in the authority DB for the same 001 (in $3 of the biblio in UNIMARC)
#   - search in the authority DB for the same values (exactly) (in all subfields of the biblio)
# if the authority is found, the biblio is modified accordingly to be connected to the authority.
# if the authority is not found, it's added, and the biblio is then modified to be connected to the authority.
#

This is what I want, but in bulk.

Is there already a script that calls BiblioAddAuthorities and processes all biblio records in bulk?

If not, can someone take a moment and give me any advice about how one would write such a script?  I am new to Koha.  An example script would be great.

We want to be able to manage our own Authorities for at least Authors in house.  Any help would be greatly appreciated.

Thanks,
Pete.

Volunteer with
The Archives and Collections Society
http://aandc.org/
~ Volunteer ~
Archives and Collections Society - http://www.aandc.org/
"Maritime heritage and history, preservation and conservation, research and education through the written word and the arts."

Re: bulk BiblioAddsAuthorities?

Elaine Bradtke-3
What format (if any) are your authorities currently in? If you can
bulk import your authorities, you will be able to link to them.
We had to build ours from scratch before import.
To create our authorities records, we stripped the names out of the
appropriate fields in the bib records, fed them into a database,
de-duplicated and checked inconsistencies, adding notes and references
fields as needed. Then we converted them to MARC.
This produced a separate authority file that we uploaded and then we
ran link_bibs_to_authorities.pl
If you already have them in electronic format, you can skip several
steps in the process.

This doesn't really answer your question about how to have Koha create
new authorities with a bulk import. Just how we worked around the
problem.
Elaine

On Wed, Feb 9, 2011 at 5:55 PM, Peter Huerter <[hidden email]> wrote:

>
> Hi All,
>
> I would like  BiblioAddsAuthorities to work even for bulk imports.  It does
> not work when records are staged in Koha, nor when bulkmarcimport.pl is
> used.  It does work when one edits and saves a marc record and global
> preference BiblioAddsAuthorities is set to "on".

--
Elaine Bradtke
Data Wrangler
VWML
English Folk Dance and Song Society | http://www.efdss.org
Cecil Sharp House, 2 Regent's Park Road, London NW1 7AY
Tel    +44 (0) 20 7485 2206 ext 36
--------------------------------------------------------------------------
Registered Company No. 297142
Charity Registered in England and Wales No. 305999
---------------------------------------------------------------------------
"Writing about music is like dancing about architecture"
--Elvis Costello (Musician magazine No. 60 (October 1983), p. 52)
_______________________________________________
Koha mailing list  http://koha-community.org
[hidden email]
http://lists.katipo.co.nz/mailman/listinfo/koha

Re: bulk BiblioAddsAuthorities?

Peter Huerter
Thanks Elaine.  I am in the process of trying your method.  

Have you had any problems using link_bibs_to_authorities.pl?  It seems to have received some mixed reviews and my initial experiment failed to link the new authorities to existing biblios.

E.g. http://koha.1045719.n5.nabble.com/koha-bibs-authorities-problem-tp3050690p3050691.html

"If you want a best result, you need to re-implement link_bibs_to_authorities.pl with a better matching algorithm"

Did you have to re-implement a better matching algorithm?

Thanks,
Pete.

(btw I tried to write a script to do what I wanted in my original post and found that it is not as easy as I thought.  I found that I could not rely on SimpleSearch (and zebra) to find a recently added Authority.  I decided to move on after many failed attempts at trying to fix this).


Re: bulk BiblioAddsAuthorities?

Elaine Bradtke-3
It worked for us after a couple problems were fixed with our data.
The link occurs in the subfield 9 (MARC 21) of the appropriate field
in the biblio. When the field is linked to the authority record, the
number in the authority record's 001 field will appear in $9 (but you
will probably only see the number in editing mode).  If your linking
doesn't work, check the MARC framework to make sure you've got a $9 in
that field.
We had a problem with linking on the first couple of tries. The problem was
in the 008 field in the authority records.  Positions 14 and 15 must both be
set to 'a' for the linking to work.
Also,  Koha will automatically create the 001 number in the authority
records.   It will overwrite anything you have in that field, though
my IT wizard (whom I've CC'd just in case he's got any other tricks up
his sleeve) managed to force it to accept our last  import with the
original numbers.

Hope this helps, I'm the librarian half of the team, so the serious IT
stuff is occasionally lost on me.

Elaine
On Fri, Feb 18, 2011 at 7:09 PM, Peter Huerter <[hidden email]> wrote:

>
> Thanks Elaine.  I am in the process of trying your method.
>
> Have you had any problems using link_bibs_to_authorities.pl?  It seems to
> have received some mixed reviews and my initial experiment failed to link
> the new authorities to existing biblios.

Re: bulk BiblioAddsAuthorities?

Peter Huerter
Thanks Elaine.  I'm still having trouble getting my auth records to link with my biblio records.

Our process is this:
1) Import biblios into koha (without any authority information).

2) Take a de-duplicated list of authors in a CSV file, and run it through MarcEdit to create a compiled .mrc file of authorities.  (I also add 040 info here and define my own LDR and 008, but nothing else.)

.csv file:
Zumwalt, Elmo R. Jr
Zimmerman, Linda

.mrk file:
=LDR  00000nz  a2200000o  4500
=008  110225000000|||a||||aa|||||||||||||||||d||||||
=040  \\$aOPIACS$bENG$cOPIACS
=100  \\$aZumwalt, Elmo R. Jr

=LDR  00000nz  a2200000o  4500
=008  110225000000|||a||||aa|||||||||||||||||d||||||
=040  \\$aOPIACS$bENG$cOPIACS
=100  \\$aZimmerman, Linda

3) Import the compiled authority list into Koha using bulkmarcimport, and re-index Zebra.
perl bulkmarcimport.pl -a -file /home/paul/first_authorities2.mrc -match=pn,100a -v 2
perl rebuild_zebra.pl -a -r -v

4) Run the linking script.
perl /usr/share/koha/bin/link_bibs_to_authorities.pl --verbose
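
Step 2 above could also be scripted rather than done by hand. The following is only a minimal sketch in Python, not part of Koha or MarcEdit: the function names are made up for illustration, the leader and the 040 values are the ones shown in this post, and the 008 is assembled as a six-digit date plus a 34-character tail (modelled on the template in this post) so that the field comes out at the 40 characters MARC 21 defines for 008. The output is mnemonic .mrk text that would still need to be compiled to .mrc with MarcEdit.

```python
from datetime import date

# Assumption: the tail (008 positions 06-39) is modelled on the template quoted
# in this thread; adjust it to your own cataloguing policy before using it.
TAIL_008 = "|||a||||aa" + "|" * 17 + "d" + "|" * 6   # 34 characters
LEADER = "00000nz  a2200000o  4500"                   # leader used in the thread


def authority_mrk(name, agency="OPIACS", created=None):
    """Return one MarcEdit mnemonic (.mrk) authority record for a personal name."""
    created = created or date.today()
    field_008 = created.strftime("%y%m%d") + TAIL_008   # 6 + 34 = 40 characters
    lines = [
        "=LDR  " + LEADER,
        "=008  " + field_008,
        "=040  \\\\$a{0}$bENG$c{0}".format(agency),     # \\ marks two blank indicators
        "=100  \\\\$a" + name,
    ]
    return "\n".join(lines) + "\n"


def names_to_mrk(names):
    """De-duplicate names (keeping first-seen order) and join their records."""
    seen = []
    for raw in names:
        name = raw.strip()
        if name and name not in seen:
            seen.append(name)
    # Each record ends with a newline, so joining with "\n" leaves a blank
    # line between records, as in the .mrk sample above.
    return "\n".join(authority_mrk(n) for n in seen)


if __name__ == "__main__":
    print(names_to_mrk(["Zumwalt, Elmo R. Jr", "Zimmerman, Linda", "Zimmerman, Linda"]))
```

Reading the names from the CSV file with Python's csv module and writing the result to a file would be the obvious next step; both are left out here to keep the sketch short.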

The authority records are successfully imported into Koha (I can do an authority search and find them), but each search result reports that 0 biblios are linked with "this" authority.

The linking script appears to do the opposite of what we want: it actually removes authority links that are already present.  Before running the linking script I had added one authority link by editing a biblio in Koha, which adds the link automatically (see my original post on this thread).  After the run that existing link is gone (reporting "0 biblios"), and no new links are added.

I am using the following LDR, and 008 fields:

LDR:
00000nz  a2200000o  4500

008:
000000|||a||||aa|||||||||||||||||d||||||

(I'm sure there are bibs to be linked with these authorities.)
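
Since positions 14 and 15 of the authority 008 were mentioned earlier in the thread as needing to be 'a', it may be worth checking where those characters end up in the imported records. The sketch below is an illustration only (the helper name is made up). It compares the 008 template from this post with the 008 line from the .mrk file; the .mrk value is the template with the date placed in front of it, so everything after the date sits six positions further right. Whether that is the reason the linking fails is a guess, not something confirmed in the thread.

```python
def heading_use(field_008):
    """Return the characters at 008/14 and 008/15 (positions count from 0)."""
    if len(field_008) < 16:
        raise ValueError("008 is too short to contain positions 14 and 15")
    return field_008[14], field_008[15]


# Values copied from this post: the 008 template, and the 008 line of the .mrk file.
TEMPLATE_008 = "000000|||a||||aa|||||||||||||||||d||||||"
MRK_008 = "110225000000|||a||||aa|||||||||||||||||d||||||"

if __name__ == "__main__":
    for label, value in (("template", TEMPLATE_008), (".mrk file", MRK_008)):
        pos14, pos15 = heading_use(value)
        print("{0}: length={1} 008/14={2!r} 008/15={3!r}".format(
            label, len(value), pos14, pos15))
```

If the two values differ at 14 and 15, one thing to try would be writing the date over the first six characters of the template instead of prepending it.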

Any ideas?  Do you use MarcEdit?  Would you be able to provide a sample MARC record for one of your authorities please?

Thanks again,
Pete.

Re: bulk BiblioAddsAuthorities?

Stefano Bargioni
I have similar problems (Koha 3.2.3 on Debian 6).
In my opinion, Pete, your auth records are good, since they are imported and indexed correctly.
The fault seems to be in link_bibs_to_authorities.pl. Debugging it, I think that C4/Heading.pm adds too many query limiters when trying to find an authority entry starting from values stored in biblio fields. For instance, in the case of authors, the limiter
"AND Heading-use-main-or-added-entry=a"
introduces the error.
So I commented out lines 142-152 of C4/Heading.pm and every author was correctly linked to its own authority.
So far I don't know if this is a known bug or if Koha 3.2.5 solves it.
HTH.
Stefano

On Feb 28, 2011, at 16:52 , Peter Huerter wrote:

> Thanks Elaine.  I'm still having trouble getting my auth records to link with
> my biblio records.

Re: bulk BiblioAddsAuthorities?

Peter Huerter
Thanks Stefano.

Unfortunately it's not working for me - yet :)

In your post you describe commenting out lines 142-152 of C4/Heading.pm.  In my Heading.pm file this would also comment out the return statement of _query_limiters.  Is your _query_limiters effectively reduced to the following?

sub _query_limiters {
    my $self = shift;
    my $limiters = " AND at='$self->{'auth_type'}'";
    return $limiters;
}


When I modify _query_limiters as above and rerun link_bibs_to_authorities.pl the linking still does not occur.  

Are you able to show me some raw MARC showing an authority linked with a biblio?  I understand basically that _9 subfields are added, but what I am wondering is whether my MARC format is somehow missing some important info (or at least the differences between our records may provide some clues).

E.g. I am trying to link the following biblio with the following authority using link_bibs_to_authorities.pl:

LDR 00572nam a2200193Ia 4500
003     OPIACS
005     20110225150523.0
008     110224t20001998xx            000 0 und d
020    _a0964513331
040    _cOPIACS
100 1  _aZimmerman, Linda
245 10 _aGhosts of Rockland County
       _cZimmerman, Linda
250    _a2000
260    _bSpirited Books
       _c1998
       _g2000
300    _3pamphlet
650  7 _ashanties
       _2OPIACS
942    _2ddc
       _cAU
952    _40
       _xpamphlet, fine
       _esw
       _00
       _912234
       _bOPIACS
       _10
       _d2011-02-24
       _zindian rock,
       _8shanties
       _71
       _cshanties
       _g1.00
       _yBK
       _aOPIACS
999    _c12223
       _d12223

 
000 - LEADER
  @ 00229nz##a2200097o##4500
003 - CONTROL NUMBER IDENTIFIER
  @ OPIACS
005 - DATE AND TIME OF LATEST TRANSACTION
  @ 20110301130115.0
008 - FIXED-LENGTH DATA ELEMENTS
  @ 110301000000|ge|dz||aaan|||||||||||||||c|||||d
040 ## - CATALOGING SOURCE
  a Original cataloging OPIACS
  b Language of catalogi eng
  c Transcribing agency OPIACS
100 ## - HEADING--PERSONAL NAME
  a Personal name Zimmerman, Linda

Both of these records are in my database and are found in Koha.

A more general question.  You mention "debugging Koha".  How do you go about debugging Koha?  I've tried putting in debugging print statements that write to koha-error_log, but I run into what look like buffering issues pretty quickly and the output does not appear reliably.

Thanks,
Pete.

Volunteer with the ACS http://aandc.org/
The guy the "tired old sys admin" leans on :)

Re: bulk BiblioAddsAuthorities?

Stefano Bargioni
On Mar 1, 2011, at 19:54 , Peter Huerter wrote:

> Thanks Stefano.
>
> Unfortunately it's not working for me - yet :)
>
> In your post you describe commenting out lines 142-152 of C4/Heading.pm.  In
> my Heading.pm file this would also comment out the return statement of
> _query_limiters.  Is your _query_limiters effectively reduced to the
> following?
>
> sub _query_limiters {
>    my $self = shift;
>    my $limiters = " AND at='$self->{'auth_type'}'";
>    return $limiters;
> }
>

Yes.

> 100 1  _aZimmerman, Linda

link_bibs should add $9 to this tag, copying the 001 tag of the auth rec

> A more general question.  You mention "debugging Koha".  How do you go about
> debugging Koha?

Well, I'm able to debug a perl script at a time. In this case I used
perl -d link_bibs_to_authorities.pl
and I followed its work step by step down to C4::Heading::_query_limiters where the query string is built.

Sorry: I don't understand why your attempt with the patched _query_limiters doesn't work...
Of course I assume you can find the auth rec searching for Perso_name Zimmerman, Linda, and that the result is 1.
HTH anyway. Stefano

Re: bulk BiblioAddsAuthorities?

Peter Huerter
Grazie Stefano.  

[Off list you wrote: "Peter, please send me the marcxml field of your auth_header table relative to Zimmerman, Linda.  It can be more useful than the Koha display you attached at the bottom.  Thanks. Stefano"]

Here it is.  I think Koha adds the 003, and 942 fields.  I add the rest in my mapping.

|      8 | PERSO_NAME   | 2011-03-01  | NULL         | NULL       | NULL      | 00229nz##a2200097o##4500003000700000005001700007008004700024040002400071100002100095942001500116OPIACS20110301130115.0110301000000|ge|dz||aaan|||||||||||||||c|||||d  aOPIACSbengcOPIACS  aZimmerman, Linda  aPERSO_NAME    |   NULL |
<?xml version="1.0" encoding="UTF-8"?>
<record
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
    xmlns="http://www.loc.gov/MARC21/slim">

  <leader>00229nz##a2200097o##4500</leader>
  <controlfield tag="003">OPIACS</controlfield>
  <controlfield tag="005">20110301130115.0</controlfield>
  <controlfield tag="008">110301000000|ge|dz||aaan|||||||||||||||c|||||d</controlfield>
  <datafield tag="040" ind1=" " ind2=" ">
    <subfield code="a">OPIACS</subfield>
    <subfield code="b">eng</subfield>
    <subfield code="c">OPIACS</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="a">Zimmerman, Linda</subfield>
  </datafield>
  <datafield tag="942" ind1=" " ind2=" ">
    <subfield code="a">PERSO_NAME</subfield>
  </datafield>
</record>

Thanks.  I'll keep trying.

Cheers,
Pete.

Re: bulk BiblioAddsAuthorities?

Peter Huerter
In reply to this post by Peter Huerter
The XML in the last post is being automatically processed away, so here it is in a text file attachment:

mymarc.txt

Cheers,
Pete.

Re: bulk BiblioAddsAuthorities?

Stefano Bargioni
The main difference between your marcxml auth record and my records is the absence of the 001 controlfield.
In my opinion, this makes it impossible to link your auth record with biblio ones.
Here I compare one of my records with yours (I used a trick to keep the XML tags from being stripped out: ≤...≥ delimiters).
My record was constructed by a tool I prepared and then imported with bulkmarcimport.pl in Koha 3.2.3.
To explain the absence of the 001, maybe we need to know which Koha version you are working on.
Regards. Stefano

≤?xml version="1.0" encoding="UTF-8"?≥
≤record
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
   xmlns="http://www.loc.gov/MARC21/slim"≥

 ≤leader≥00243nz  a2200121n  4500≤/leader≥
 ≤controlfield tag="001"≥65536≤/controlfield≥
 ≤controlfield tag="003"≥USC≤/controlfield≥
 ≤controlfield tag="005"≥20101109120000.0≤/controlfield≥
 ≤controlfield tag="008"≥101109|| aca||aabn           | a|a     d≤/controlfield≥
 ≤datafield tag="040" ind1=" " ind2=" "≥
   ≤subfield code="a"≥USC≤/subfield≥
 ≤/datafield≥
 ≤datafield tag="100" ind1="1" ind2=" "≥
   ≤subfield code="a"≥Sacchetta, Sergio.≤/subfield≥
 ≤/datafield≥
 ≤datafield tag="942" ind1=" " ind2=" "≥
   ≤subfield code="a"≥PERSO_NAME≤/subfield≥
 ≤/datafield≥
 ≤datafield tag="999" ind1=" " ind2=" "≥
   ≤subfield code="c"≥≤/subfield≥
   ≤subfield code="d"≥≤/subfield≥
 ≤/datafield≥
≤/record≥

≤record
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
   xmlns="http://www.loc.gov/MARC21/slim"≥

 ≤leader≥00229nz##a2200097o##4500≤/leader≥
 ≤controlfield tag="003"≥OPIACS≤/controlfield≥
 ≤controlfield tag="005"≥20110301130115.0≤/controlfield≥
 ≤controlfield tag="008"≥110301000000|ge|dz||aaan|||||||||||||||c|||||d≤/controlfield≥
 ≤datafield tag="040" ind1=" " ind2=" "≥
   ≤subfield code="a"≥OPIACS≤/subfield≥
   ≤subfield code="b"≥eng≤/subfield≥
   ≤subfield code="c"≥OPIACS≤/subfield≥
 ≤/datafield≥
 ≤datafield tag="100" ind1=" " ind2=" "≥
   ≤subfield code="a"≥Zimmerman, Linda≤/subfield≥
 ≤/datafield≥
 ≤datafield tag="942" ind1=" " ind2=" "≥
   ≤subfield code="a"≥PERSO_NAME≤/subfield≥
 ≤/datafield≥
≤/record≥
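
For anyone wanting to run the same comparison over many records, here is a minimal Python sketch (standard library only, helper name invented for illustration) that reports whether a MARCXML record carries a non-empty 001. It assumes the record uses the MARC21 slim namespace shown above and real angle brackets rather than the ≤...≥ stand-ins. The sample is a shortened copy of the second record.

```python
import xml.etree.ElementTree as ET

MARC_NS = "{http://www.loc.gov/MARC21/slim}"


def control_number(marcxml):
    """Return the text of controlfield 001, or None if it is absent or empty."""
    record = ET.fromstring(marcxml)
    for field in record.iter(MARC_NS + "controlfield"):
        if field.get("tag") == "001":
            text = (field.text or "").strip()
            return text or None
    return None


# Shortened copy of the authority record without 001 quoted in this thread.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00229nz##a2200097o##4500</leader>
  <controlfield tag="003">OPIACS</controlfield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="a">Zimmerman, Linda</subfield>
  </datafield>
</record>"""

if __name__ == "__main__":
    print(control_number(SAMPLE))
```

Fed the marcxml column of each auth_header row, a check like this would show quickly whether the missing 001 affects every imported authority or only some of them.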
On Mar 3, 2011, at 16:17 , Peter Huerter wrote:

> The XML in the last post is being automatically processed away, so here it is
> in a text file attachment:
>
> http://koha.1045719.n5.nabble.com/file/n3408083/mymarc.txt mymarc.txt

Re: bulk BiblioAddsAuthorities?

Marilen Corciovei
In reply to this post by Peter Huerter
Hello Pete,

I don't know if you solved your problem yet but I had a similar problem
this weekend which I solved somehow. This is my solution. I wanted to
import both authorities and biblios and then link them. Note that I was
importing from a system which already had the link in the form of an id
stored in 200@6. First I imported authorities. I wrote a script which
parsed the MARC for the authorities and stored the old id in a field
(900@6) before importing. Then, after the import, I wrote another script
which parsed the MARC for biblios and connected to the db to get the new id
with something like:
"select authid newid, marcxml from auth_header where
extractvalue(marcxml,'//datafield[\@tag=900]/subfield[\@code=6]') = '$aid'"
(actually I created a temp table to store the association for
speed). I stored the new authid from Koha in the 9 subfield of the auth
linked field. Upon import of biblios the link was present and working.
If you don't have the id you can probably still perform a per name search.
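
The relinking step described above can be sketched roughly as follows. This is an illustration in Python under stated assumptions, not the script that was actually used: the function names are invented, subfields are modelled as plain (code, value) pairs, the legacy id is assumed to sit in subfield 6 of the biblio heading, and the (authid, legacy id) pairs are assumed to come from a query like the one quoted above.

```python
def build_id_map(authorities):
    """Map each legacy id (kept in 900$6 on import) to the authid Koha assigned.

    `authorities` is an iterable of (authid, legacy_id) pairs.
    """
    return {legacy_id: authid for authid, legacy_id in authorities}


def link_field(subfields, id_map, legacy_code="6", link_code="9"):
    """Return a copy of a heading's subfields with $9 set to the new authid.

    `subfields` is a list of (code, value) pairs. The list is returned
    unchanged when it has no legacy id or the id is not in `id_map`.
    """
    legacy = next((value for code, value in subfields if code == legacy_code), None)
    authid = id_map.get(legacy)
    if authid is None:
        return list(subfields)
    kept = [(code, value) for code, value in subfields if code != link_code]
    return kept + [(link_code, str(authid))]


if __name__ == "__main__":
    # Hypothetical ids, for illustration only.
    id_map = build_id_map([(8, "A17")])
    print(link_field([("a", "Zimmerman, Linda"), ("6", "A17")], id_map))
```

Building the whole map once and looking ids up in memory plays the same role as the temp table mentioned above: it avoids one XPath query per biblio heading.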

Regards,
Len
www.len.ro


Re: bulk BiblioAddsAuthorities?

Stefano Bargioni
In reply to this post by Stefano Bargioni
Very strange that 001 in authority records is not autogenerated.
Is this correct in Koha? Or is this a bug in Koha 3.2.5? In my Koha 3.2.3, bulkmarcimport -a generates 001.

Pete, maybe you can try a test: manually add some 001s by editing the XML file, import, rebuild_zebra -a, link your bibs, rebuild_zebra -b, and search for authors related to those 001s.
If this works, the problem is how to generate 001s.
Please, note that each auth record I prepared for import contains an empty 001
≤controlfield tag="001"≥≤/controlfield≥
Perhaps it triggers the generation of the value...
Stefano
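
The first step of that test (adding 001s to the XML file) could be done with a small script instead of by hand. The following Python sketch is only an illustration: the function names and the sequential numbering are assumptions, and whether Koha keeps or overwrites these values on import is exactly what the test is meant to find out. Records with an empty 001 get it filled in, records with no 001 get one inserted after the leader, and records that already have a value are left alone.

```python
import xml.etree.ElementTree as ET

NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("", NS)   # serialize with a default namespace, no prefixes


def _q(name):
    """Return the namespace-qualified tag name for a MARCXML element."""
    return "{%s}%s" % (NS, name)


def add_control_numbers(marcxml, start=1):
    """Give every record without a non-empty 001 a sequential control number."""
    root = ET.fromstring(marcxml)
    records = [root] if root.tag == _q("record") else root.findall(_q("record"))
    next_id = start
    for record in records:
        existing = [f for f in record.findall(_q("controlfield"))
                    if f.get("tag") == "001"]
        if existing and (existing[0].text or "").strip():
            continue                      # already numbered, leave untouched
        if existing:
            field = existing[0]           # empty 001: fill it in place
        else:
            field = ET.Element(_q("controlfield"), {"tag": "001"})
            leader = record.find(_q("leader"))
            position = list(record).index(leader) + 1 if leader is not None else 0
            record.insert(position, field)
        field.text = str(next_id)
        next_id += 1
    return ET.tostring(root, encoding="unicode")
```

A run might look like `add_control_numbers(open("authorities.xml").read(), start=1000)`, where the file name and the starting number are placeholders.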

On Mar 7, 2011, at 15:32 , pete huerter wrote:

Hi Stefano,

We are using 3.2-5. 

We have neither 001 nor 003 present in either biblio or auth records.  I am working on explicitly adding these to both biblio and auth records.  001 and 003 might be added automatically when a MARC record is edited, but not when bulkmarcimport is used.  I was really surprised to see that Koha is not automatically generating 001 and 003.

So basically I am going to add 001, and 003 info to both biblio and auth records, then try your method again.

Thanks for sticking with me :)

Pete.

On Fri, Mar 4, 2011 at 5:46 AM, Stefano Bargioni <[hidden email]> wrote:
The main difference between your marcxml auth record and my records is the absence of the 001 controlfield.
In my opinion, this make impossibile to link your auth record with biblio ones.
Here I compare one of my records with yours (I used a trick to avoid stripping out of xml tags using ≤...≥ delimiters).
My record was constructed by a tool I prepared and than imported with bulkmarcimport.pl in a Koha 323.
To explain the absence of the 001, maybe we need to know which Koha version you are working on.
Regards. Stefano

≤?xml version="1.0" encoding="UTF-8"?≥
≤record
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
  xmlns="http://www.loc.gov/MARC21/slim"≥

 ≤leader≥00243nz  a2200121n  4500≤/leader≥
 ≤controlfield tag="001"≥65536≤/controlfield≥
 ≤controlfield tag="003"≥USC≤/controlfield≥
 ≤controlfield tag="005"≥20101109120000.0≤/controlfield≥
 ≤controlfield tag="008"≥101109|| aca||aabn           | a|a     d≤/controlfield≥
 ≤datafield tag="040" ind1=" " ind2=" "≥
  ≤subfield code="a"≥USC≤/subfield≥
 ≤/datafield≥
 ≤datafield tag="100" ind1="1" ind2=" "≥
  ≤subfield code="a"≥Sacchetta, Sergio.≤/subfield≥
 ≤/datafield≥
 ≤datafield tag="942" ind1=" " ind2=" "≥
  ≤subfield code="a"≥PERSO_NAME≤/subfield≥
 ≤/datafield≥
 ≤datafield tag="999" ind1=" " ind2=" "≥
  ≤subfield code="c"≥≤/subfield≥
  ≤subfield code="d"≥≤/subfield≥
 ≤/datafield≥
≤/record≥

≤record
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
  xmlns="http://www.loc.gov/MARC21/slim"≥

 ≤leader≥00229nz##a2200097o##4500≤/leader≥
 ≤controlfield tag="003"≥OPIACS≤/controlfield≥
 ≤controlfield tag="005"≥20110301130115.0≤/controlfield≥
 ≤controlfield tag="008"≥110301000000|ge|dz||aaan|||||||||||||||c|||||d≤/controlfield≥
 ≤datafield tag="040" ind1=" " ind2=" "≥
  ≤subfield code="a"≥OPIACS≤/subfield≥
  ≤subfield code="b"≥eng≤/subfield≥
  ≤subfield code="c"≥OPIACS≤/subfield≥
 ≤/datafield≥
 ≤datafield tag="100" ind1=" " ind2=" "≥
  ≤subfield code="a"≥Zimmerman, Linda≤/subfield≥
 ≤/datafield≥
 ≤datafield tag="942" ind1=" " ind2=" "≥
  ≤subfield code="a"≥PERSO_NAME≤/subfield≥
 ≤/datafield≥
≤/record≥
On Mar 3, 2011, at 16:17 , Peter Huerter wrote:

> The XML in the last post is being automatically processed away, so here it is
> in a text file attachment:
>
> http://koha.1045719.n5.nabble.com/file/n3408083/mymarc.txt mymarc.txt
>
> Cheers,
> Pete.
>
> --
> View this message in context: http://koha.1045719.n5.nabble.com/bulk-BiblioAddsAuthorities-tp3378000p3408083.html
> Sent from the Koha - Discuss mailing list archive at Nabble.com.
> _______________________________________________
> Koha mailing list  http://koha-community.org
> [hidden email]
> http://lists.katipo.co.nz/mailman/listinfo/koha
>

_______________________________________________
Koha mailing list  http://koha-community.org
[hidden email]
http://lists.katipo.co.nz/mailman/listinfo/koha




BiblioAddsAuthorities: SimpleSearch failing (previously "bulk BiblioAddsAuthorities?")

Peter Huerter
In reply to this post by Peter Huerter
Dear Koha developers,

I would like to bump my original post (previous title "bulk BiblioAddsAuthorities").  

addbiblio.pl appears to be called when one chooses "Edit Record" from the koha admin biblio details page.  This code appears to call BiblioAddAuthorities and I *really like* the result.  Authorities are generated for authors, subjects, and secondary authors.  Even better, the string "machine generated" is used in the added authorities to mark them as being created automatically.  This is great!  I would like to do this in batch for all of our biblios since we manage our own authorities in house.  I would really like to make use of this feature since it works so well in the singleton case.  However, I have run into problems writing a perl script to do this.

I have a script working that iterates through a list of biblios calling BiblioAddAuthorities.  However, I have one problem related to bulk processing: a timing issue, or perhaps a caching issue(?).

BiblioAddAuthorities relies on SimpleSearch to search for existing authorities.  However when I am processing in batch SimpleSearch fails for *recently added authorities*.  I think SimpleSearch relies on Zebra.

I've tried adding dbh->commits and re-indexing zebra before processing each biblio, but this does not seem to solve my problem.  Once my script exits, SimpleSearch finds the recently added authorities.  It is almost like there is a lazy write, or a caching middle layer, getting in the way (but probably working very well for its intended purpose of course).

It seems that recently added authorities are stuck in some sort of cache somewhere?  At the SQL level? At the Zebra layer?

How do I make sure that Zebra and SQL are written to, so that SimpleSearch has up-to-date data to base its search on?

Any pointers would be greatly appreciated.

Pete.
(Volunteer with the ACS - http://aandc.org/)
~ Volunteer ~
Archives and Collections Society - http://www.aandc.org/
"Maritime heritage and history, preservation and conservation, research and education through the written word and the arts."

Re: BiblioAddsAuthorities: SimpleSearch failing (previously "bulk BiblioAddsAuthorities?")

Henri-Damien LAURENT
Le 07/03/2011 21:55, Peter Huerter a écrit :

> Dear Koha developers,
>
> I would like to bump my original post (previous title "bulk
> BiblioAddsAuthorities").  
>
> addbiblio.pl appears to be called when one chooses "Edit Record" from the
> koha admin biblio details page.  This code appears to call
> BiblioAddAuthorities and I *really like* the result.  Authorities are
> generated for authors, subjects, and secondary authors.  Even better the
> string "machine generated" is used in the added authorities to mark them as
> being created automatically.  This is great!  I would like to do this in
> batch for all of our biblios since we manage our own authorities in house.
> I would really like to make use of this feature since it works so well in
> the singleton case.  However I have run into problems writing a perl script
> to do this.  
>
> I have a script working that iterates through a list of biblios calling
> BiblioAddAuthorities however I have one problem related to bulk processing,
> a temporal issue, or perhaps a caching issue(?).
>
> BiblioAddAuthorities relies on SimpleSearch to search for existing
> authorities.  However when I am processing in batch SimpleSearch fails for
> *recently added authorities*.  I think SimpleSearch relies on Zebra.
>
> I've tried adding dbh->commits, and re-indexing zebra before processing each
> biblio, but this does not seem to solve my problem.  When my script exits,
> SimpleSearch finds recently added authorities.  It is almost like there is a
> lazy write, or caching middle-layer getting in the way (but probably working
> very well for it's intended purpose of course).
>
> It seems that recently added authorities are stuck in some sort of cache
> somewhere?  At the SQL level? At the Zebra layer?
>
> How do I make sure that Zebra, and SQL are written to so that SimpleSearch
> has the latest up-to-date data to base it's search on?
SimpleSearch relies on zebra.
To make sure that everything is indexed, you should run the indexing
before your bulk edit starts.
Hope that helps.
--
Henri-Damien LAURENT

Re: BiblioAddsAuthorities: SimpleSearch failing (previously "bulk BiblioAddsAuthorities?")

Peter Huerter
Just following up ...

I finally got a script to work that loops through all biblios calling BiblioAddAuthorities.

It is very slow, however, since it calls rebuild_zebra.pl every time a new authority is added (it took 10 hours to link ~12000 biblios).  But for our purposes it does what we want really well.  It extends some existing/fantastic Koha code to a batch job for a task (adding authorities) that is otherwise non-trivial (as far as I can tell).  We manage our own authorities in-house.

To make the script work there were 3 technical issues:

1) Call rebuild_zebra.pl every time a new authority is added.  This is required for SimpleSearch to be able to find a recently added authority.  I used perl fork/exec/waitpid for this.

2) Re-indexing zebra was not enough to make SimpleSearch "see" recently added authorities.  The connection to zebra had to be reset.  For this I used the set_context and restore_context interface.  It appears that an active Context keeps hold of a stale Zebra index, even after you run rebuild_zebra.  So you need to create a new Context in which to do your SimpleSearch, and then recycle that one (so you don't run out of system resources).

3) For a long-running script it appears that set_context and restore_context do not clean up all open file handles, so I had to work around a system error "no files left" or something like that.  To do that I used a shameless hack.  I wrote a .csh script to call my perl script so that it processes only 100 biblios at a time.  That way when perl exits each time, the system resources (stale file handles, etc.) are recycled.  Of course it is possible that there is a problem with my script, but the hack worked around it and I was in a hurry.
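
The control flow behind the three points above can be sketched roughly as follows. This is an illustration in Python rather than the actual Perl script: the Koha-specific pieces (search, create, reindex, reconnect) are passed in as callables, and every function name here is an assumption made for the sketch, not something taken from acsauthlink.pl.

```python
import subprocess
from typing import Callable, Iterator, List, Optional, Tuple


def find_or_create_authority(
    heading: str,
    search: Callable[[str], Optional[int]],
    create: Callable[[str], int],
    reindex: Callable[[], None],
    reconnect: Callable[[], None],
) -> int:
    """Return the id of the authority for `heading`, creating it if needed."""
    authid = search(heading)
    if authid is not None:
        return authid
    authid = create(heading)
    reindex()    # point 1: rebuild the index so the new authority is searchable
    reconnect()  # point 2: drop the stale search connection before the next lookup
    return authid


def batch_ranges(total: int, size: int) -> Iterator[Tuple[int, int]]:
    """Yield (offset, count) pairs covering `total` records in chunks of `size`."""
    for offset in range(0, total, size):
        yield offset, min(size, total - offset)


def run_in_batches(
    total: int,
    size: int,
    build_command: Callable[[int, int], List[str]],
    runner=subprocess.run,
) -> None:
    """Point 3: run each batch in its own process so that file handles
    and other resources are released when that process exits."""
    for offset, count in batch_ranges(total, size):
        runner(build_command(offset, count), check=True)
```

With 12000 biblios and a batch size of 100, `run_in_batches` would start 120 separate processes. What `build_command` returns (the command line for one batch) is left hypothetical, since the thread does not say what arguments the script takes.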

I'd be happy to share the script with anyone who is interested.  I'm not sure what the policy is on refactoring Koha code and posting it.  It only works for MARC21 format - not sure what if anything would need to be done for UNIMARC.

Thanks for all of your help,

Pete.

btw. this is one of the first perl scripts I have ever written, so it is not slick if you know what I mean :)
~ Volunteer ~
Archives and Collections Society - http://www.aandc.org/
"Maritime heritage and history, preservation and conservation, research and education through the written word and the arts."

Re: BiblioAddsAuthorities: SimpleSearch failing (previously "bulk BiblioAddsAuthorities?")

Peter Huerter
Here is the code.  

acsauthlink.pl

callacsauthlink

I hope this helps someone.  Comments welcome.

Pete.
~ Volunteer ~
Archives and Collections Society - http://www.aandc.org/
"Maritime heritage and history, preservation and conservation, research and education through the written word and the arts."