Zebra Sorting Problems

Zebra Sorting Problems

Joshua Ferraro
On Tue, Mar 21, 2006 at 12:33:03AM -0500, Sebastian Hammer wrote:
> Downloading.....
Any luck Sebastian?

Now I'm trying to index using zebraidx on the cmd line and Tumer Garip's
zebra.cfg and record.abs files but I'm running into some errors. It
starts indexing fine but then croaks after about 30 seconds or so:

# zebraidx -g iso2709 -d kohademo update records -n
<snip>
19:16:57-21/03 zebraidx(29467) [log] add grs.marcxml.record records/npl.iso2709 894119
19:16:57-21/03 zebraidx(29467) [warn] Record didn't contain match fields in (bib1,Identifier-standard)
19:16:57-21/03 zebraidx(29467) [warn] Bad match criteria
19:16:57-21/03 zebraidx(29467) [log] zebra_end_trans
19:16:57-21/03 zebraidx(29467) [log] sorting section 1
19:16:57-21/03 zebraidx(29467) [log] Iterations . . .  94296
19:16:57-21/03 zebraidx(29467) [log] Distinct words .  28974
19:16:57-21/03 zebraidx(29467) [log] Updates. . . . .     18
19:16:57-21/03 zebraidx(29467) [log] Deletions. . . .      0
19:16:57-21/03 zebraidx(29467) [log] Insertions . . .  28956
19:16:57-21/03 zebraidx(29467) [log][app2] zebra_register_close p=0x80b40b0
19:16:58-21/03 zebraidx(29467) [log] Records:    1283 i/u/d 1283/0/0
19:16:58-21/03 zebraidx(29467) [log] user/system: 1006/28
19:16:58-21/03 zebraidx(29467) [log][app2] zebra_stop
19:16:58-21/03 zebraidx(29467) [log] zebraidx times: 10.95 10.06  0.28

I'm guessing it dies because it found a record that didn't have whatever
'Identifier-standard' is mapped to or something -- any ideas? Is there
a way to have it just 'skip' a malformed record and proceed?
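One way to act on the "skip a malformed record" question, without relying on any particular zebraidx option, is to pre-filter the input file before indexing. This is a minimal, illustrative sketch. It assumes plain ISO 2709 input: records end in 0x1D, there is a 24-byte leader, and directory entries are 12 bytes (the usual "4500" entry map). It also assumes that a record with no 001 in its directory should simply be dropped. The function names are hypothetical.

```python
from __future__ import annotations

RECORD_TERMINATOR = b"\x1d"  # ISO 2709 end-of-record marker
LEADER_LEN = 24              # fixed-length leader
DIR_ENTRY_LEN = 12           # 3-byte tag + 4-byte length + 5-byte start (entry map "4500")


def split_records(data: bytes) -> list[bytes]:
    """Split a raw ISO 2709 file into records, keeping each record's terminator."""
    records = []
    for chunk in data.split(RECORD_TERMINATOR):
        chunk = chunk.lstrip(b"\r\n")  # tolerate line breaks between records
        if chunk:
            records.append(chunk + RECORD_TERMINATOR)
    return records


def directory_tags(record: bytes) -> list[str]:
    """Return the tags listed in one record's directory."""
    base_address = int(record[12:17])  # leader positions 12-16
    # The directory runs from the end of the leader up to the field
    # terminator that sits just before the base address.
    directory = record[LEADER_LEN:base_address - 1]
    return [
        directory[i:i + 3].decode("ascii")
        for i in range(0, len(directory) - DIR_ENTRY_LEN + 1, DIR_ENTRY_LEN)
    ]


def filter_records(data: bytes, required_tag: str = "001") -> tuple[bytes, int]:
    """Keep records whose directory lists `required_tag`; count the rest."""
    kept, skipped = [], 0
    for record in split_records(data):
        try:
            ok = required_tag in directory_tags(record)
        except ValueError:  # non-numeric leader or undecodable tag
            ok = False
        if ok:
            kept.append(record)
        else:
            skipped += 1
    return b"".join(kept), skipped
```

To use it, read the input (for example records/npl.iso2709) in binary mode and write the returned bytes to a new file. Then point zebraidx at that file instead. The skipped count shows how many records were dropped.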

In the meantime, my other server's about 35 hours into importing a
dataset of 150K records (it's almost half-way done) using perl-zoom.
Needless to say, that's not going to ever be an option for initial
import in the real world -- especially if the index crashes often. Even
mysql's faster :-).
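The figures above can be turned into rough throughput numbers to give a sense of scale. This is a back-of-the-envelope sketch with three assumptions, none of them guaranteed. First, "almost half-way" is taken to mean about 75,000 records. Second, the first number in "zebraidx times" is treated as elapsed seconds. Third, both rates are assumed to stay constant over 150K records. The zebraidx sample is also small, because that run stopped after 1283 records.

```python
# Figures taken from the messages above; the 75,000 is an assumption
# ("almost half-way" through 150K records).
ZEBRAIDX_RECORDS = 1283       # "Records: 1283 i/u/d 1283/0/0"
ZEBRAIDX_SECONDS = 10.95      # first figure of "zebraidx times", assumed to be elapsed seconds
ZOOM_RECORDS = 75_000         # assumed
ZOOM_SECONDS = 35 * 3600      # "about 35 hours"
TOTAL_RECORDS = 150_000


def rate(records: int, seconds: float) -> float:
    """Records processed per second."""
    return records / seconds


def hours_for(total: int, per_second: float) -> float:
    """Hours needed for `total` records at a constant rate."""
    return total / per_second / 3600


zebraidx_rate = rate(ZEBRAIDX_RECORDS, ZEBRAIDX_SECONDS)  # roughly 117 records/s
zoom_rate = rate(ZOOM_RECORDS, ZOOM_SECONDS)              # roughly 0.6 records/s

print(f"zebraidx: {zebraidx_rate:.0f} rec/s, "
      f"{hours_for(TOTAL_RECORDS, zebraidx_rate) * 60:.0f} min for 150K")
print(f"perl-zoom: {zoom_rate:.2f} rec/s, "
      f"{hours_for(TOTAL_RECORDS, zoom_rate):.0f} h for 150K")
```

Under those assumptions, zebraidx comes out at about 117 records/s, or roughly 21 minutes for 150K. perl-zoom comes out at about 0.60 records/s, or roughly 70 hours. That is a gap of around two hundredfold.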

I'm also getting some errors occasionally on that import:

no mapping found at position 11 in Mazur, Mont, g0=ASCII_DEFAULT g1=EXTENDED_LATIN at /usr/local/share/perl/5.8.4/MARC/Charset.pm line 134.
no mapping found at position 53 in Ashton Kutcher, Brittany Murphy, Christian Kane, Mont Mazur, Raymond J. Barry, George Gaynes. g0=ASCII_DEFAULT g1=EXTENDED_LATIN at /usr/local/share/perl/5.8.4/MARC/Charset.pm line 134.
no mapping found at position 11 in Mazur, Mont, g0=ASCII_DEFAULT g1=EXTENDED_LATIN at /usr/local/share/perl/5.8.4/MARC/Charset.pm line 134.
Oops!  ES: immediate execution failed
Oops!  ES: immediate execution failed
Oops!  ES: immediate execution failed
Oops!  ES: immediate execution failed
Oops!  ES: immediate execution failed
Oops!  ES: immediate execution failed
Oops!  ES: immediate execution failed
Oops!  ES: immediate execution failed

Now the Charset error must be related to data that's not yet been defined
in MARC::Charset, but what's the ES: error about?

Cheers,

--
Joshua Ferraro               VENDOR SERVICES FOR OPEN-SOURCE SOFTWARE
President, Technology       migration, training, maintenance, support
LibLime                                Featuring Koha Open-Source ILS
[hidden email] |Full Demos at http://liblime.com/koha |1(888)KohaILS


_______________________________________________
Koha-zebra mailing list
[hidden email]
http://lists.nongnu.org/mailman/listinfo/koha-zebra

Re: Zebra Sorting Problems

Sebastian Hammer
Hi Joshua,

I've never seen an update run just terminate because of a missing match
key, but then I probably never have provided a record without one. It
does seem a bit draconian to just terminate.

If you are just aiming to demonstrate search/sort, just remove the
recordId line in zebra.cfg and start over!
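The change suggested above amounts to commenting out the recordId setting. Here is a small sketch for doing that in code rather than by hand. It assumes zebra.cfg uses "key: value" lines with "#" comments. It also assumes the setting might be written either as plain recordId or with a group prefix such as iso2709.recordId, using the group name from the -g iso2709 flag above. Whether Tumer's file uses either form is an assumption, and the function name is hypothetical.

```python
def disable_record_id(cfg_text: str) -> str:
    """Comment out recordId settings (plain or group-prefixed) in zebra.cfg text."""
    out = []
    for line in cfg_text.splitlines(keepends=True):
        stripped = line.strip()
        key = stripped.split(":", 1)[0].strip()  # text before the first colon
        is_record_id = not stripped.startswith("#") and (
            key == "recordId" or key.endswith(".recordId")
        )
        # Prefix matching lines with "#" and leave everything else untouched.
        out.append("# " + line if is_record_id else line)
    return "".join(out)
```

Keep a copy of the original file. The match key is presumably what lets later updates find and replace existing records, so dropping it seems best suited to a one-off demo load.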

But it'd probably be fun to know why you have a record without an ID
number in there.

--Seb

--
Sebastian Hammer, Index Data
[hidden email]   www.indexdata.com
Ph: (603) 209-6853




Re: Zebra Sorting Problems

Sebastian Hammer
Joshua Ferraro wrote:

>On Tue, Mar 21, 2006 at 12:33:03AM -0500, Sebastian Hammer wrote:
>>Downloading.....
>
>Any luck Sebastian?
>
>Now I'm trying to index using zebraidx on the cmd line and Tumer Garip's
>zebra.cfg and record.abs files but I'm running into some errors. It
>starts indexing fine but then croaks after about 30 seconds or so:

If it's any help, the record right after the record with 001 70831 in
your input file has no 001 field... in fact, it seems to consist of only
a single field. Here's the yaz-marcdump -v of the record:

> <!--
> Record length            68
> Indicator length          2
> Identifier length         2
> Base address             37
> Length data entry         4
> Length starting           5
> Length implementation     0
> -->
> <!-- Directory offset 24: Tag 952 -->
> <!-- Directory offset 24: data-length 30, data-offset 0 -->
> <!-- identifier_flag = 1 -->
> 952    $b NPL $p 31000000018769 $u 2014
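The yaz-marcdump -v fields above map directly onto the ISO 2709 leader and directory. As an illustration only, and not a replacement for yaz-marcdump, the sketch below decodes those same fields. It also reports, for each record lacking a 001, the 001 of the record before it, which mirrors how the record above is identified relative to 001 70831. It assumes plain ISO 2709 input already split into individual records, and the names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

LEADER_LEN = 24


@dataclass
class DirEntry:
    tag: str
    data_length: int
    data_offset: int


def describe(record: bytes) -> dict:
    """Decode the leader fields and directory that yaz-marcdump -v prints."""
    leader = record[:LEADER_LEN].decode("ascii")
    info = {
        "record_length": int(leader[0:5]),
        "indicator_length": int(leader[10]),
        "identifier_length": int(leader[11]),
        "base_address": int(leader[12:17]),
        "length_data_entry": int(leader[20]),
        "length_starting": int(leader[21]),
        "length_implementation": int(leader[22]),
    }
    len_w, start_w = info["length_data_entry"], info["length_starting"]
    # Directory entry = 3-byte tag + length field + start field + implementation part.
    entry_len = 3 + len_w + start_w + info["length_implementation"]
    directory = record[LEADER_LEN:info["base_address"] - 1]  # drop field terminator
    entries = []
    for i in range(0, len(directory) - entry_len + 1, entry_len):
        chunk = directory[i:i + entry_len].decode("ascii")
        entries.append(DirEntry(
            tag=chunk[:3],
            data_length=int(chunk[3:3 + len_w]),
            data_offset=int(chunk[3 + len_w:3 + len_w + start_w]),
        ))
    info["directory"] = entries
    return info


def control_001(record: bytes) -> str | None:
    """Return the 001 value, or None if the directory has no 001 entry."""
    info = describe(record)
    for entry in info["directory"]:
        if entry.tag == "001":
            start = info["base_address"] + entry.data_offset
            return record[start:start + entry.data_length].rstrip(b"\x1e").decode("ascii")
    return None


def records_missing_001(records: list[bytes]) -> list[tuple[int, str | None]]:
    """List (index, previous record's 001) for every record without a 001."""
    previous, missing = None, []
    for index, record in enumerate(records):
        cid = control_001(record)
        if cid is None:
            missing.append((index, previous))
        else:
            previous = cid
    return missing
```

Given the 68-byte record shown above, describe() should report the same values yaz-marcdump printed: base address 37, entry map 4/5/0, and a single 952 entry with data length 30 at offset 0.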


--Sebastian
