Batch cleanup to the catalog

Batch cleanup to the catalog

Cab Vinton
Hi, All --

Koha's Batch Record Modification (BRM) tool makes it very easy to make
large-scale changes to the catalog. And we know that many catalogers
like to have their records just so :-)

Is there a cost in overhead, however, to making such changes?

For example, if a catalog of 150,000 records contains 75,000 unwanted
MARC fields, would you delete them, even if they're not displayed in
the OPAC or otherwise interfering w/ functionality?

In particular, I'm wondering whether using the BRM tool involves the
creation of new data so that there's no real net gain w/ respect to
the goal of having a "cleaner" database. (In our case, we've turned
off the CataloguingLog system preference.)

Along the same lines, are there any advantages to doing large-scale
batch processing via the backend instead, i.e., as opposed to using the
built-in staff tools such as BRM? (I'm assuming there's no issue w/
using such tools to work w/ much smaller subsets of records.)

Thanks in advance for any guidance.

All best,

Cab Vinton, Director
Plaistow Public Library
Plaistow, NH

Re: Batch cleanup to the catalog

Michael Hafen
Since you have the CataloguingLog turned off, you should see a net reduction in the size of the database, both in Koha and in Zebra/Solr. Though you probably won't notice the difference on disk in Koha, since the tables are InnoDB, which doesn't shrink its files in most cases.
In the frontend I wouldn't expect you to notice much difference. Since you indicate the fields are hidden in the OPAC, the biggest difference will be in the work of your cataloguing staff. I don't know if you have discussed the idea with them; sometimes cataloguers prefer to have more information available (even if unused), sometimes they prefer less. More information could mean better record matching in searches and record merges. Less information could mean more efficient cataloguing, since the staff doesn't need to keep as much information at hand while working. I tend to prefer keeping the information and hiding it in the staff interface if it bothers someone, but that's my preference.
As far as doing such changes in the backend goes, there is one big drawback: the metadata / MARC fields need to be rebuilt and the search engine needs to be reindexed after any changes made to the database by hand. That is the reason I tend to do any big batch modifications by setting up a script that uses the Koha modules (the C4 / Koha API). That way Koha itself will take care of that for me.
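Roughly, that kind of script looks like this. It's only an untested sketch; the tags below are placeholders, and GetMarcBiblio's calling convention differs between Koha versions, so check it against your release before running anything:

#!/usr/bin/perl
# Rough sketch: delete unwanted tags from every biblio via the Koha API
# so biblio_metadata and the search index stay in sync.
use Modern::Perl;
use C4::Biblio qw( GetMarcBiblio ModBiblio );
use Koha::Biblios;

my @unwanted_tags = qw( 938 994 );    # placeholder tags -- use your own list

my $biblios = Koha::Biblios->search;
while ( my $biblio = $biblios->next ) {
    # Recent Koha takes a hashref here; older releases take the
    # biblionumber directly, so adjust for your version.
    my $record = GetMarcBiblio( { biblionumber => $biblio->biblionumber } );
    next unless $record;

    # Collect every occurrence of the unwanted tags in this record.
    my @fields = map { $record->field($_) } @unwanted_tags;
    next unless @fields;

    $record->delete_fields(@fields);

    # ModBiblio rewrites the stored MARC and queues the record for
    # reindexing, so no manual rebuild is needed.
    ModBiblio( $record, $biblio->biblionumber, $biblio->frameworkcode );
}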
One final note: the difference in size from removing those fields is likely to be small, on the order of a few hundred megabytes at most would be my guess.



--
Michael Hafen
Washington County School District Technology Department
Systems Analyst



Re: Batch cleanup to the catalog

Cab Vinton
Many thanks, Michael.

Any changes would definitely be a cataloger-driven process.

FWIW, we're looking mainly at extraneous OCLC-related fields in the
9XXs. But there are also two to three dozen additional fields that are
obsolete or otherwise not part of the standard MARC 21 format.

I'd be surprised if many of these fields were indexed, to be honest,
but I haven't taken a close look to confirm. (The catalog currently uses
244 distinct fields, many of them only a handful of times.)
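
For anyone curious, a rough, untested sketch along these lines (using the Koha modules; again, GetMarcBiblio's signature varies by release) is the sort of thing that can produce that tally of which tags appear and how often:

#!/usr/bin/perl
# Rough sketch: walk every biblio record and count how often each MARC
# tag occurs, to spot cleanup candidates.
use Modern::Perl;
use C4::Biblio qw( GetMarcBiblio );
use Koha::Biblios;

my %tag_count;
my $biblios = Koha::Biblios->search;
while ( my $biblio = $biblios->next ) {
    my $record = GetMarcBiblio( { biblionumber => $biblio->biblionumber } );
    next unless $record;
    $tag_count{ $_->tag }++ for $record->fields;
}

# Least-used tags first -- the likeliest cleanup candidates.
for my $tag ( sort { $tag_count{$a} <=> $tag_count{$b} } keys %tag_count ) {
    printf "%s\t%d\n", $tag, $tag_count{$tag};
}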

This is a medium-term project, so I expect it will take the catalogers
a while to get through the planning phase, and also that the cleanup
itself will be done in stages. The ultimate goals are to get rid of
unnecessary fields, and to make sure they're not imported into the
catalog in the first place.
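
For the second goal, one possibility would be to filter vendor files before they're staged. Here's a rough, untested sketch with placeholder tags; applying a MARC modification template at staging time might be another route:

#!/usr/bin/perl
# Rough sketch: strip unwanted tags from a vendor MARC file before
# staging it for import.
use Modern::Perl;
use MARC::Batch;

my @unwanted_tags = qw( 938 994 );    # placeholder tags

my $batch = MARC::Batch->new( 'USMARC', 'vendor.mrc' );
open my $out, '>:raw', 'vendor-clean.mrc' or die $!;

while ( my $record = $batch->next ) {
    my @fields = map { $record->field($_) } @unwanted_tags;
    $record->delete_fields(@fields) if @fields;
    print {$out} $record->as_usmarc;
}
close $out;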

Thanks again,

Cab Vinton
Plaistow Public Library


_______________________________________________
Koha-devel mailing list
[hidden email]
http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/