Task schedulers and message queues for Koha


David Cook

Hi all,

 

In 2016, I worked on a Koha task scheduler for downloading and importing records via OAI-PMH. I have code which works, but it’s lacking test coverage and I’m unsure that it will make it through QA and be accepted.

 

I recall Chris Cormack suggesting I look at Gearman (http://gearman.org/) instead, and it looks pretty good at a glance, although Andreas and I had talked about having more control over the workers than Gearman seems to offer. Plus, it would add another dependency to Koha, where people already struggle with dependencies.

 

Martin suggested that there were a lot of other implementations out there, but I haven’t really found much else that provides everything we want out of the box. The one I have found is Celery (http://www.celeryproject.org/), which I think looks like exactly what I want, but it requires Python for its server and workers, and it requires a message broker like RabbitMQ or Redis. Lots of extra dependencies.

 

Since my goal is to get our code into Koha, I really want to know what will work for people. Are people happy with a home-grown solution? It’s not really that complicated.

 

My current version is essentially a task scheduler which forks a worker on demand when it’s time to run a task (up to a configurable maximum of X tasks, so you don’t kill your server). It lets you submit tasks and tell the scheduler to start/schedule them, and you can even tell in-progress tasks to stop: the scheduler tells the worker to stop, and the worker decides where in its task it checks for stop commands from the scheduler.
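
As a rough illustration of the design described above (a sketch only, not the actual harvester code), here is the same idea in Python rather than Perl, purely for brevity. The names `Scheduler`, `submit`, `tick`, `stop`, and `harvest` are hypothetical, and threads stand in for forked worker processes.

```python
import threading
import time


class Scheduler:
    """Toy scheduler: starts due tasks on demand, up to max_workers at once."""

    def __init__(self, max_workers=2):
        self.max_workers = max_workers
        self.pending = []   # (run_at, name, func) tuples not yet started
        self.running = {}   # name -> (thread, stop_event)

    def submit(self, name, func, run_at):
        self.pending.append((run_at, name, func))

    def tick(self, now):
        """Start every due task that fits under the worker cap."""
        # Forget workers that have already finished.
        self.running = {
            name: (thread, event)
            for name, (thread, event) in self.running.items()
            if thread.is_alive()
        }
        still_pending = []
        for run_at, name, func in sorted(self.pending, key=lambda p: p[0]):
            if run_at <= now and len(self.running) < self.max_workers:
                stop_event = threading.Event()
                thread = threading.Thread(target=func, args=(stop_event,))
                thread.start()
                self.running[name] = (thread, stop_event)
            else:
                still_pending.append((run_at, name, func))
        self.pending = still_pending

    def stop(self, name):
        """Ask a running task to stop; the worker decides when it checks."""
        thread, stop_event = self.running[name]
        stop_event.set()
        thread.join()


def harvest(stop_event):
    # The worker polls for a stop request between units of work.
    while not stop_event.is_set():
        time.sleep(0.01)  # stand-in for one OAI-PMH request
```

With `max_workers=2` and three due tasks, one call to `tick` starts two of them and leaves the third pending until a slot frees up.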

 

As I try to add test coverage and make this scheduler more palatable, I find myself thinking about the code more in terms of Koha::Scheduler and Koha::Queue, and using the more scalable worker model used by others. The scheduler daemon would listen on a socket for tasks and create a Koha::Scheduler instance, which would enqueue each task once its scheduled time was met or exceeded. Depending on the architecture, you could have a separate daemon, or the same daemon, with a Koha::Queue instance. It would accept tasks/messages from the scheduler and dequeue them to available workers, which are separate processes that have previously registered against particular queues. In this way, you can have an oaipmh-download queue, an oaipmh-import queue, an email-report queue, etc.
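
To make the split between the two pieces concrete, here is a minimal Python sketch of the scheduler/queue idea. It is an assumption-laden illustration, not Koha code: the class and method names are hypothetical stand-ins for Koha::Scheduler and Koha::Queue, and workers are plain callables run synchronously instead of separate processes.

```python
import heapq
from collections import defaultdict, deque


class Queue:
    """Named queues; workers register against a queue and receive its tasks."""

    def __init__(self):
        self.tasks = defaultdict(deque)    # queue name -> waiting tasks
        self.workers = defaultdict(deque)  # queue name -> idle workers

    def register(self, queue_name, worker):
        self.workers[queue_name].append(worker)
        self._dispatch(queue_name)

    def enqueue(self, queue_name, task):
        self.tasks[queue_name].append(task)
        self._dispatch(queue_name)

    def _dispatch(self, queue_name):
        # Hand waiting tasks to idle workers, oldest task first.
        tasks, workers = self.tasks[queue_name], self.workers[queue_name]
        while tasks and workers:
            worker = workers.popleft()
            worker(tasks.popleft())
            workers.append(worker)  # synchronous here, so it is idle again


class Scheduler:
    """Holds tasks until their time is met or exceeded, then enqueues them."""

    def __init__(self, queue):
        self.queue = queue
        self.heap = []     # (run_at, sequence, queue_name, task)
        self.sequence = 0  # tie-breaker so tasks are never compared directly

    def schedule(self, run_at, queue_name, task):
        heapq.heappush(self.heap, (run_at, self.sequence, queue_name, task))
        self.sequence += 1

    def tick(self, now):
        while self.heap and self.heap[0][0] <= now:
            _, _, queue_name, task = heapq.heappop(self.heap)
            self.queue.enqueue(queue_name, task)
```

A task scheduled for a queue with no registered worker simply waits in that queue until a worker registers, which is the behaviour you would want if, say, the email-report worker is started later than the scheduler.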

 

I suspect that we could make use of Koha::Scheduler and Koha::Queue throughout much of Koha for doing background tasks. If we don’t want to reinvent the wheel with Koha::Queue, we could use something like RabbitMQ. But I think we need *something*.

 

I’m open to ideas. I already have the OAI-PMH download and OAI-PMH import handled. That’s the easy part. Any worker can do that. The hard part is figuring out how the Koha Community will take up a task scheduler.

 

David Cook

Systems Librarian

Prosentient Systems

72/330 Wattle St

Ultimo, NSW 2007

Australia

 

Office: 02 9212 0899

Direct: 02 8005 0595

 


_______________________________________________
Koha-devel mailing list
[hidden email]
http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/

Re: Task schedulers and message queues for Koha

Tajoli Zeno
Hi David,

On 20/02/2017 02:01, David Cook wrote:
> In 2016, I worked on a Koha task scheduler for downloading and importing
> records via OAI-PMH. I have code which works, but it’s lacking test
> coverage and I’m unsure that it will make it through QA and be accepted.

It is not 100% clear to me why you use a task scheduler.
Why not manage the queues with cron alone?
At least we need a web interface to set up cron.

Bye
Zeno Tajoli


--
Zeno Tajoli
/SVILUPPO PRODOTTI CINECA/ - Automazione Biblioteche
Email: [hidden email] Fax: 051/6132198
*CINECA* Consorzio Interuniversitario - Sede operativa di Segrate (MI)

Re: Task schedulers and message queues for Koha

David Cook
Hi Zeno,

Thanks for your message.

Originally, I developed the OAI-PMH harvester using a cronjob, but that
wasn't acceptable for Stockholm University Library for a few reasons.

One, there was no web interface for controlling it. Two, they wanted to
execute OAI-PMH requests every 2-3 seconds and cron has 1 minute as its
finest granularity. Three, even if you set up a cronjob to run every minute,
long running tasks could get duplicated (although you could mitigate that
with locks which would be a pain). Plus, you want to run tasks in parallel,
so you're going to want to use multiple processes, which cron isn't really
set up to achieve.
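
For reference, the lock-based mitigation mentioned above usually looks something like the following. This is a generic, UNIX-only Python sketch using an advisory `flock` on a lock file; the helper names `try_lock` and `run_once` and the lock file path are hypothetical.

```python
import fcntl


def try_lock(path):
    """Return an open file holding an exclusive lock, or None if it is taken."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle  # the lock is released when the handle is closed


def run_once(path, job):
    """Run job() unless a previous run still holds the lock."""
    handle = try_lock(path)
    if handle is None:
        return False  # previous run still in progress; skip this one
    try:
        job()
        return True
    finally:
        handle.close()
```

This stops overlapping runs, but it does nothing for sub-minute intervals or for running several tasks in parallel, which is why it only covers one of the objections.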

At the end of it, I think it's more 'to purpose' to have your own daemon
where you can control time intervals, workers, have a web interface, etc.

Cron is useful for many purposes, but I don't think it's always the right
solution. Plus, creating a web interface for cron isn't necessarily the best
idea I think. I would consider it to be a hack rather than a solution. As
others have pointed out before me, it comes with significant security
issues. With a task scheduler, you're much more in control of everything.  

David Cook

> -----Original Message-----
> From: Tajoli Zeno [mailto:[hidden email]]
> Sent: Wednesday, 22 February 2017 12:28 AM
> To: David Cook <[hidden email]>; [hidden email]-
> community.org
> Subject: Re: [Koha-devel] Task schedulers and message queues for Koha

Re: Task schedulers and message queues for Koha

Tajoli Zeno
Hi David and all,

On 21/02/2017 23:29, David Cook wrote:
> [...] Two, they wanted to
> execute OAI-PMH requests every 2-3 seconds and cron has 1 minute as its
> finest granularity. Three, even if you setup a cronjob to run every minute,
> long running tasks could get duplicated (although you could mitigate that
> with locks which would be a pain). Plus, you want to run tasks in parallel,
> so you're going to want to use multiple processes, which cron isn't really
> set up to achieve.

OK, if you need those features, cron isn't enough.
But why drop the option of Celery + RabbitMQ + AnyEvent::RabbitMQ?

They have official Debian packages:
https://packages.debian.org/jessie/python-celery
https://packages.debian.org/jessie/rabbitmq-server
https://packages.debian.org/jessie/libanyevent-rabbitmq-perl

We still use one of their dependencies for similar tasks
(libanyevent-perl, "event loop framework with multiple implementations").

Python is already present on our Debian/Ubuntu systems; it is a prerequisite of
the distributions.

Rebuilding such a complex stack in Perl would, I think, be very difficult.

Bye
Zeno Tajoli




Re: Task schedulers and message queues for Koha

David Cook
Hi Zeno,

I have a number of concerns about Celery. One of those is that it would add
numerous external dependencies and complexity to Koha implementations.

Your suggestion of Celery + RabbitMQ + AnyEvent::RabbitMQ sounds ok,
although it would involve work too. While Celery clients exist for PHP and
Node.js, we'd need to create a Perl implementation of the Celery protocol
using AnyEvent::RabbitMQ (or Net::RabbitFoot). Not that I'm necessarily
opposed to that.

We'd also still need to write the tasks in Python (or use web hooks which
would have the overhead of HTTP plus you'd have to worry about your web
server being up). I'm not sure how keen the community at large is to support
more server-side languages. I like writing Python, so I don't mind porting
over my OAI-PMH code from Perl to Python. I've abandoned the HTTP::OAI
module anyway for a few reasons.

RabbitMQ is a pretty heavy duty product as well which comes with its own
requirements: https://www.rabbitmq.com/production-checklist.html. While we
currently help people with Apache, MySQL, Zebra, and Elasticsearch, we'd
also all need to become experts with RabbitMQ.

I've already put together a Perl-based scheduler using POE which forks its
own workers. And I've already put together a basic Perl-based message queue
which sends events to pre-existing workers (like Celery). Celery with
RabbitMQ is more mature and complex, but my Perl programs do the trick.

Looking at DSpace's OAI-PMH harvester, it works very much like my first
design. It's a Java scheduler which uses threads rather than child processes
to do its work.

Due to the lack of engagement overall, I think I'll probably just keep my
existing design, since it works and works quite well.

David Cook


> -----Original Message-----
> From: Tajoli Zeno [mailto:[hidden email]]
> Sent: Wednesday, 22 February 2017 7:49 PM
> To: David Cook <[hidden email]>; [hidden email]-
> community.org
> Subject: Re: [Koha-devel] Task schedulers and message queues for Koha

Re: Task schedulers and message queues for Koha

Tomas Cohen Arazi

Share it :-)


On Wed, 22 Feb 2017 at 9:57 PM, David Cook <[hidden email]> wrote:
--
Tomás Cohen Arazi
Theke Solutions (https://theke.io)
✆ +54 9351 3513384
GPG: B2F3C15F


Re: Task schedulers and message queues for Koha

David Cook

Which one, Tomas?

 

I’m planning to post the code for what I have already in early March.

 

David Cook


 

From: Tomas Cohen Arazi [mailto:[hidden email]]
Sent: Thursday, 23 February 2017 2:16 PM
To: David Cook <[hidden email]>; Tajoli Zeno <[hidden email]>; [hidden email]
Subject: Re: [Koha-devel] Task schedulers and message queues for Koha

 


Re: Task schedulers and message queues for Koha

Jonathan Druart
On Thu, 23 Feb 2017 at 00:51 David Cook <[hidden email]> wrote:

> I’m planning to post the code for what I have already in early March.


Any news here?
We really need to replace the way our background jobs are implemented so that they work under Plack.
I'd like to avoid duplication of work...

 


Re: Task schedulers and message queues for Koha

David Cook

Hi Jonathan,

 

I have scrapped the work that I was doing on a generic task scheduler. Instead, I’ve developed a daemon which only handles OAI-PMH harvesting. Feel free to go ahead with whatever you’re planning, and I’m happy to contribute ideas.

 

I was concerned that I was being overly ambitious, and that the project would never be accepted into Koha. A third-party message queue like RabbitMQ would add another dependency to Koha, which would further complicate installation and maintenance, although I think it might still be the best way forward. Another option would be to use something like ZeroMQ to build our own on top of established work. I had written my own message queue in Perl, which was fairly easy to do, so that’s always an alternative. For the task scheduler, I used POE for the event framework and used timers to schedule tasks. Initially I didn’t use a message queue, but I think using one would be better. You fire up some workers, register them with the message queue, and then they consume messages as the message queue assigns them. The task scheduler would then just be used for initially putting the messages into the queue for the workers to consume.

 

With the OAI-PMH daemon, which I’d like to post ASAP to 10662, I’m still using POE for the event framework, but I’m using POE::Component::JobQueue to handle the queue. I have a queue for downloading and a queue for importing. Each queue has X workers which run in parallel. At the moment, I’m forking the workers, since it was the easiest thing to do, but it is a little bit heavy. The overhead of forking itself is negligible, but since each forked worker gets a copy of the harvester, the resource use adds up a bit. At the moment, my design is for a single Koha system, or one with a lot of resources. The Koha web interface connects to the OAI-PMH harvester daemon using a UNIX socket. In koha-conf.xml, I have a line pointing to a configuration file, and in there is a socket address. It uses a super simple protocol, serialised as JSON with null-terminated lines, to submit/list/start/stop/delete jobs in the harvester. The harvester downloads records to the file system and adds a pointer to the database, and then the importer job queue assigns a database entry to each of its workers and imports the records into Koha.
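
To give a feel for how small such a protocol can be, here is a hedged Python sketch of null-terminated JSON framing. It is not the harvester's actual wire format: the field names `command` and `job` are assumptions made up for the example, and only the framing idea (one JSON document per message, ended by a null byte) comes from the description above.

```python
import json

TERMINATOR = b"\0"


def encode(message):
    """Serialise one message as JSON followed by a null byte."""
    # json.dumps escapes control characters, so the payload itself
    # can never contain a raw null byte.
    return json.dumps(message).encode("utf-8") + TERMINATOR


class Decoder:
    """Accumulates bytes read from a socket and returns complete messages."""

    def __init__(self):
        self.buffer = b""

    def feed(self, data):
        self.buffer += data
        # Everything before the last terminator is complete; keep the rest
        # in the buffer until more bytes arrive.
        *frames, self.buffer = self.buffer.split(TERMINATOR)
        return [json.loads(frame.decode("utf-8")) for frame in frames]
```

A client would send `encode({"command": "list"})` over the UNIX socket, and the daemon would call `feed()` on whatever chunk it reads, which copes with messages arriving split across several reads.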

 

I was thinking that even with a task scheduler and message queue, I’d probably still implement the OAI-PMH harvester as I have. Maybe I could replace the UNIX socket connection with the message queue, so the harvester consumes messages from the queue rather than from the client, but it’s a bit academic. The harvester needs to have direct control over its workers, rather than having the queue send messages to the workers, so that it can control the jobs directly. I’m not a huge fan of how the Python-based Celery scheduler manages cancelled jobs, although I found Celery to be a neat piece of work.

 

Anyway, long story short, no real news. I’ve abandoned making a generic task scheduler and message queue, and just made my to-purpose daemon which implements its own internal queue management for the sake of simplicity and efficacy.

 

David Cook


 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Jonathan Druart
Sent: Wednesday, 26 April 2017 2:21 AM
To: [hidden email]
Subject: Re: [Koha-devel] Task schedulers and message queues for Koha

 

On Thu, 23 Feb 2017 at 00:51 David Cook <[hidden email]> wrote:

I’m planning to post the code for what I have already in early March.

 

Any news here?

We really need to remove the way our background jobs are implemented to make them work under Plack.

I'd like to avoid duplication of work...

 

 David Cook

Systems Librarian

Prosentient Systems

72/330 Wattle St

Ultimo, NSW 2007

Australia

 

Office: <a href="tel:02%2092%2012%2008%2099" target="_blank">02 9212 0899

Direct: <a href="tel:02%2080%2005%2005%2095" target="_blank">02 8005 0595

 

From: Tomas Cohen Arazi [mailto:[hidden email]]
Sent: Thursday, 23 February 2017 2:16 PM
To: David Cook <[hidden email]>; Tajoli Zeno <[hidden email]>; [hidden email]
Subject: Re: [Koha-devel] Task schedulers and message queues for Koha

 

Share it :-)

 

On Wed, 22 Feb 2017 at 9:57 PM, David Cook <[hidden email]> wrote:

Hi Zeno,

I have a number of concerns about Celery. One of those is that it would add
numerous external dependencies and complexity to Koha implementations.

Your suggestion of Celery + RabbitMQ + AnyEvent::RabbitMQ sounds ok,
although it would involve work too. While Celery clients exist for PHP and
Node.js, we'd need to create a Perl implementation of the Celery protocol
using AnyEvent::RabbitMQ (or Net::RabbitFoot). Not that I'm necessarily
opposed to that.

We'd also still need to write the tasks in Python (or use web hooks which
would have the overhead of HTTP plus you'd have to worry about your web
server being up). I'm not sure how keen the community at large is to support
more server-side languages. I like writing Python, so I don't mind porting
over my OAI-PMH code from Perl to Python. I've abandoned the HTTP::OAI
module anyway for a few reasons.

RabbitMQ is a pretty heavy duty product as well which comes with its own
requirements: https://www.rabbitmq.com/production-checklist.html. While we
currently help people with Apache, MySQL, Zebra, and ElasticSearch, we'd
also all need to become experts with RabbitMQ.

I've already put together a Perl-based scheduler using POE which forks its
own workers. And I've already put together a basic Perl-based message queue
which sends events to pre-existing workers (like Celery). Celery with
RabbitMQ is more mature and complex, but my Perl programs do the trick.

Looking at DSpace's OAI-PMH harvester, it works very much like my first
design. It's a Java scheduler which uses threads rather than child processes
to do its work.

Due to the lack of engagement overall, I think I'll probably just keep my
existing design, since it works and works quite well.



> -----Original Message-----
> From: Tajoli Zeno [mailto:[hidden email]]
> Sent: Wednesday, 22 February 2017 7:49 PM
> To: David Cook <[hidden email]>; [hidden email]
> Subject: Re: [Koha-devel] Task schedulers and message queues for Koha
>
> Hi David and all,
>
> On 21/02/2017 23:29, David Cook wrote:
> > . Two, they wanted to execute OAI-PMH requests every 2-3 seconds and
> > cron has 1 minute as its finest granularity. Three, even if you setup
> > a cronjob to run every minute, long running tasks could get duplicated
> > (although you could mitigate that with locks which would be a pain).
> > Plus, you want to run tasks in parallel, so you're going to want to use
> > multiple processes, which cron isn't really set up to achieve.
>
> Ok, if you need those features, cron isn't enough.
> But why do you drop the option of Celery + RabbitMQ + AnyEvent::RabbitMQ?
>
> They have official Debian packages:
> https://packages.debian.org/jessie/python-celery
> https://packages.debian.org/jessie/rabbitmq-server
> https://packages.debian.org/jessie/libanyevent-rabbitmq-perl
>
> We already use one of their dependencies for similar tasks (libanyevent-perl,
> "event loop framework with multiple implementations").
>
> Python is already present on our Debian/Ubuntu systems; it is a prerequisite
> of the distributions.
>
> Redoing such a complex stack in Perl is, I think, very complex.
>
> Bye
> Zeno Tajoli
>
>
>
> --
> Zeno Tajoli
> /SVILUPPO PRODOTTI CINECA/ - Automazione Biblioteche
> Email: [hidden email] Fax: 051/6132198
> *CINECA* Consorzio Interuniversitario - Sede operativa di Segrate (MI)


_______________________________________________
Koha-devel mailing list
[hidden email]
http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/

--

Tomás Cohen Arazi

Theke Solutions (https://theke.io)
+54 9351 3513384
GPG: B2F3C15F
