Re: [Koha] need help in zebra indexing for Arabic words

Re: [Koha] need help in zebra indexing for Arabic words

zalabany
Hi Karam,
I think this issue is very important to everyone who uses the Arabic language.
We need to know which changes to the Zebra index and the Koha files are needed to index Arabic words without their prefixes and suffixes,
or to search by root word using Zebra.

Here is an example.
We need to search for the word al-madrasah, which means "school".
It has two parts, al ("the") and madrasah, and the last character (h) can take two forms (ه ة). When we search for madrasa, we need to get madrasah and al-madrasah in all their forms.


----- Reply message -----
From: "Karam Qubsi" <[hidden email]>
To: "Koha" <[hidden email]>, <[hidden email]>
Subject: [Koha] need help in zebra indexing for Arabic words
Date: Thu, Oct 25, 2012 11:12 am


Hi all,

I sent this email to the Zebra mailing list.

I hope that Koha will support prefixes and suffixes in Arabic search by
default in the future.

If you have any ideas about a solution, please let me know.

Best regards
Karam.

---------- Forwarded message ----------
From: Karam Qubsi <[hidden email]>
Date: Thu, Oct 25, 2012 at 4:20 AM
Subject: need help in transliterate rule
To: [hidden email]


Hi all,
I'm new to Zebra and am using it in the Koha ILS.

In Arabic, some letters are attached to the beginning or end of a word,
and we don't want these letters in our index.
Here is an example:
in English we can say: the car...
in Arabic the same is: السيارة
The leading ال is "the". It is ignored in English
search, but in Arabic search we need Zebra to find every record that
has either "the word" or just "word": الكلمة or كلمة

So how do I define that in Zebra?

Thanks a lot
_______________________________________________
Koha mailing list  http://koha-community.org
[hidden email]
http://lists.katipo.co.nz/mailman/listinfo/koha

_______________________________________________
Koha-devel mailing list
[hidden email]
http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/

Re: [Koha] need help in zebra indexing for Arabic words

Karam Qubsi
Thanks, Mohamed, for describing our shared problem in more detail.

I hope we can find a solution to this problem for all Koha users in the Arab world.

While searching, I found something about treating the letters "أ إ آ" all as "ا",
and something about the suffixes "ة ه".

The solution is to edit the words-icu.xml file and add these lines.

For أ آ إ:
  <transliterate rule="  [\u0623 \u0625 \u0622] > \u0627 " />

For the ة ه suffix, add:
  <transliterate rule="ه > ة"/>

For the ي ى suffix, add:
  <transliterate rule=" ى > ي "/>

I haven't tried this; please try it and let me know.
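
To make the effect of the three rules above concrete, here is a minimal Python sketch that emulates them outside Zebra. It is only an illustration under stated assumptions: `FOLD` and `fold_letters` are hypothetical names, and the sketch does not call ICU or Zebra at all. Like the rules as written, it rewrites every occurrence of a letter, not only word-final ones, so any ه inside a word is also turned into ة.

```python
# Rough emulation of the three proposed transliterate rules.
# This is NOT Zebra or ICU; FOLD and fold_letters are hypothetical names.
FOLD = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above  -> bare alef
    "\u0625": "\u0627",  # alef with hamza below  -> bare alef
    "\u0622": "\u0627",  # alef with madda        -> bare alef
    "\u0647": "\u0629",  # ha                     -> taa marbuta
    "\u0649": "\u064A",  # alef maqsura           -> yaa
})


def fold_letters(text: str) -> str:
    """Apply the same folding to indexed text and to search queries."""
    return text.translate(FOLD)


# "madrasah" written with a final ha and with a final taa marbuta
with_ha = "\u0645\u062F\u0631\u0633\u0647"
with_taa = "\u0645\u062F\u0631\u0633\u0629"
print(fold_letters(with_ha) == fold_letters(with_taa))
```

The point of the sketch is that the same folding has to be applied to both the indexed text and the query; only then do the two spellings end up as the same key.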

But I couldn't find how to ignore the الـ prefix in words.

I hope someone can help us with this.

Best regards


--
Karam Qubsi
Arabic Koha Support Team
The founder of Wikibrary (The Arabic Libraries And Information Science Encyclopedia )

Skype : karam.qubsi
Mobile: +963 991 264 020
Viber / Tango : +963 991 264 020
Wikibrary website : http://wikibrary.org
Wiki and forum for Arabic Koha : http://koha.wikibrary.org
كرم قبسي.
كوها العربي ـ فريق الدعم
مؤسس ويكيمكتبات ( الموسوعة العربية للمكتبات والمعلومات ) 



Re: [Koha] need help in zebra indexing for Arabic words

Frédéric Demians
In reply to this post by zalabany
No, you don't need help, you need to contract a developer to do the job.
All the great Koha features you're enjoying arrived in Koha this way.
If you could find someone in the Arab world, let's say a Gold Oil Emir,
willing to finance the project, it could be done in a flash.

Take a look at the process here:

http://wiki.koha-community.org/wiki/Category:RFCs

Kind regards,
--
Frédéric DEMIANS
http://www.tamil.fr/u/fdemians.html


Re: [Koha] need help in zebra indexing for Arabic words

paul POULAIN-3
On 25/10/2012 13:53, Frédéric Demians wrote:
> No, you don't need help, you need to contract a developer to do the job.
What Frédéric is explaining here is that you can't achieve this with the
current Koha. And I suspect it's not a Koha problem, but a Zebra/ICU one.

Side comment: we're working on integrating a new search engine layer
(Solr). Maybe Solr will fix this problem?

Anyway, we're looking for funding to continue the work on the search
layer (see:
http://wiki.koha-community.org/w/index.php?title=C_%26_P_Search_Rewrite_RFC)


--
Paul POULAIN
http://www.biblibre.com
Free software experts for information and documentation
Tel : (33) 4 91 81 35 08

Re: [Koha] need help in zebra indexing for Arabic words

Karam Qubsi
Yes, it's not a Koha problem,
but I think some people have fixed this in Zebra (or maybe it's just a matter of adding more options to the Zebra files).

Massoud Alshareef from KnowledgeWare Technologies mentions that they have done this and solved the problem,
in: http://koha-community.org/category/koha-news/support-company-press/

I hope he can help us with this (cc'ing him).

I've heard that Solr is very good, but I haven't checked whether its Arabic support is better than Zebra's. I did just find this:
http://wiki.apache.org/solr/LanguageAnalysis#Arabic

Anyway, thanks a lot. I will look into this further, and if I find a solution I will share it with you.


best regards
Karam .



Re: [Koha] need help in zebra indexing for Arabic words

Karam Qubsi
Hi all,
I solved this in Zebra by customizing the transliterate rules in the words-icu.xml file.

I will share a complete file that solves this for Arabic soon!

The solution is to add rules like the following. To keep the example simple, I will not use Arabic characters here.

Suppose we have a language X in which letters are written joined together, and some letters are not important for searching. Given the word "theword", the searcher is not interested in finding "the"; he is really searching for "word".

I solved this by following this guide: http://userguide.icu-project.org/transforms/general/rules#TOC-Context

and making Zebra convert thew to w.
We may have to do this for every letter: thea to a, theb to b, and so on up to thez to z,

as in the following:
  <transliterate rule="{ thea > a "/>
  <transliterate rule="{ thew > w "/>
...
...
..
  <transliterate rule="{ thez > z "/>
So if someone searches for theword, Zebra will convert thew to w, and searching for word is the same as searching for theword :D

and for Arabic :
  <transliterate rule="{ الا > ا "/>
  <transliterate rule="{ الب > ب "/>
.....
...
...
..

  <transliterate rule="{ الي > ي "/>
So searching for "بحث"
will find "البحث".
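
As a rough illustration of what the prefix rules are meant to achieve, here is a small Python sketch. It is not the Zebra configuration itself and does not use ICU. The names `_AL_PREFIX` and `strip_al` are hypothetical, and the sketch makes its own assumption that the article is removed only at the start of a word and only when another letter follows.

```python
import re

# Hypothetical helper, not part of Koha or Zebra.
# Matches alef + lam at a word boundary when another word character follows.
_AL_PREFIX = re.compile(r"\b\u0627\u0644(?=\w)")


def strip_al(text: str) -> str:
    """Drop a leading definite article from every word in text."""
    return _AL_PREFIX.sub("", text)


# "al-bahth" (the search) and "bahth" (search)
with_article = "\u0627\u0644\u0628\u062D\u062B"
without_article = "\u0628\u062D\u062B"
print(strip_al(with_article) == without_article)
```

One limitation to keep in mind: a purely mechanical rule like this also shortens words in which ال is part of the word itself rather than the article, so the results are worth checking against real catalogue data.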

This solves the whole problem :)
I hope this helps you, Mohamed.

Thank you Frédéric , Paul

Karam



Re: [Koha] need help in zebra indexing for Arabic words

Karam Qubsi
Anyone who uses the attached file for Arabic Koha
will not run into the problem we discussed :) (the prefix). I will develop it further to support Arabic suffixes as well.

Copy the file into this directory:
/etc/koha/zebradb/etc/

Best regards
karam


words-icu.xml (2K) Download Attachment