[Bug 28604] New: bad encoding when using marc-in-json

[Bug 28604] New: bad encoding when using marc-in-json

bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=28604

            Bug ID: 28604
           Summary: bad encoding when using marc-in-json
 Change sponsored?: ---
           Product: Koha
           Version: master
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: P5 - low
         Component: REST API
          Assignee: [hidden email]
          Reporter: [hidden email]

There are UTF-8 encoding problems when retrieving records that contain
diacritics as marc-in-json. I encountered this when using the
'/api/v1/public/biblios/' API route.

To recreate:
- Add some diacritics to a MARC field. I used the 538$a and added the note
  'Tést nöte'.
- Look at what the API returns. I used this:

curl --location --request GET 'http://localhost:8080/api/v1/public/biblios/144' \
     --header 'Accept: application/marc-in-json'

The 538$a note comes out looking like this:

TÃ©st nÃ¶te
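
The symptom reported here is what UTF-8 text typically looks like after it has been encoded twice. The following is a minimal Python sketch of that effect, purely illustrative (Koha itself is Perl, and nothing below is Koha code); the variable names are made up for the example.

```python
# Illustrative only: reproduces the double-encoding symptom in Python.
note = "Tést nöte"

# Correct path: the text is encoded to UTF-8 exactly once.
encoded_once = note.encode("utf-8")

# Faulty path: the UTF-8 bytes are mistaken for one character per byte
# (equivalent to Latin-1) and then encoded to UTF-8 a second time.
encoded_twice = encoded_once.decode("latin-1").encode("utf-8")

# What a client sees when it decodes each response as UTF-8.
seen_ok = encoded_once.decode("utf-8")
seen_bad = encoded_twice.decode("utf-8")

print(seen_ok)
print(seen_bad)
```

Each accented character occupies two bytes in UTF-8, and the second encoding pass turns each of those bytes into a separate character, which is why every diacritic shows up as a two-character sequence.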

--
You are receiving this mail because:
You are watching all bug changes.
You are the assignee for the bug.
_______________________________________________
Koha-bugs mailing list
[hidden email]
https://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-bugs
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/

[Bug 28604] bad encoding when using marc-in-json

bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=28604

Tomás Cohen Arazi <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|[hidden email]              |[hidden email]
             Status|NEW                         |Needs Signoff
                 CC|                            |[hidden email]


[Bug 28604] bad encoding when using marc-in-json

bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=28604

--- Comment #1 from Tomás Cohen Arazi <[hidden email]> ---
Created attachment 122188
  --> https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=122188&action=edit
Bug 28604: Regression tests

This patch introduces regression tests for the encoding issue with MiJ
output.

Signed-off-by: Tomas Cohen Arazi <[hidden email]>


[Bug 28604] bad encoding when using marc-in-json

bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=28604

--- Comment #2 from Tomás Cohen Arazi <[hidden email]> ---
Created attachment 122189
  --> https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=122189&action=edit
Bug 28604: Prevent double encoding of MARC::Record::MiJ->to_mij output

This patch fixes a double-encoding issue with MiJ output.

Mojolicious' *text* renderer encodes the passed information according to
the request context. [1]

MARC::Record::MiJ->to_mij already encodes the string before
output [2].

Passing that output to the *text* renderer therefore encodes it twice.

The solution is to use the *data* renderer, which
doesn't perform any encoding [3].
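
The interaction described above can be modelled outside Koha. The following is a minimal Python sketch, not Mojolicious or Koha code: render_text, render_data and to_mij_model are hypothetical stand-ins, the "538a" key is a simplified placeholder rather than the real MARC-in-JSON layout, and the decode("latin-1") step is an assumption that models how already-encoded bytes get treated as one character per byte.

```python
import json


def render_text(chars: str) -> bytes:
    """Hypothetical model of a *text* renderer: encodes characters to UTF-8."""
    return chars.encode("utf-8")


def render_data(raw: bytes) -> bytes:
    """Hypothetical model of a *data* renderer: sends the bytes untouched."""
    return raw


def to_mij_model(note: str) -> bytes:
    """Hypothetical serializer that, like to_mij, returns encoded bytes."""
    return json.dumps({"538a": note}, ensure_ascii=False).encode("utf-8")


payload = to_mij_model("Tést nöte")

# Before the fix: already-encoded bytes go through the text renderer,
# which sees each byte as a character and encodes it again.
before_fix = render_text(payload.decode("latin-1"))

# After the fix: the data renderer passes the encoded bytes through as-is.
after_fix = render_data(payload)

print(before_fix.decode("utf-8"))
print(after_fix.decode("utf-8"))
```

The design point is that exactly one layer should own the encoding step. Since the serializer already returns bytes, the renderer that leaves bytes alone is the matching choice.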

To test:
1. Apply the regression tests patch
2. Run:
   $ kshell
  k$ prove t/db_dependent/api/v1/biblios.t
=> FAIL: Tests contain diacritics and fail!
3. Have a record with diacritics
4. Try the API routes for fetching a biblio:
   $ curl --location --request GET \
          'http://localhost:8080/api/v1/public/biblios/144' \
          --header 'Accept: application/marc-in-json'
   (replace the record id with the one you've chosen)
=> FAIL: Boo, double encoding
5. Bonus point: you can try it on the non-public route, but you need
   more configuration boilerplate (basic auth, permissions). If you look
   at the fix, you will see that the tests cover it, so there is no need
   to complicate things.
6. Apply this patch
7. Repeat 2
=> SUCCESS: Tests pass!
8. Repeat 4 (and maybe 5)
=> SUCCESS: No double encoding! Yay!
9. Sign off :-D
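
When repeating steps 4 and 8, the curl output can be checked by eye, but a small heuristic can also flag double encoding in a returned value. The following is a rough Python sketch under that assumption; undo_double_encoding and looks_double_encoded are hypothetical helper names, not part of Koha or its test suite.

```python
def undo_double_encoding(text: str) -> str:
    """Return the repaired text if `text` looks double encoded, else `text`."""
    try:
        # Map each character back to a single byte, then decode as UTF-8.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable as bytes, or not valid UTF-8: leave it alone.
        return text


def looks_double_encoded(text: str) -> bool:
    """Heuristic: the text changes when one layer of encoding is undone."""
    return undo_double_encoding(text) != text


print(looks_double_encoded("T\u00c3\u00a9st n\u00c3\u00b6te"))
print(looks_double_encoded("Tést nöte"))
```

This is only a heuristic: a string that legitimately contains a sequence such as "Ã" followed by "©" would be flagged as well, so it suits spot checks rather than automated repair.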

[1] https://metacpan.org/release/MRAMBERG/Convos-0.5/view/local/lib/perl5/Mojolicious/Guides/Rendering.pod#Rendering-text
[2] https://metacpan.org/dist/MARC-File-MiJ/source/lib/MARC/Record/MiJ.pm#L111
[3] https://metacpan.org/release/MRAMBERG/Convos-0.5/view/local/lib/perl5/Mojolicious/Guides/Rendering.pod#Rendering-data

Signed-off-by: Tomas Cohen Arazi <[hidden email]>


[Bug 28604] Bad encoding when using marc-in-json

bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=28604

Tomás Cohen Arazi <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|bad encoding when using     |Bad encoding when using
                   |marc-in-json                |marc-in-json


[Bug 28604] Bad encoding when using marc-in-json

bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=28604

Lucas Gass <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|Needs Signoff               |Signed Off


[Bug 28604] Bad encoding when using marc-in-json

bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=28604

--- Comment #3 from Lucas Gass <[hidden email]> ---
Created attachment 122190
  --> https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=122190&action=edit
Bug 28604: Regression tests

This patch introduces regression tests for the encoding issue with MiJ
output.

Signed-off-by: Tomas Cohen Arazi <[hidden email]>

Signed-off-by: Lucas Gass <[hidden email]>


[Bug 28604] Bad encoding when using marc-in-json

bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=28604

Lucas Gass <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #122189|0                           |1
        is obsolete|                            |

--- Comment #4 from Lucas Gass <[hidden email]> ---
Created attachment 122191
  --> https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=122191&action=edit
Bug 28604: Prevent double encoding of MARC::Record::MiJ->to_mij output

This patch fixes a double-encoding issue with MiJ output.

Mojolicious' *text* renderer encodes the passed information according to
the request context. [1]

MARC::Record::MiJ->to_mij already encodes the string before
output [2].

Passing that output to the *text* renderer therefore encodes it twice.

The solution is to use the *data* renderer, which
doesn't perform any encoding [3].

To test:
1. Apply the regression tests patch
2. Run:
   $ kshell
  k$ prove t/db_dependent/api/v1/biblios.t
=> FAIL: Tests contain diacritics and fail!
3. Have a record with diacritics
4. Try the API routes for fetching a biblio:
   $ curl --location --request GET \
          'http://localhost:8080/api/v1/public/biblios/144' \
          --header 'Accept: application/marc-in-json'
   (replace the record id with the one you've chosen)
=> FAIL: Boo, double encoding
5. Bonus point: you can try it on the non-public route, but you need
   more configuration boilerplate (basic auth, permissions). If you look
   at the fix, you will see that the tests cover it, so there is no need
   to complicate things.
6. Apply this patch
7. Repeat 2
=> SUCCESS: Tests pass!
8. Repeat 4 (and maybe 5)
=> SUCCESS: No double encoding! Yay!
9. Sign off :-D

[1] https://metacpan.org/release/MRAMBERG/Convos-0.5/view/local/lib/perl5/Mojolicious/Guides/Rendering.pod#Rendering-text
[2] https://metacpan.org/dist/MARC-File-MiJ/source/lib/MARC/Record/MiJ.pm#L111
[3] https://metacpan.org/release/MRAMBERG/Convos-0.5/view/local/lib/perl5/Mojolicious/Guides/Rendering.pod#Rendering-data

Signed-off-by: Tomas Cohen Arazi <[hidden email]>

Signed-off-by: Lucas Gass <[hidden email]>


[Bug 28604] Bad encoding when using marc-in-json

bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=28604

Lucas Gass <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #122188|0                           |1
        is obsolete|                            |
