From patchwork Fri Jul 21 14:11:01 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Colin Leroy-Mira
X-Patchwork-Id: 73048
From: Colin Leroy-Mira
To: libc-alpha@sourceware.org
Cc: Colin Leroy-Mira
Subject: [PATCH v2] localedata: Translit common emojis to smileys [BZ #30649]
Date: Fri, 21 Jul 2023 16:11:01 +0200
Message-Id: <20230721141101.3337118-1-colin@colino.net>
In-Reply-To: <20230719161707.1558085-1-colin@colino.net>
References: <20230719161707.1558085-1-colin@colino.net>

Add common emojis (mostly faces and hearts) to the set of
transliterable characters, and transliterate them to old-fashioned
ASCII smileys.

Author: Colin Leroy-Mira
Signed-off-by: Colin Leroy-Mira
---
v2: Fix a wrong smiley, add unit test

 localedata/Makefile                 |   3 +
 localedata/locales/translit_emojis  |  91 ++++++++++++++++++++
 localedata/locales/translit_neutral |   1 +
 localedata/tst-iconv-emojis-trans.c | 124 ++++++++++++++++++++++++++++
 4 files changed, 219 insertions(+)
 create mode 100644 localedata/locales/translit_emojis
 create mode 100644 localedata/tst-iconv-emojis-trans.c

diff --git a/localedata/Makefile b/localedata/Makefile
index 3619b6d47e..5b6d10e33f 100644
--- a/localedata/Makefile
+++ b/localedata/Makefile
@@ -164,6 +164,7 @@ tests = \
   bug-usesetlocale \
   tst-c-utf8-consistency \
   tst-digits \
+  tst-iconv-emojis-trans \
   tst-iconv-math-trans \
   tst-leaks \
   tst-mbswcs1 \
@@ -320,6 +321,8 @@ LOCALES := \
 
 include ../gen-locales.mk
 
+$(objpfx)tst-iconv-emojis-trans.out: $(gen-locales)
+
 $(objpfx)tst-iconv-math-trans.out: $(gen-locales)
 endif
 
diff --git a/localedata/locales/translit_emojis b/localedata/locales/translit_emojis
new file mode 100644
index 0000000000..260aeedc35
--- /dev/null
+++ b/localedata/locales/translit_emojis
@@ -0,0 +1,91 @@
+escape_char /
+comment_char %
+
+% This file is part of the GNU C Library and contains locale data.
+% The Free Software Foundation does not claim any copyright interest
+% in the locale data contained in this file.  The foregoing does not
+% affect the license of the GNU C Library as a whole.  It does not
+% exempt you from the conditions of the license if your use would
+% otherwise be governed by that license.
+
+% Transliterations of emojis to ASCII smileys.
+% Generated algorithmically.
+
+LC_CTYPE
+
+translit_start
+
+ "" % WHITE HEART SUIT
+ "" % BLACK HEART SUIT
+ "" % HEAVY BLACK HEART
+ "" % BLUE HEART
+ "" % BEATING HEART
+ "" % BROKEN HEART
+ "" % SPARKLING HEART
+ "" % GROWING HEART
+ "" % GREEN HEART
+ "" % YELLOW HEART
+ "" % PURPLE HEART
+ "" % BLACK HEART
+ "" % ORANGE HEART
+ "" % WHITE HEART
+ "" % BROWN HEART
+ "" % GRINNING FACE
+ "" % GRINNING FACE WITH SMILING EYES
+ "" % FACE WITH TEARS OF JOY
+ "" % SMILING FACE WITH OPEN MOUTH (C.F. ☺)
+ "" % SMILING FACE WITH OPEN MOUTH AND SMILING EYES
+ "" % SMILING FACE WITH OPEN MOUTH AND COLD SWEAT
+ "" % SMILING FACE WITH OPEN MOUTH AND TIGHTLY-CLOSED EYES
+ "" % SMILING FACE WITH HALO
+ "" % SMILING FACE WITH HORNS
+ "" % WINKING FACE
+ "" % SMILING FACE WITH SMILING EYES
+ "" % FACE SAVOURING DELICIOUS FOOD
+ "" % RELIEVED FACE
+ "" % SMILING FACE WITH HEART-SHAPED EYES
+ "" % SMILING FACE WITH SUNGLASSES
+ "" % SMIRKING FACE
+ "" % NEUTRAL FACE
+ "" % EXPRESSIONLESS FACE
+ "" % UNAMUSED FACE
+ "" % FACE WITH COLD SWEAT
+ "" % PENSIVE FACE
+ "" % CONFUSED FACE
+ "" % CONFOUNDED FACE
+ "" % KISSING FACE
+ "" % FACE THROWING A KISS
+ "" % KISSING FACE WITH SMILING EYES
+ "" % KISSING FACE WITH CLOSED EYES
+ "" % FACE WITH STUCK-OUT TONGUE
+ "" % FACE WITH STUCK-OUT TONGUE AND WINKING EYE
+ "" % FACE WITH STUCK-OUT TONGUE AND TIGHTLY-CLOSED EYES
+ "" % DISAPPOINTED FACE
+ "" % WORRIED FACE
+ "" % ANGRY FACE
+ "" % POUTING FACE
+ "" % CRYING FACE
+ "" % PERSEVERING FACE
+ "" % FROWNING FACE WITH OPEN MOUTH
+ "" % ANGUISHED FACE
+ "" % FEARFUL FACE
+ "" % WEARY FACE
+ "" % LOUDLY CRYING FACE
+ "" % FACE WITH OPEN MOUTH
+ "" % HUSHED FACE
+ "" % FACE WITH OPEN MOUTH AND COLD SWEAT
+ "" % FACE SCREAMING IN FEAR
+ "" % ASTONISHED FACE
+ "" % GRINNING CAT FACE WITH SMILING EYES
+ "" % CAT FACE WITH TEARS OF JOY
+ "" % SMILING CAT FACE WITH OPEN MOUTH
+ "" % SMILING CAT FACE WITH HEART-SHAPED EYES
+ "" % CAT FACE WITH WRY SMILE
+ "" % KISSING CAT FACE WITH CLOSED EYES
+ "" % SLIGHTLY FROWNING FACE
+ "" % SLIGHTLY SMILING FACE
+ "" % UPSIDE-DOWN FACE
+
+translit_end
+
+END LC_CTYPE
diff --git a/localedata/locales/translit_neutral b/localedata/locales/translit_neutral
index 72f66220b7..57412ae565 100644
--- a/localedata/locales/translit_neutral
+++ b/localedata/locales/translit_neutral
@@ -17,6 +17,7 @@ translit_start
 include "translit_circle";""
 include "translit_cjk_compat";""
 include "translit_compat";""
+include "translit_emojis";""
 include "translit_font";""
 include "translit_fraction";""
 include "translit_narrow";""
diff --git a/localedata/tst-iconv-emojis-trans.c b/localedata/tst-iconv-emojis-trans.c
new file mode 100644
index 0000000000..89a32074d5
--- /dev/null
+++ b/localedata/tst-iconv-emojis-trans.c
@@ -0,0 +1,124 @@
+/* Test some emoji transliterations.
+
+   Copyright (C) 2019-2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#include <iconv.h>
+#include <locale.h>
+#include <stdio.h>
+#include <string.h>
+#include <support/check.h>
+
+static int
+do_test (void)
+{
+  iconv_t cd;
+
+  const int num_emojis = 70;
+
+  const char str[] = "\u2661 \u2665 \u2764 \U0001F499 "
+                     "\U0001F493 \U0001F494 \U0001F496 "
+                     "\U0001F497 \U0001F49A \U0001F49B "
+                     "\U0001F49C \U0001F5A4 \U0001F9E1 "
+                     "\U0001F90D \U0001F90E \U0001F600 "
+                     "\U0001F601 \U0001F602 \U0001F603 "
+                     "\U0001F604 \U0001F605 \U0001F606 "
+                     "\U0001F607 \U0001F608 \U0001F609 "
+                     "\U0001F60A \U0001F60B \U0001F60C "
+                     "\U0001F60D \U0001F60E \U0001F60F "
+                     "\U0001F610 \U0001F611 \U0001F612 "
+                     "\U0001F613 \U0001F614 \U0001F615 "
+                     "\U0001F616 \U0001F617 \U0001F618 "
+                     "\U0001F619 \U0001F61A \U0001F61B "
+                     "\U0001F61C \U0001F61D \U0001F61E "
+                     "\U0001F61F \U0001F620 \U0001F621 "
+                     "\U0001F622 \U0001F623 \U0001F626 "
+                     "\U0001F627 \U0001F628 \U0001F629 "
+                     "\U0001F62D \U0001F62E \U0001F62F "
+                     "\U0001F630 \U0001F631 \U0001F632 "
+                     "\U0001F638 \U0001F639 \U0001F63A "
+                     "\U0001F63B \U0001F63C \U0001F63D "
+                     "\U0001F641 \U0001F642 \U0001F643";
+
+  const char expected[] = "<3 <3 <3 <3 <3 "
+                          ":) ;-) "
+                          ":-) :-P :-) :-* B-) "
+                          ";-) :-| :-| :-| :'-| "
+                          ":-| :-/ :-S :-* :-* "
+                          ":-* :-* :-P ;-P X-P "
+                          ":-( :-( >:-( :-( :'-( "
+                          "X-( :-O :-O :-O :-O "
+                          ":\"-( :-O :-O :'-O :-O "
+                          ":-O :-3 :'-3 :-3 :-3 "
+                          ";-3 :-3 :-( :-) (-:";
+
+  char *inptr = (char *) str;
+  size_t inlen = strlen (str) + 1;
+  char outbuf[500];
+  char *outptr = outbuf;
+  size_t outlen = sizeof (outbuf);
+  int result = 0;
+  size_t n;
+
+  if (setlocale (LC_ALL, "en_US.UTF-8") == NULL)
+    FAIL_EXIT1 ("setlocale failed");
+
+  cd = iconv_open ("ASCII//TRANSLIT", "UTF-8");
+  if (cd == (iconv_t) -1)
+    FAIL_EXIT1 ("iconv_open failed");
+
+  n = iconv (cd, &inptr, &inlen, &outptr, &outlen);
+  if (n != num_emojis)
+    {
+      if (n == (size_t) -1)
+        printf ("iconv() returned error: %m\n");
+      else
+        printf ("iconv() returned %zu, expected %d\n", n, num_emojis);
+      result = 1;
+    }
+  if (inlen != 0)
+    {
+      puts ("not all input consumed");
+      result = 1;
+    }
+  else if (inptr - str != strlen (str) + 1)
+    {
+      printf ("inptr wrong, advanced by %td\n", inptr - str);
+      result = 1;
+    }
+  if (memcmp (outbuf, expected, sizeof (expected)) != 0)
+    {
+      printf ("result wrong: \"%.*s\", expected: \"%s\"\n",
+              (int) (sizeof (outbuf) - outlen), outbuf, expected);
+      result = 1;
+    }
+  else if (outlen != sizeof (outbuf) - sizeof (expected))
+    {
+      printf ("outlen wrong: %zu, expected %zu\n", outlen,
+              sizeof (outbuf) - sizeof (expected));
+      result = 1;
+    }
+  else
+    printf ("output is \"%s\" which is OK\n", outbuf);
+
+  return result;
+}
+
+#include <support/test-driver.c>