From patchwork Wed Apr 28 13:00:29 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Carlos O'Donell
X-Patchwork-Id: 43184
To: libc-alpha@sourceware.org, fweimer@redhat.com
Subject: [PATCH v4 0/4] Add new C.UTF-8 locale (Bug 17318)
Date: Wed, 28 Apr 2021 09:00:29 -0400
Message-Id: <20210428130033.3196848-1-carlos@redhat.com>
X-Mailer: git-send-email 2.26.3
List-Id: Libc-alpha mailing list
X-Patchwork-Original-From: Carlos O'Donell via
 Libc-alpha
From: Carlos O'Donell
Reply-To: Carlos O'Donell
Errors-To: libc-alpha-bounces@sourceware.org
Sender: "Libc-alpha"

To make implementing the C.UTF-8 locale easier, several steps should be
taken before the locale is added:

1) Implement wide ellipsis range handling for UTF-8 to simplify the
   LC_COLLATE description in the locale.

2) Update the UTF-8 charmap processing to include all code points
   (excluding surrogates) and make use of the wide ellipsis ranges.

3) Regenerate the UTF-8 character map with the new characters for
   full code point coverage.

The new C.UTF-8 locale is not added to SUPPORTED because it is 28 MiB
in size, due to the weights array in LC_COLLATE covering the full set
of code points. Before we can make C.UTF-8 supported we must simplify
the weights processing to use strcmp and remove the weights array from
the binary data. To some extent this is a reference implementation
against which we can test a newer version or a builtin version that has
the size and performance we expect.

Carlos O'Donell (4):
  Add support for processing wide ellipsis ranges in UTF-8.
  Update UTF-8 charmap processing.
  Regenerate localedata files.
  Add generic C.UTF-8 locale (Bug 17318)

 locale/programs/charmap.c              |  174 +-
 localedata/C.UTF-8.in                  |  156 +
 localedata/Makefile                    |    2 +
 localedata/charmaps/UTF-8              | 4396 ++++--------------------
 localedata/locales/C                   |  188 +
 localedata/locales/i18n_ctype          |    2 +-
 localedata/locales/tr_TR               |    2 +-
 localedata/locales/translit_circle     |    2 +-
 localedata/locales/translit_cjk_compat |    2 +-
 localedata/locales/translit_combining  |    2 +-
 localedata/locales/translit_compat     |    2 +-
 localedata/locales/translit_font       |    2 +-
 localedata/locales/translit_fraction   |    2 +-
 localedata/unicode-gen/utf8_gen.py     |  133 +-
 14 files changed, 1288 insertions(+), 3777 deletions(-)
 create mode 100644 localedata/C.UTF-8.in
 create mode 100644 localedata/locales/C
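
As an aside on why replacing the weights array with strcmp looks
plausible: UTF-8 is designed so that byte-wise comparison of encoded
characters agrees with code point order. The sketch below is not taken
from this series. It is a standalone Python check with hypothetical
helper names (scalar_values, byte_order_matches_code_point_order) that
walks every code point except the surrogates and confirms the encoded
bytes are strictly increasing.

```python
# Illustrative sketch only; the names here are hypothetical and not part of
# glibc or utf8_gen.py.

SURROGATE_FIRST = 0xD800
SURROGATE_LAST = 0xDFFF
MAX_CODE_POINT = 0x10FFFF


def scalar_values():
    """Yield every Unicode code point except surrogates, in ascending order."""
    for cp in range(MAX_CODE_POINT + 1):
        if SURROGATE_FIRST <= cp <= SURROGATE_LAST:
            continue  # surrogates cannot be encoded as UTF-8
        yield cp


def byte_order_matches_code_point_order():
    """Return True if UTF-8 byte order agrees with code point order."""
    previous = None
    for cp in scalar_values():
        encoded = chr(cp).encode("utf-8")
        # Python compares bytes lexicographically as unsigned values, which
        # is also how strcmp orders strings without embedded NUL bytes.
        if previous is not None and not previous < encoded:
            return False
        previous = encoded
    return True


if __name__ == "__main__":
    print("code points covered:", sum(1 for _ in scalar_values()))
    print("byte order == code point order:",
          byte_order_matches_code_point_order())
```

The check only covers single characters. Treating it as evidence for
whole strings is an assumption that relies on UTF-8 being
self-synchronizing, and it says nothing about the size or performance of
an eventual builtin implementation.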