[1/25] Remove nested functions: crypt/md5-crypt.c

Message ID CAGQ9bdw=-pzbi00gn5t_W8pXBjU0edFvLGMNUc+o=g9og=f9+Q@mail.gmail.com
State Committed
Headers

Commit Message

Kostya Serebryany May 20, 2014, 2:23 p.m. UTC
  2014-05-20  Kostya Serebryany  <konstantin.s.serebryany@gmail.com>

        * crypt/md5-crypt.c (__md5_crypt_r): Remove a nested function.
        (b64_from_24bit): New function.

On Tue, May 20, 2014 at 5:13 PM, Siddhesh Poyarekar <siddhesh@redhat.com> wrote:
> On Tue, May 20, 2014 at 04:56:27PM +0400, Konstantin Serebryany wrote:
>> Hi,
>>
>> This patch is the first in the series of patches that remove nested
>> functions from glibc.
>> Rationale: nested functions are a non-standard language feature;
>> removing them will allow glibc to be compiled with compilers other
>> than GCC and thus benefit from other compilers and code-analysis
>> tools.
>> Discussed previously here:
>> https://sourceware.org/ml/libc-alpha/2014-05/msg00400.html
>> No functionality change intended.
>>
>> (If approved, this will be my first commit to glibc, so some more
>> actions may be required)
>
> I believe initially you'll need one of us to commit for you, which we
> will do once the patch is found suitable.  You may want to review the
> contribution checklist as well:
>
> http://sourceware.org/glibc/wiki/Contribution%20checklist
>
>>
>> 2014-05-20 Kostya Serebryany <konstantin.s.serebryany@gmail.com>
>
> Two spaces between the date and name, and between the name and email.
> Add an extra newline between the ChangeLog header and content.
>
>>         * crypt/md5-crypt.c (__md5_crypt_r): Remove a nested function.
>>         (b64_from_24bit): New function.
>>
>>
>> diff --git a/crypt/md5-crypt.c b/crypt/md5-crypt.c
>> index d1b92d7..29f66ce 100644
>> --- a/crypt/md5-crypt.c
>> +++ b/crypt/md5-crypt.c
>> @@ -88,6 +88,19 @@ extern char *__md5_crypt_r (const char *key, const char *salt,
>>                             char *buffer, int buflen);
>>  extern char *__md5_crypt (const char *key, const char *salt);
>>
>> +static void b64_from_24bit (char **cp, int *buflen,
>
> 'static void' should be on a line of its own.

done

>
>> +                            unsigned int b2, unsigned int b1, unsigned int b0,
>> +                            int n)
>> +{
>> +  unsigned int w = (b2 << 16) | (b1 << 8) | b0;
>> +  while (n-- > 0 && *buflen > 0)
>> +    {
>> +      *(*cp)++ = b64t[w & 0x3f];
>> +      --*buflen;
>> +      w >>= 6;
>> +    }
>> +}
>> +
>>
>>  /* This entry point is equivalent to the `crypt' function in Unix
>>     libcs.  */
>> @@ -268,24 +281,18 @@ __md5_crypt_r (key, salt, buffer, buflen)
>>        --buflen;
>>      }
>>
>> -  void b64_from_24bit (unsigned int b2, unsigned int b1, unsigned int b0,
>> -                      int n)
>> -  {
>> -    unsigned int w = (b2 << 16) | (b1 << 8) | b0;
>> -    while (n-- > 0 && buflen > 0)
>> -      {
>> -       *cp++ = b64t[w & 0x3f];
>> -       --buflen;
>> -       w >>= 6;
>> -      }
>> -  }
>> -
>> -  b64_from_24bit (alt_result[0], alt_result[6], alt_result[12], 4);
>> -  b64_from_24bit (alt_result[1], alt_result[7], alt_result[13], 4);
>> -  b64_from_24bit (alt_result[2], alt_result[8], alt_result[14], 4);
>> -  b64_from_24bit (alt_result[3], alt_result[9], alt_result[15], 4);
>> -  b64_from_24bit (alt_result[4], alt_result[10], alt_result[5], 4);
>> -  b64_from_24bit (0, 0, alt_result[11], 2);
>> +  b64_from_24bit (&cp, &buflen,
>> +                  alt_result[0], alt_result[6], alt_result[12], 4);
>> +  b64_from_24bit (&cp, &buflen,
>> +                  alt_result[1], alt_result[7], alt_result[13], 4);
>> +  b64_from_24bit (&cp, &buflen,
>> +                  alt_result[2], alt_result[8], alt_result[14], 4);
>> +  b64_from_24bit (&cp, &buflen,
>> +                  alt_result[3], alt_result[9], alt_result[15], 4);
>> +  b64_from_24bit (&cp, &buflen,
>> +                  alt_result[4], alt_result[10], alt_result[5], 4);
>> +  b64_from_24bit (&cp, &buflen,
>> +                  0, 0, alt_result[11], 2);
>>    if (buflen <= 0)
>>      {
>>        __set_errno (ERANGE);
>
> It would be useful if you could post some kind of performance analysis
> of crypt or crypt_r to show that making this change does not cause
> a significant drop in performance.  In fact, I'd be really interested
> in having a microbenchmark added for this function in benchtests.
> Instructions in benchtests/README should be useful but if not, please
> feel free to reach out to me for help.

Do you think such a benchmark is possible at all?
I've made a simple test (attached) and profiled it with "perf":
    85.26%    a.out  libcrypt-2.15.so   [.] _ufc_doit_r
     6.13%    a.out  libcrypt-2.15.so   [.] _ufc_mk_keytab_r
     2.40%    a.out  libcrypt-2.15.so   [.] _ufc_setup_salt_r
     1.56%    a.out  libcrypt-2.15.so   [.] __crypt_r
     1.51%    a.out  libcrypt-2.15.so   [.] _ufc_output_conversion_r
     1.35%    a.out  libcrypt-2.15.so   [.] crypt

As you can see, crypt_r itself takes a tiny fraction of the time;
most of it is spent in the _ufc_* functions, defined in another module.
Any changes in crypt itself (unless you do something insane) will not
be observable in the profile.

Updated patch:

    libcs.  */
@@ -268,24 +282,18 @@ __md5_crypt_r (key, salt, buffer, buflen)
       --buflen;
     }

-  void b64_from_24bit (unsigned int b2, unsigned int b1, unsigned int b0,
-                      int n)
-  {
-    unsigned int w = (b2 << 16) | (b1 << 8) | b0;
-    while (n-- > 0 && buflen > 0)
-      {
-       *cp++ = b64t[w & 0x3f];
-       --buflen;
-       w >>= 6;
-      }
-  }
-
-  b64_from_24bit (alt_result[0], alt_result[6], alt_result[12], 4);
-  b64_from_24bit (alt_result[1], alt_result[7], alt_result[13], 4);
-  b64_from_24bit (alt_result[2], alt_result[8], alt_result[14], 4);
-  b64_from_24bit (alt_result[3], alt_result[9], alt_result[15], 4);
-  b64_from_24bit (alt_result[4], alt_result[10], alt_result[5], 4);
-  b64_from_24bit (0, 0, alt_result[11], 2);
+  b64_from_24bit (&cp, &buflen,
+                  alt_result[0], alt_result[6], alt_result[12], 4);
+  b64_from_24bit (&cp, &buflen,
+                  alt_result[1], alt_result[7], alt_result[13], 4);
+  b64_from_24bit (&cp, &buflen,
+                  alt_result[2], alt_result[8], alt_result[14], 4);
+  b64_from_24bit (&cp, &buflen,
+                  alt_result[3], alt_result[9], alt_result[15], 4);
+  b64_from_24bit (&cp, &buflen,
+                  alt_result[4], alt_result[10], alt_result[5], 4);
+  b64_from_24bit (&cp, &buflen,
+                  0, 0, alt_result[11], 2);
   if (buflen <= 0)
     {
       __set_errno (ERANGE);
  

Comments

Siddhesh Poyarekar May 20, 2014, 2:46 p.m. UTC | #1
On Tue, May 20, 2014 at 06:23:40PM +0400, Konstantin Serebryany wrote:
> Do you think such a benchmark is possible at all?
> I've made a simple test (attached) and profiled it with "perf":
>     85.26%    a.out  libcrypt-2.15.so   [.] _ufc_doit_r
>      6.13%    a.out  libcrypt-2.15.so   [.] _ufc_mk_keytab_r
>      2.40%    a.out  libcrypt-2.15.so   [.] _ufc_setup_salt_r
>      1.56%    a.out  libcrypt-2.15.so   [.] __crypt_r
>      1.51%    a.out  libcrypt-2.15.so   [.] _ufc_output_conversion_r
>      1.35%    a.out  libcrypt-2.15.so   [.] crypt
> 
> As you can see, crypt_r itself takes a tiny fraction of the time;
> most of it is spent in the _ufc_* functions, defined in another module.
> Any changes in crypt itself (unless you do something insane) will not
> be observable in the profile.

You'll need to choose inputs so that __md5_crypt_r is called -
crypt/crypt-entry.c should help you with that.  If __md5_crypt_r takes
a tiny fraction of the time again, then this change should be safe, but
I would prefer that the inputs get fed into the benchtests anyway so
that we can automatically track the performance of crypt for those
inputs in the future.

Siddhesh
  
Joseph Myers May 20, 2014, 2:54 p.m. UTC | #2
On Tue, 20 May 2014, Siddhesh Poyarekar wrote:

> On Tue, May 20, 2014 at 06:23:40PM +0400, Konstantin Serebryany wrote:
> > Do you think such a benchmark is possible at all?
> > I've made a simple test (attached) and profiled it with "perf":
> >     85.26%    a.out  libcrypt-2.15.so   [.] _ufc_doit_r
> >      6.13%    a.out  libcrypt-2.15.so   [.] _ufc_mk_keytab_r
> >      2.40%    a.out  libcrypt-2.15.so   [.] _ufc_setup_salt_r
> >      1.56%    a.out  libcrypt-2.15.so   [.] __crypt_r
> >      1.51%    a.out  libcrypt-2.15.so   [.] _ufc_output_conversion_r
> >      1.35%    a.out  libcrypt-2.15.so   [.] crypt
> > 
> > As you can see, crypt_r itself takes a tiny fraction of the time;
> > most of it is spent in the _ufc_* functions, defined in another module.
> > Any changes in crypt itself (unless you do something insane) will not
> > be observable in the profile.
> 
> You'll need to choose inputs so that __md5_crypt_r is called -
> crypt/crypt-entry.c should help you with that.  If __md5_crypt_r takes
> a tiny fraction of the time again, then this change should be safe, but
> I would prefer that the inputs get fed into the benchtests anyway so
> that we can automatically track the performance of crypt for those
> inputs in the future.

Another thing that can be done is to look at the code generated for the 
relevant file before and after the patch - I'd expect very little change.
  
Ondrej Bilka May 20, 2014, 3:20 p.m. UTC | #3
On Tue, May 20, 2014 at 02:54:41PM +0000, Joseph S. Myers wrote:
> On Tue, 20 May 2014, Siddhesh Poyarekar wrote:
> 
> > On Tue, May 20, 2014 at 06:23:40PM +0400, Konstantin Serebryany wrote:
> > > Do you think such a benchmark is possible at all?
> > > I've made a simple test (attached) and profiled it with "perf":
> > >     85.26%    a.out  libcrypt-2.15.so   [.] _ufc_doit_r
> > >      6.13%    a.out  libcrypt-2.15.so   [.] _ufc_mk_keytab_r
> > >      2.40%    a.out  libcrypt-2.15.so   [.] _ufc_setup_salt_r
> > >      1.56%    a.out  libcrypt-2.15.so   [.] __crypt_r
> > >      1.51%    a.out  libcrypt-2.15.so   [.] _ufc_output_conversion_r
> > >      1.35%    a.out  libcrypt-2.15.so   [.] crypt
> > > 
> > > As you can see, crypt_r itself takes a tiny fraction of the time;
> > > most of it is spent in the _ufc_* functions, defined in another module.
> > > Any changes in crypt itself (unless you do something insane) will not
> > > be observable in the profile.
> > 
> > You'll need to choose inputs so that __md5_crypt_r is called -
> > crypt/crypt-entry.c should help you with that.  If __md5_crypt_r takes
> > a tiny fraction of the time again, then this change should be safe, but
> > I would prefer that the inputs get fed into the benchtests anyway so
> > that we can automatically track the performance of crypt for those
> > inputs in the future.
> 
> Another thing that can be done is to look at the code generated for the 
> relevant file before and after the patch - I'd expect very little change.
> 
Yes, I wanted to write that benchmarking this change is probably
pointless, as GCC should optimize both variants the same way (if there
is consistently a difference one way, we could report that as a GCC
regression).

Also, the perf data show that this is a cold function, so you should
improve readability, not performance (measuring a difference is
possible, but you would need to run the benchmark for a few days and
wait until the standard deviation drops enough to get a statistically
significant result).

About the only way this could make a difference is if the function gets
inlined in one variant and not in the other.  That is out of scope for
microbenchmarks: instruction cache misses can easily make a function
slower in the wild, yet you would measure the inlined function as
faster when calling it in a tight loop.

As actual change is concerned
  
Kostya Serebryany May 20, 2014, 3:25 p.m. UTC | #4
On Tue, May 20, 2014 at 6:46 PM, Siddhesh Poyarekar <siddhesh@redhat.com> wrote:
> On Tue, May 20, 2014 at 06:23:40PM +0400, Konstantin Serebryany wrote:
>> Do you think such a benchmark is possible at all?
>> I've made a simple test (attached) and profiled it with "perf":
>>     85.26%    a.out  libcrypt-2.15.so   [.] _ufc_doit_r
>>      6.13%    a.out  libcrypt-2.15.so   [.] _ufc_mk_keytab_r
>>      2.40%    a.out  libcrypt-2.15.so   [.] _ufc_setup_salt_r
>>      1.56%    a.out  libcrypt-2.15.so   [.] __crypt_r
>>      1.51%    a.out  libcrypt-2.15.so   [.] _ufc_output_conversion_r
>>      1.35%    a.out  libcrypt-2.15.so   [.] crypt
>>
>> As you can see, crypt_r itself takes a tiny fraction of the time;
>> most of it is spent in the _ufc_* functions, defined in another module.
>> Any changes in crypt itself (unless you do something insane) will not
>> be observable in the profile.
>
> You'll need to choose inputs so that __md5_crypt_r is called -
> crypt/crypt-entry.c should help you with that.

With salt == "$1$" I get this profile:
 84.70%  a.out  libcrypt-2.15.so   [.] __md5_process_block
  5.98%  a.out  libc-2.15.so       [.] __memcpy_ssse3_back
  4.71%  a.out  libcrypt-2.15.so   [.] __md5_process_bytes
  1.92%  a.out  libcrypt-2.15.so   [.] __md5_crypt_r
  1.62%  a.out  libcrypt-2.15.so   [.] __md5_finish_ctx

Again, __md5_crypt_r is within noise.

>> Another thing that can be done is to look at the code generated for the
>> relevant file before and after the patch - I'd expect very little change.

I like this method much more, but there is actually a change.
The original binary has calls to b64_from_24bit.7858 (the nested function),
while in the binary built with my patch these calls are inlined.
Of course, this is a feature of a particular version of GCC, not of
the glibc code.


>   If __md5_crypt_r takes
> a tiny fraction of the time again, then this change should be safe, but
> I would prefer that the inputs get fed into the benchtests anyway so
> that we can automatically track the performance of crypt for those
> inputs in the future.
>
> Siddhesh
  
Siddhesh Poyarekar May 20, 2014, 3:27 p.m. UTC | #5
On Tue, May 20, 2014 at 05:20:31PM +0200, Ondřej Bílka wrote:
> Yes, I wanted to write that benchmarking this change is probably
> pointless, as GCC should optimize both variants the same way (if there
> is consistently a difference one way, we could report that as a GCC
> regression).
> 
> Also, the perf data show that this is a cold function, so you should
> improve readability, not performance (measuring a difference is
> possible, but you would need to run the benchmark for a few days and
> wait until the standard deviation drops enough to get a statistically
> significant result).
> 
> About the only way this could make a difference is if the function gets
> inlined in one variant and not in the other.  That is out of scope for
> microbenchmarks: instruction cache misses can easily make a function
> slower in the wild, yet you would measure the inlined function as
> faster when calling it in a tight loop.

Fair enough.  I'll push the patch if Konstantin shows that the
generated code is not drastically different.

Siddhesh
  
Kostya Serebryany May 20, 2014, 3:45 p.m. UTC | #6
On Tue, May 20, 2014 at 7:27 PM, Siddhesh Poyarekar <siddhesh@redhat.com> wrote:
> On Tue, May 20, 2014 at 05:20:31PM +0200, Ondřej Bílka wrote:
>> Yes, I wanted to write that benchmarking this change is probably
>> pointless, as GCC should optimize both variants the same way (if there
>> is consistently a difference one way, we could report that as a GCC
>> regression).
>>
>> Also, the perf data show that this is a cold function, so you should
>> improve readability, not performance (measuring a difference is
>> possible, but you would need to run the benchmark for a few days and
>> wait until the standard deviation drops enough to get a statistically
>> significant result).
>>
>> About the only way this could make a difference is if the function gets
>> inlined in one variant and not in the other.  That is out of scope for
>> microbenchmarks: instruction cache misses can easily make a function
>> slower in the wild, yet you would measure the inlined function as
>> faster when calling it in a tight loop.
>
> Fair enough.  I'll push the patch if Konstantin shows that the
> generated code is not drastically different.

Just a data point; I am not sure if it proves anything.
If I add __attribute__ ((noinline)) to the definition of b64_from_24bit,
the two binaries become more similar.

The non-nested variant is a bit longer as it needs 6 instructions to
pass the parameters:

      15a9:       0f b6 8d 87 fe ff ff    movzbl -0x179(%rbp),%ecx
      15b0:       0f b6 95 81 fe ff ff    movzbl -0x17f(%rbp),%edx
      15b7:       41 b9 04 00 00 00       mov    $0x4,%r9d
      15bd:       44 0f b6 85 8d fe ff    movzbl -0x173(%rbp),%r8d
      15c4:       ff
      15c5:       48 89 de                mov    %rbx,%rsi
      15c8:       4c 89 f7                mov    %r14,%rdi
      15cb:       e8 40 fb ff ff          callq  1110 <b64_from_24bit>

The nested variant needs 5:
      15bf:       0f b6 95 7e fe ff ff    movzbl -0x182(%rbp),%edx
      15c6:       0f b6 b5 78 fe ff ff    movzbl -0x188(%rbp),%esi
      15cd:       49 89 da                mov    %rbx,%r10
      15d0:       0f b6 bd 72 fe ff ff    movzbl -0x18e(%rbp),%edi
      15d7:       b9 04 00 00 00          mov    $0x4,%ecx
      15dc:       e8 2f fb ff ff          callq  1110 <b64_from_24bit.7858>

With inlining (that happens for me only in the non-nested variant)
this is irrelevant.

>
> Siddhesh
  
Siddhesh Poyarekar May 20, 2014, 3:46 p.m. UTC | #7
On Tue, May 20, 2014 at 07:25:52PM +0400, Konstantin Serebryany wrote:
> With salt == "$1$" I get this profile:
>  84.70%  a.out  libcrypt-2.15.so   [.] __md5_process_block
>   5.98%  a.out  libc-2.15.so       [.] __memcpy_ssse3_back
>   4.71%  a.out  libcrypt-2.15.so   [.] __md5_process_bytes
>   1.92%  a.out  libcrypt-2.15.so   [.] __md5_crypt_r
>   1.62%  a.out  libcrypt-2.15.so   [.] __md5_finish_ctx
> 
> Again, __md5_crypt_r is within noise.

That's good.

> 
> >> Another thing that can be done is to look at the code generated for the
> >> relevant file before and after the patch - I'd expect very little change.
> 
> I like this method much more, but there is actually a change.
> The original binary has calls to b64_from_24bit.7858 (the nested function),
> while in the binary built with my patch these calls are inlined.
> Of course, this is a feature of a particular version of GCC, not of
> the glibc code.

Given that this does not have a very big impact, I'm inclined to
accept this change.  I'll push it on Thursday after noon UTC if
nobody objects.

Siddhesh
  
Kostya Serebryany May 21, 2014, 11:35 a.m. UTC | #8
On Tue, May 20, 2014 at 7:46 PM, Siddhesh Poyarekar <siddhesh@redhat.com> wrote:
> On Tue, May 20, 2014 at 07:25:52PM +0400, Konstantin Serebryany wrote:
>> With salt == "$1$" I get this profile:
>>  84.70%  a.out  libcrypt-2.15.so   [.] __md5_process_block
>>   5.98%  a.out  libc-2.15.so       [.] __memcpy_ssse3_back
>>   4.71%  a.out  libcrypt-2.15.so   [.] __md5_process_bytes
>>   1.92%  a.out  libcrypt-2.15.so   [.] __md5_crypt_r
>>   1.62%  a.out  libcrypt-2.15.so   [.] __md5_finish_ctx
>>
>> Again, __md5_crypt_r is within noise.
>
> That's good.
>
>>
>> >> Another thing that can be done is to look at the code generated for the
>> >> relevant file before and after the patch - I'd expect very little change.
>>
>> I like this method much more, but there is actually a change.
>> The original binary has calls to b64_from_24bit.7858 (the nested function),
>> while in the binary built with my patch these calls are inlined.
>> Of course, this is a feature of a particular version of GCC, not of
>> the glibc code.
>
> Given that this does not have a very big impact, I'm inclined to
> accept this change.  I'll push it on Thursday after noon UTC if
> nobody objects.

I realized that two other files have the same situation with the
function b64_from_24bit.
md5-crypt.c, sha256-crypt.c, and sha512-crypt.c have exactly the same
implementation of b64_from_24bit as a nested function.
One possible fix is to move those two out and make them regular static
functions, just like I did in md5-crypt.c.

Another solution is to create just one implementation of
b64_from_24bit and place it somewhere else.
For example declare it as __b64_from_24bit in crypt-private.h and
implement it in crypt_util.c.

WDYT?
  
Siddhesh Poyarekar May 21, 2014, 11:47 a.m. UTC | #9
On Wed, May 21, 2014 at 03:35:43PM +0400, Konstantin Serebryany wrote:
> I realized that two other files have the same situation with the
> function b64_from_24bit.
> md5-crypt.c, sha256-crypt.c, and sha512-crypt.c have exactly the same
> implementation of b64_from_24bit as a nested function.
> One possible fix is to move those two out and make them regular static
> functions, just like I did in md5-crypt.c.
> 
> Another solution is to create just one implementation of
> b64_from_24bit and place it somewhere else.
> For example declare it as __b64_from_24bit in crypt-private.h and
> implement it in crypt_util.c.
> 
> WDYT?

One implementation would obviously be better.

Siddhesh
  
Kostya Serebryany May 21, 2014, 11:48 a.m. UTC | #10
On Wed, May 21, 2014 at 3:47 PM, Siddhesh Poyarekar <siddhesh@redhat.com> wrote:
> On Wed, May 21, 2014 at 03:35:43PM +0400, Konstantin Serebryany wrote:
>> I realized that two other files have the same situation with the
>> function b64_from_24bit.
>> md5-crypt.c, sha256-crypt.c, and sha512-crypt.c have exactly the same
>> implementation of b64_from_24bit as a nested function.
>> One possible fix is to move those two out and make them regular static
>> functions, just like I did in md5-crypt.c.
>>
>> Another solution is to create just one implementation of
>> b64_from_24bit and place it somewhere else.
>> For example declare it as __b64_from_24bit in crypt-private.h and
>> implement it in crypt_util.c.
>>
>> WDYT?
>
> One implementation would obviously be better.
Let me prepare another patch then.
Does __b64_from_24bit declared in crypt-private.h and defined in
crypt_util.c sound good?


>
> Siddhesh
  
Siddhesh Poyarekar May 21, 2014, 11:56 a.m. UTC | #11
On Wed, May 21, 2014 at 03:48:32PM +0400, Konstantin Serebryany wrote:
> Let me prepare another patch then.

Let this patch go in as is.  Move the function out with the patch to
sha256-crypt.c or whatever you're changing next, assuming that this
patch is in.

> Does __b64_from_24bit declared in crypt-private.h and defined in
> crypt_util.c sound good?

Yes.

Siddhesh
  
Siddhesh Poyarekar May 22, 2014, 2:59 p.m. UTC | #12
On Tue, May 20, 2014 at 06:23:40PM +0400, Konstantin Serebryany wrote:
> 2014-05-20  Kostya Serebryany  <konstantin.s.serebryany@gmail.com>
> 
>         * crypt/md5-crypt.c (__md5_crypt_r): Remove a nested function.
>         (b64_from_24bit): New function.
> 

I have pushed this (and the formatting fix) now.

Siddhesh
  

Patch

diff --git a/crypt/md5-crypt.c b/crypt/md5-crypt.c
index d1b92d7..5d5fc55 100644
--- a/crypt/md5-crypt.c
+++ b/crypt/md5-crypt.c
@@ -88,6 +88,20 @@  extern char *__md5_crypt_r (const char *key, const char *salt,
                            char *buffer, int buflen);
 extern char *__md5_crypt (const char *key, const char *salt);

+static void
+b64_from_24bit (char **cp, int *buflen,
+                unsigned int b2, unsigned int b1, unsigned int b0,
+                int n)
+{
+  unsigned int w = (b2 << 16) | (b1 << 8) | b0;
+  while (n-- > 0 && *buflen > 0)
+    {
+      *(*cp)++ = b64t[w & 0x3f];
+      --*buflen;
+      w >>= 6;
+    }
+}
+

 /* This entry point is equivalent to the `crypt' function in Unix