elf: Sort only uninitialized objects in _dl_map_object_deps()

Message ID 20200725105205.103328-1-nixiaoming@huawei.com
State Superseded
Headers
Series elf: Sort only uninitialized objects in _dl_map_object_deps() |

Commit Message

Xiaoming Ni July 25, 2020, 10:52 a.m. UTC
  TIS ELF Version 1.2
Initialization and Termination Functions:
	1. Before the initialization code for any object A is called, the
	   initialization code for any other objects that object A depends
	   on are called.
	2. The order in which the dynamic linker calls termination functions
	   is the exact reverse order of their corresponding initialization
	   functions.
	3. The dynamic linker ensures that it will not execute any
	   initialization or termination functions more than once.

According to 1 and 2: _dl_sort_maps() is used for sorting when dlopen/dlclose.
According to 3: At dlopen, only the uninitialized objects need to be sorted.

Construct 200 dynamic libraries that depend on each other cyclically.
Run the dlopen() for each dynamic library.
Before the patch is installed, it takes 214 seconds.
After patching, it takes 37 seconds.

Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
---
 elf/dl-deps.c | 40 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 39 insertions(+), 1 deletion(-)
  

Comments

Carlos O'Donell July 25, 2020, 8:57 p.m. UTC | #1
On 7/25/20 6:52 AM, Xiaoming Ni wrote:
> TIS ELF Version 1.2
> Initialization and Termination Functions:
> 	1. Before the initialization code for any object A is called, the
> 	   initialization code for any other objects that object A depends
> 	   on are called.
> 	2. The order in which the dynamic linker calls termination functions
> 	   is the exact reverse order of their corresponding initialization
> 	   functions.
> 	3. The dynamic linker ensures that it will not execute any
> 	   initialization or termination functions more than once.
> 
> According to 1 and 2: _dl_sort_maps() is used for sorting when dlopen/dlclose.
> According to 3: At dlopen, only the uninitialized objects need to be sorted.

Thank you for posting this. It will take a while to review this, and we're
currently frozen for the release on August 3rd.

> Construct 200 dynamic libraries that depend on each other cyclically.

Once you have a cyclic dependency the order is undefined.

My opinion is that the dynamic linker should break the cycle in a deterministic
fashion.

What do you think?

> Run the dlopen() for each dynamic library.
> Before the patch is installed, it takes 214 seconds.
> After patching, it takes 37 seconds.

Is it still correct?

Do you have a test case you can add?
 
> Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>

Reviewing this will have to wait until after the release, but this patch is
interesting.

Have you looked at Chung-Ling Tang's most recent work in this area?

https://patchwork.sourceware.org/project/glibc/patch/1427b370-7400-afd0-16e8-55c1072db20e@mentor.com/
https://patchwork.sourceware.org/project/glibc/patch/5de3ab61-3dca-b400-15c6-92ff5ae80877@mentor.com/

Could you use Chung-Ling's test case constructor to write a test case?

> ---
>  elf/dl-deps.c | 40 +++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 39 insertions(+), 1 deletion(-)
> 
> diff --git a/elf/dl-deps.c b/elf/dl-deps.c
> index b5a43232a7..5df1e32156 100644
> --- a/elf/dl-deps.c
> +++ b/elf/dl-deps.c
> @@ -152,6 +152,43 @@ preload (struct list *known, unsigned int *nlist, struct link_map *map)
>    map->l_reserved = 1;
>  }
>  
> +/* TIS  ELF Version 1.2
> + * Initialization and Termination Functions:
> + * 1. Before the initialization code for any object A is called, the
> + *  initialization code for any other objects that object A depends on are called.
> + * 2. The order in which the dynamic linker calls termination functions is the
> + *  exact reverse order of their corresponding initialization functions.
> + * 3. The dynamic linker ensures that it will not execute any initialization
> + *  or termination functions more than once.
> + *
> + * According to 1 and 2, _dl_sort_maps() is used for sorting when dlopen/dlclose.
> + * According to 3, we only need to sort the uninitialized objects.
> + */
> +static void
> +_dl_sort_uninit_maps (struct link_map **maps, unsigned int nmaps)
> +{
> +  unsigned int s = 0;
> +  unsigned int e = nmaps - 1;
> +  struct link_map *tmp;
> +
> +  while (s < e)
> +    {
> +      while ((e > 0) && maps[e]->l_init_called)
> +        e--;
> +      while ((s < nmaps) && maps[s]->l_init_called == 0)
> +        s++;
> +      if (s >= e)
> +        break;
> +      tmp = maps[e];
> +      maps[e] = maps[s];
> +      maps[s] = tmp;
> +      s++;
> +      e--;
> +    }
> +
> +  _dl_sort_maps (maps, s, NULL, false);
> +}
> +
>  void
>  _dl_map_object_deps (struct link_map *map,
>  		     struct link_map **preloads, unsigned int npreloads,
> @@ -611,7 +648,8 @@ Filters not supported with LD_TRACE_PRELINKING"));
>      memcpy (l_initfini, map->l_searchlist.r_list,
>  	    nlist * sizeof (struct link_map *));
>  
> -  _dl_sort_maps (&l_initfini[1], nlist - 1, NULL, false);
> +  if (map->l_init_called == 0)
> +    _dl_sort_uninit_maps (&l_initfini[1], nlist - 1);
>  
>    /* Terminate the list of dependencies.  */
>    l_initfini[nlist] = NULL;
>
  
Chung-Lin Tang July 26, 2020, 10:41 a.m. UTC | #2
On 2020/7/26 4:57 AM, Carlos O'Donell via Libc-alpha wrote:
>> Run the dlopen() for each dynamic library.
>> Before the patch is installed, it takes 214 seconds.
>> After patching, it takes 37 seconds.
> Is it still correct?
> 
> Do you have a test case you can add?
>   
>> Signed-off-by: Xiaoming Ni<nixiaoming@huawei.com>
> Reviewing this will have to wait until after the release, but this patch is
> interesting.

This patch appears to add a linear pass to somewhat reduce the input size
of a circularly linked case, but frankly speaking, is only useful with the current
old sorting algorithm, and just to a certain degree.

The mentioned test case still takes 37 seconds with the proposed patch, while for
the new DFS-based algorithm, even without any such special case input reduction,
the sort time will probably be instantaneous.

> Have you looked at Chung-Ling Tang's most recent work in this area?
> 
> https://patchwork.sourceware.org/project/glibc/patch/1427b370-7400-afd0-16e8-55c1072db20e@mentor.com/
> https://patchwork.sourceware.org/project/glibc/patch/5de3ab61-3dca-b400-15c6-92ff5ae80877@mentor.com/

If you're trying out the #17645 sorting patch, remember to add GLIBC_TUNABLES=glibc.rtld.dynamic_sort=2
to the environment before running the test, or it will still be the old algorithm.

> Could you use Chung-Ling's test case constructor to write a test case?
> 

Yeah, a new test case like this is always nice, especially to test if the description language
is expressive enough to handle this.

Chung-Lin
  
Carlos O'Donell July 27, 2020, 12:36 a.m. UTC | #3
On 7/26/20 6:41 AM, Chung-Lin Tang wrote:
> 
> 
> On 2020/7/26 4:57 AM, Carlos O'Donell via Libc-alpha wrote:
>>> Run the dlopen() for each dynamic library.
>>> Before the patch is installed, it takes 214 seconds.
>>> After patching, it takes 37 seconds.
>> Is it still correct?
>>
>> Do you have a test case you can add?
>>  
>>> Signed-off-by: Xiaoming Ni<nixiaoming@huawei.com>
>> Reviewing this will have to wait until after the release, but this patch is
>> interesting.
> 
> This patch appears to add a linear pass to somewhat reduce the input size
> of a circularly linked case, but frankly speaking, is only useful with the current
> old sorting algorithm, and just to a certain degree.

Chung-Lin, thank you for looking over the suggested fixes.

> The mentioned test case still takes 37 seconds with the proposed patch, while for
> the new DFS-based algorithm, even without any such special case input reduction,
> the sort time will probably be instantaneous.

I expected that might be the case, but it would still be good to put
the example into a test case and verify.

>> Have you looked at Chung-Ling Tang's most recent work in this area?
>>
>> https://patchwork.sourceware.org/project/glibc/patch/1427b370-7400-afd0-16e8-55c1072db20e@mentor.com/
>> https://patchwork.sourceware.org/project/glibc/patch/5de3ab61-3dca-b400-15c6-92ff5ae80877@mentor.com/
> 
> If you're trying out the #17645 sorting patch, remember to add GLIBC_TUNABLES=glibc.rtld.dynamic_sort=2
> to the environment before running the test, or it will still be the old algorithm.
> 
>> Could you use Chung-Ling's test case constructor to write a test case?
>>
> 
> Yeah, a new test case like this is always nice, especially to test if the description language
> is expressive enough to handle this.

Agreed.
  
Xiaoming Ni July 29, 2020, 1:56 p.m. UTC | #4
On 2020/7/27 8:36, Carlos O'Donell wrote:
> On 7/26/20 6:41 AM, Chung-Lin Tang wrote:
>>
>>
>> On 2020/7/26 4:57 AM, Carlos O'Donell via Libc-alpha wrote:
>>>> Run the dlopen() for each dynamic library.
>>>> Before the patch is installed, it takes 214 seconds.
>>>> After patching, it takes 37 seconds.
>>> Is it still correct? >>>
>>> Do you have a test case you can add?

I do not have fully automated test cases locally.
Perform the following steps to manually verify the configuration:
1. A large number of SOs are automatically generated. The init and exit 
functions of an SO record logs.
2. Design different dependencies (unidirectional linked list, binary 
tree, n-ary tree, ring, network, and random dependency).
3. Re-link the SO based on the dependency relationship.
4. Generate a random array and call dlopen in the sequence recorded in 
the array.
5. Manually check whether test logs and dependencies comply with the ELF 
standard.

After manual verification, the init/exit sequence of the SO still 
complies with the ELF standard.

>>>   
>>>> Signed-off-by: Xiaoming Ni<nixiaoming@huawei.com>
>>> Reviewing this will have to wait until after the release, but this patch is
>>> interesting.
>>
>> This patch appears to add a linear pass to somewhat reduce the input size
>> of a circularly linked case, but frankly speaking, is only useful with the current
>> old sorting algorithm, and just to a certain degree.
> 
> Chung-Lin, thank you for looking over the suggested fixes.
> 

elf/dl-deps.c  _dl_map_object_deps:

-  _dl_sort_maps (&l_initfini[1], nlist - 1, NULL, false);
+  if (map->l_init_called == 0)
+    _dl_sort_maps (&l_initfini[1], nlist - 1, NULL, false);

When an object has been loaded, all its dependent objects have been 
initialized. In this case, sorting is not required.


>> The mentioned test case still takes 37 seconds with the proposed patch, while for
>> the new DFS-based algorithm, even without any such special case input reduction,
>> the sort time will probably be instantaneous.
> 
> I expected that might be the case, but it would still be good to put
> the example into a test case and verify.
> 
>>> Have you looked at Chung-Ling Tang's most recent work in this area?
>>>
>>> https://patchwork.sourceware.org/project/glibc/patch/1427b370-7400-afd0-16e8-55c1072db20e@mentor.com/
>>> https://patchwork.sourceware.org/project/glibc/patch/5de3ab61-3dca-b400-15c6-92ff5ae80877@mentor.com/
>>
I didn't notice this before.

>> If you're trying out the #17645 sorting patch, remember to add GLIBC_TUNABLES=glibc.rtld.dynamic_sort=2
>> to the environment before running the test, or it will still be the old algorithm.
>>
>>> Could you use Chung-Ling's test case constructor to write a test case?
>>>
>>
>> Yeah, a new test case like this is always nice, especially to test if the description language
>> is expressive enough to handle this.
> 
> Agreed.
> 
I'm trying to automate my test cases, but it's going to take a while.

thanks
  
Carlos O'Donell Aug. 3, 2020, 6:37 p.m. UTC | #5
On 7/29/20 9:56 AM, Xiaoming Ni wrote:
> I'm trying to automate my test cases, but it's going to take a while.

Thank you for doing this.

My reminder to everyone is that writing tests is one of the most important
things we can do as systems developers.

The systems we are building are expected to be very reliable and we need
to write tests that encode our expectations and intent.

Chung-Lin's patches provide an automated way to write such test cases for
the loader. Please try applying the patches and seeing if you can use the
special DSL to write a test that matches what you were previously testing.
  
Carlos O'Donell Sept. 13, 2022, 9:42 a.m. UTC | #6
On Mon, Aug 03, 2020 at 02:37:22PM -0400, Carlos O'Donell wrote:
> On 7/29/20 9:56 AM, Xiaoming Ni wrote:
> > I'm trying to automate my test cases, but it's going to take a while.
> 
> Thank you for doing this.
> 
> My reminder to everyone is that writing tests is one of the most important
> things we can do as systems developers.
> 
> The systems we are building are expected to be very reliable and we need
> to write tests that encode our expectations and intent.
> 
> Chung-Lin's patches provide an automated way to write such test cases for
> the loader. Please try applying the patches and seeing if you can use the
> special DSL to write a test that matches what you were previously testing.

I'm walking out patch queue backwards in patchwork.

I wanted to note here that this 2020 patch has been superseded by
future work that improves the core DSO sorting algorithm.

We have switched to the new DSO sorting algorithm by default.

Please re-check the current glibc to see if it meeds your needs
or if we need to improve things further.

Cheers,
Carlos.

> -- 
> Cheers,
> Carlos.
  

Patch

diff --git a/elf/dl-deps.c b/elf/dl-deps.c
index b5a43232a7..5df1e32156 100644
--- a/elf/dl-deps.c
+++ b/elf/dl-deps.c
@@ -152,6 +152,43 @@  preload (struct list *known, unsigned int *nlist, struct link_map *map)
   map->l_reserved = 1;
 }
 
+/* TIS  ELF Version 1.2
+ * Initialization and Termination Functions:
+ * 1. Before the initialization code for any object A is called, the
+ *  initialization code for any other objects that object A depends on are called.
+ * 2. The order in which the dynamic linker calls termination functions is the
+ *  exact reverse order of their corresponding initialization functions.
+ * 3. The dynamic linker ensures that it will not execute any initialization
+ *  or termination functions more than once.
+ *
+ * According to 1 and 2, _dl_sort_maps() is used for sorting when dlopen/dlclose.
+ * According to 3, we only need to sort the uninitialized objects.
+ */
+static void
+_dl_sort_uninit_maps (struct link_map **maps, unsigned int nmaps)
+{
+  unsigned int s = 0;
+  unsigned int e = nmaps - 1;
+  struct link_map *tmp;
+
+  while (s < e)
+    {
+      while ((e > 0) && maps[e]->l_init_called)
+        e--;
+      while ((s < nmaps) && maps[s]->l_init_called == 0)
+        s++;
+      if (s >= e)
+        break;
+      tmp = maps[e];
+      maps[e] = maps[s];
+      maps[s] = tmp;
+      s++;
+      e--;
+    }
+
+  _dl_sort_maps (maps, s, NULL, false);
+}
+
 void
 _dl_map_object_deps (struct link_map *map,
 		     struct link_map **preloads, unsigned int npreloads,
@@ -611,7 +648,8 @@  Filters not supported with LD_TRACE_PRELINKING"));
     memcpy (l_initfini, map->l_searchlist.r_list,
 	    nlist * sizeof (struct link_map *));
 
-  _dl_sort_maps (&l_initfini[1], nlist - 1, NULL, false);
+  if (map->l_init_called == 0)
+    _dl_sort_uninit_maps (&l_initfini[1], nlist - 1);
 
   /* Terminate the list of dependencies.  */
   l_initfini[nlist] = NULL;