[v3] ld/PE: Handle MS style import libraries for files named *.exe too

Message ID 20241119132727.3218276-1-martin@martin.st
State New
Headers
Series [v3] ld/PE: Handle MS style import libraries for files named *.exe too |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_binutils_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_binutils_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_binutils_check--master-aarch64 success Test passed
linaro-tcwg-bot/tcwg_binutils_check--master-arm success Test passed

Commit Message

Martin Storsjö Nov. 19, 2024, 1:27 p.m. UTC
  When handling MS style import libraries (also called short import
libraries, or ILF), we need to detect the kind of library.

So far, this has been done by looking at the member file names
in the import library - in an MS style import library, all the
member files for a specific library have the same member file
name - the name of the runtime module to link against. Usually
this is a DLL - thus we do a case insensitive comparison and
check if the suffix is .dll.

However, an .exe can also export symbols which can be linked
against in the same way. In particular, if linking against
WDK (Windows Driver Kit) import libraries, e.g. wdmsec.lib, the
import libraries can provide imports for ntoskrnl.exe.

Instead of specifically checking for *.dll (and *.exe, etc),
invert the condition and skip archive members named *.o and *.obj.
For any remaining archive members, that do contain .idata
sections, apply the renaming. (The renaming is also mostly
harmless if applied where it isn't needed; if archive members
already have unique file names, their relative ordering should
remain intact except for very contrieved cases.)
---
v3: Updated based on the earlier discussion. Also tested the
effect of applying renaming on GNU style import libraries where
no renaming is needed - where it is quite harmless.
---
 ld/emultempl/pe.em  | 29 ++++++++++++++++++++++-------
 ld/emultempl/pep.em | 29 ++++++++++++++++++++++-------
 2 files changed, 44 insertions(+), 14 deletions(-)
  

Comments

Jan Beulich Nov. 21, 2024, 12:14 p.m. UTC | #1
On 19.11.2024 14:27, Martin Storsjö wrote:
> When handling MS style import libraries (also called short import
> libraries, or ILF), we need to detect the kind of library.
> 
> So far, this has been done by looking at the member file names
> in the import library - in an MS style import library, all the
> member files for a specific library have the same member file
> name - the name of the runtime module to link against. Usually
> this is a DLL - thus we do a case insensitive comparison and
> check if the suffix is .dll.
> 
> However, an .exe can also export symbols which can be linked
> against in the same way. In particular, if linking against
> WDK (Windows Driver Kit) import libraries, e.g. wdmsec.lib, the
> import libraries can provide imports for ntoskrnl.exe.
> 
> Instead of specifically checking for *.dll (and *.exe, etc),
> invert the condition and skip archive members named *.o and *.obj.
> For any remaining archive members, that do contain .idata
> sections, apply the renaming. (The renaming is also mostly
> harmless if applied where it isn't needed; if archive members
> already have unique file names, their relative ordering should
> remain intact except for very contrieved cases.)

Okay. Just one nit on comment text:

> --- a/ld/emultempl/pe.em
> +++ b/ld/emultempl/pe.em
> @@ -1747,13 +1747,28 @@ gld${EMULATION_NAME}_after_open (void)
>  
>  	    /* Microsoft import libraries may contain archive members for
>  	       one or more DLLs, together with static object files.
> -	       Inspect all members that are named *.dll - check whether
> -	       they contain .idata sections. Do the renaming of all
> -	       archive members that seem to be Microsoft style import
> -	       objects.  */
> +	       The head and sentinels are regular COFF object files,
> +	       while the thunks are special ILF files that get synthesized
> +	       by bfd into COFF object files.
> +
> +	       As Microsoft import libraries can be for a module with
> +	       almost any file name (*.dll, *.exe, etc), we can't easily
> +	       know which archive members to inspect.
> +
> +	       Inspect all members unless ones named *.o or *.obj (which

While I'm not a native speaker, I think this wants to be either "Inspect
all members, except ones ..." or "Inspect all members unless named ...".

Jan
  
Martin Storsjö Nov. 21, 2024, 12:21 p.m. UTC | #2
On Thu, 21 Nov 2024, Jan Beulich wrote:

> On 19.11.2024 14:27, Martin Storsjö wrote:
>> When handling MS style import libraries (also called short import
>> libraries, or ILF), we need to detect the kind of library.
>>
>> So far, this has been done by looking at the member file names
>> in the import library - in an MS style import library, all the
>> member files for a specific library have the same member file
>> name - the name of the runtime module to link against. Usually
>> this is a DLL - thus we do a case insensitive comparison and
>> check if the suffix is .dll.
>>
>> However, an .exe can also export symbols which can be linked
>> against in the same way. In particular, if linking against
>> WDK (Windows Driver Kit) import libraries, e.g. wdmsec.lib, the
>> import libraries can provide imports for ntoskrnl.exe.
>>
>> Instead of specifically checking for *.dll (and *.exe, etc),
>> invert the condition and skip archive members named *.o and *.obj.
>> For any remaining archive members, that do contain .idata
>> sections, apply the renaming. (The renaming is also mostly
>> harmless if applied where it isn't needed; if archive members
>> already have unique file names, their relative ordering should
>> remain intact except for very contrieved cases.)
>
> Okay. Just one nit on comment text:
>
>> --- a/ld/emultempl/pe.em
>> +++ b/ld/emultempl/pe.em
>> @@ -1747,13 +1747,28 @@ gld${EMULATION_NAME}_after_open (void)
>>
>>  	    /* Microsoft import libraries may contain archive members for
>>  	       one or more DLLs, together with static object files.
>> -	       Inspect all members that are named *.dll - check whether
>> -	       they contain .idata sections. Do the renaming of all
>> -	       archive members that seem to be Microsoft style import
>> -	       objects.  */
>> +	       The head and sentinels are regular COFF object files,
>> +	       while the thunks are special ILF files that get synthesized
>> +	       by bfd into COFF object files.
>> +
>> +	       As Microsoft import libraries can be for a module with
>> +	       almost any file name (*.dll, *.exe, etc), we can't easily
>> +	       know which archive members to inspect.
>> +
>> +	       Inspect all members unless ones named *.o or *.obj (which
>
> While I'm not a native speaker, I think this wants to be either "Inspect
> all members, except ones ..." or "Inspect all members unless named ...".

Oh, indeed, yes.

Ok to push with that changed?

// Martin
  
Jan Beulich Nov. 21, 2024, 12:32 p.m. UTC | #3
On 21.11.2024 13:21, Martin Storsjö wrote:
> On Thu, 21 Nov 2024, Jan Beulich wrote:
> 
>> On 19.11.2024 14:27, Martin Storsjö wrote:
>>> When handling MS style import libraries (also called short import
>>> libraries, or ILF), we need to detect the kind of library.
>>>
>>> So far, this has been done by looking at the member file names
>>> in the import library - in an MS style import library, all the
>>> member files for a specific library have the same member file
>>> name - the name of the runtime module to link against. Usually
>>> this is a DLL - thus we do a case insensitive comparison and
>>> check if the suffix is .dll.
>>>
>>> However, an .exe can also export symbols which can be linked
>>> against in the same way. In particular, if linking against
>>> WDK (Windows Driver Kit) import libraries, e.g. wdmsec.lib, the
>>> import libraries can provide imports for ntoskrnl.exe.
>>>
>>> Instead of specifically checking for *.dll (and *.exe, etc),
>>> invert the condition and skip archive members named *.o and *.obj.
>>> For any remaining archive members, that do contain .idata
>>> sections, apply the renaming. (The renaming is also mostly
>>> harmless if applied where it isn't needed; if archive members
>>> already have unique file names, their relative ordering should
>>> remain intact except for very contrieved cases.)
>>
>> Okay. Just one nit on comment text:
>>
>>> --- a/ld/emultempl/pe.em
>>> +++ b/ld/emultempl/pe.em
>>> @@ -1747,13 +1747,28 @@ gld${EMULATION_NAME}_after_open (void)
>>>
>>>  	    /* Microsoft import libraries may contain archive members for
>>>  	       one or more DLLs, together with static object files.
>>> -	       Inspect all members that are named *.dll - check whether
>>> -	       they contain .idata sections. Do the renaming of all
>>> -	       archive members that seem to be Microsoft style import
>>> -	       objects.  */
>>> +	       The head and sentinels are regular COFF object files,
>>> +	       while the thunks are special ILF files that get synthesized
>>> +	       by bfd into COFF object files.
>>> +
>>> +	       As Microsoft import libraries can be for a module with
>>> +	       almost any file name (*.dll, *.exe, etc), we can't easily
>>> +	       know which archive members to inspect.
>>> +
>>> +	       Inspect all members unless ones named *.o or *.obj (which
>>
>> While I'm not a native speaker, I think this wants to be either "Inspect
>> all members, except ones ..." or "Inspect all members unless named ...".
> 
> Oh, indeed, yes.
> 
> Ok to push with that changed?

Yes.

Jan
  
Martin Storsjö Nov. 22, 2024, 12:11 p.m. UTC | #4
On Thu, 21 Nov 2024, Jan Beulich wrote:

> On 21.11.2024 13:21, Martin Storsjö wrote:
>> On Thu, 21 Nov 2024, Jan Beulich wrote:
>>
>>> On 19.11.2024 14:27, Martin Storsjö wrote:
>>>> When handling MS style import libraries (also called short import
>>>> libraries, or ILF), we need to detect the kind of library.
>>>>
>>>> So far, this has been done by looking at the member file names
>>>> in the import library - in an MS style import library, all the
>>>> member files for a specific library have the same member file
>>>> name - the name of the runtime module to link against. Usually
>>>> this is a DLL - thus we do a case insensitive comparison and
>>>> check if the suffix is .dll.
>>>>
>>>> However, an .exe can also export symbols which can be linked
>>>> against in the same way. In particular, if linking against
>>>> WDK (Windows Driver Kit) import libraries, e.g. wdmsec.lib, the
>>>> import libraries can provide imports for ntoskrnl.exe.
>>>>
>>>> Instead of specifically checking for *.dll (and *.exe, etc),
>>>> invert the condition and skip archive members named *.o and *.obj.
>>>> For any remaining archive members, that do contain .idata
>>>> sections, apply the renaming. (The renaming is also mostly
>>>> harmless if applied where it isn't needed; if archive members
>>>> already have unique file names, their relative ordering should
>>>> remain intact except for very contrieved cases.)
>>>
>>> Okay. Just one nit on comment text:
>>>
>>>> --- a/ld/emultempl/pe.em
>>>> +++ b/ld/emultempl/pe.em
>>>> @@ -1747,13 +1747,28 @@ gld${EMULATION_NAME}_after_open (void)
>>>>
>>>>  	    /* Microsoft import libraries may contain archive members for
>>>>  	       one or more DLLs, together with static object files.
>>>> -	       Inspect all members that are named *.dll - check whether
>>>> -	       they contain .idata sections. Do the renaming of all
>>>> -	       archive members that seem to be Microsoft style import
>>>> -	       objects.  */
>>>> +	       The head and sentinels are regular COFF object files,
>>>> +	       while the thunks are special ILF files that get synthesized
>>>> +	       by bfd into COFF object files.
>>>> +
>>>> +	       As Microsoft import libraries can be for a module with
>>>> +	       almost any file name (*.dll, *.exe, etc), we can't easily
>>>> +	       know which archive members to inspect.
>>>> +
>>>> +	       Inspect all members unless ones named *.o or *.obj (which
>>>
>>> While I'm not a native speaker, I think this wants to be either "Inspect
>>> all members, except ones ..." or "Inspect all members unless named ...".
>>
>> Oh, indeed, yes.
>>
>> Ok to push with that changed?
>
> Yes.

Thanks, this is pushed now.

// Martin
  

Patch

diff --git a/ld/emultempl/pe.em b/ld/emultempl/pe.em
index 869ffd3f623..f37807cb95a 100644
--- a/ld/emultempl/pe.em
+++ b/ld/emultempl/pe.em
@@ -1747,13 +1747,28 @@  gld${EMULATION_NAME}_after_open (void)
 
 	    /* Microsoft import libraries may contain archive members for
 	       one or more DLLs, together with static object files.
-	       Inspect all members that are named *.dll - check whether
-	       they contain .idata sections. Do the renaming of all
-	       archive members that seem to be Microsoft style import
-	       objects.  */
+	       The head and sentinels are regular COFF object files,
+	       while the thunks are special ILF files that get synthesized
+	       by bfd into COFF object files.
+
+	       As Microsoft import libraries can be for a module with
+	       almost any file name (*.dll, *.exe, etc), we can't easily
+	       know which archive members to inspect.
+
+	       Inspect all members unless ones named *.o or *.obj (which
+	       is the case both for regular static libraries or for GNU
+	       style import libraries). Archive members with file names other
+	       than *.o or *.obj, that do contain .idata sections, are
+	       considered to be Microsoft style import objects, and are
+	       renamed accordingly.
+
+	       If this heuristic is wrong and we apply this on archive members
+	       that already have unique names, it shouldn't make any difference
+	       as we only append a suffix on the names.  */
 	    pnt = strrchr (bfd_get_filename (is->the_bfd), '.');
 
-	    if (pnt != NULL && (fileext_cmp (pnt + 1, "dll") == 0))
+	    if (pnt != NULL && (fileext_cmp (pnt + 1, "o") != 0 &&
+				fileext_cmp (pnt + 1, "obj") != 0))
 	      {
 		int idata2 = 0, reloc_count = 0, idata = 0;
 		asection *sec;
@@ -1768,8 +1783,8 @@  gld${EMULATION_NAME}_after_open (void)
 		    reloc_count += sec->reloc_count;
 		  }
 
-		/* An archive member named .dll, but not having any .idata
-		   sections - apparently not a Microsoft import object
+		/* An archive member not named .o or .obj, but not having any
+		   .idata sections - apparently not a Microsoft import object
 		   after all: Skip renaming it.  */
 		if (!idata)
 		  continue;
diff --git a/ld/emultempl/pep.em b/ld/emultempl/pep.em
index c225d052fb8..35ddae07984 100644
--- a/ld/emultempl/pep.em
+++ b/ld/emultempl/pep.em
@@ -1730,13 +1730,28 @@  gld${EMULATION_NAME}_after_open (void)
 
 	    /* Microsoft import libraries may contain archive members for
 	       one or more DLLs, together with static object files.
-	       Inspect all members that are named *.dll - check whether
-	       they contain .idata sections. Do the renaming of all
-	       archive members that seem to be Microsoft style import
-	       objects.  */
+	       The head and sentinels are regular COFF object files,
+	       while the thunks are special ILF files that get synthesized
+	       by bfd into COFF object files.
+
+	       As Microsoft import libraries can be for a module with
+	       almost any file name (*.dll, *.exe, etc), we can't easily
+	       know which archive members to inspect.
+
+	       Inspect all members unless ones named *.o or *.obj (which
+	       is the case both for regular static libraries or for GNU
+	       style import libraries). Archive members with file names other
+	       than *.o or *.obj, that do contain .idata sections, are
+	       considered to be Microsoft style import objects, and are
+	       renamed accordingly.
+
+	       If this heuristic is wrong and we apply this on archive members
+	       that already have unique names, it shouldn't make any difference
+	       as we only append a suffix on the names.  */
 	    pnt = strrchr (bfd_get_filename (is->the_bfd), '.');
 
-	    if (pnt != NULL && (fileext_cmp (pnt + 1, "dll") == 0))
+	    if (pnt != NULL && (fileext_cmp (pnt + 1, "o") != 0 &&
+				fileext_cmp (pnt + 1, "obj") != 0))
 	      {
 		int idata2 = 0, reloc_count = 0, idata = 0;
 		asection *sec;
@@ -1751,8 +1766,8 @@  gld${EMULATION_NAME}_after_open (void)
 		    reloc_count += sec->reloc_count;
 		  }
 
-		/* An archive member named .dll, but not having any .idata
-		   sections - apparently not a Microsoft import object
+		/* An archive member not named .o or .obj, but not having any
+		   .idata sections - apparently not a Microsoft import object
 		   after all: Skip renaming it.  */
 		if (!idata)
 		  continue;