[v5,4/5] c++modules: report imported CMI files as dependencies

Message ID 20230125210636.2960049-5-ben.boeckel@kitware.com
State New
Headers
Series P1689R5 support |

Commit Message

Ben Boeckel Jan. 25, 2023, 9:06 p.m. UTC
  They affect the build, so report them via `-MF` mechanisms.

gcc/cp/

	* module.cc (do_import): Report imported CMI files as
	dependencies.

Signed-off-by: Ben Boeckel <ben.boeckel@kitware.com>
---
 gcc/cp/module.cc | 2 ++
 1 file changed, 2 insertions(+)
  

Comments

Jason Merrill Feb. 13, 2023, 6:33 p.m. UTC | #1
On 1/25/23 13:06, Ben Boeckel wrote:
> They affect the build, so report them via `-MF` mechanisms.
> 
> gcc/cp/
> 
> 	* module.cc (do_import): Report imported CMI files as
> 	dependencies.

Both this and the mapper dependency patch seem to cause most of the 
modules testcases to crash; please remember to run the regression tests 
(https://gcc.gnu.org/contribute.html#testing)

> Signed-off-by: Ben Boeckel <ben.boeckel@kitware.com>
> ---
>   gcc/cp/module.cc | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
> index ebd30f63d81..dbd1b721616 100644
> --- a/gcc/cp/module.cc
> +++ b/gcc/cp/module.cc
> @@ -18966,6 +18966,8 @@ module_state::do_import (cpp_reader *reader, bool outermost)
>         dump () && dump ("CMI is %s", file);
>         if (note_module_cmi_yes || inform_cmi_p)
>   	inform (loc, "reading CMI %qs", file);
> +      /* Add the CMI file to the dependency tracking. */
> +      deps_add_dep (cpp_get_deps (reader), file);
>         fd = open (file, O_RDONLY | O_CLOEXEC | O_BINARY);
>         e = errno;
>       }
  
Ben Boeckel May 12, 2023, 2:26 p.m. UTC | #2
On Mon, Feb 13, 2023 at 13:33:50 -0500, Jason Merrill wrote:
> Both this and the mapper dependency patch seem to cause most of the 
> modules testcases to crash; please remember to run the regression tests 
> (https://gcc.gnu.org/contribute.html#testing)

Fixed for v6. `cpp_get_deps` can return `NULL` which `deps_add_dep`
assumes to not be true; fixed by checking before calling.

--Ben
  
Jason Merrill June 22, 2023, 9:21 p.m. UTC | #3
On 1/25/23 16:06, Ben Boeckel wrote:
> They affect the build, so report them via `-MF` mechanisms.

Why isn't this covered by the existing code in preprocessed_module?

> gcc/cp/
> 
> 	* module.cc (do_import): Report imported CMI files as
> 	dependencies.
> 
> Signed-off-by: Ben Boeckel <ben.boeckel@kitware.com>
> ---
>   gcc/cp/module.cc | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
> index ebd30f63d81..dbd1b721616 100644
> --- a/gcc/cp/module.cc
> +++ b/gcc/cp/module.cc
> @@ -18966,6 +18966,8 @@ module_state::do_import (cpp_reader *reader, bool outermost)
>         dump () && dump ("CMI is %s", file);
>         if (note_module_cmi_yes || inform_cmi_p)
>   	inform (loc, "reading CMI %qs", file);
> +      /* Add the CMI file to the dependency tracking. */
> +      deps_add_dep (cpp_get_deps (reader), file);
>         fd = open (file, O_RDONLY | O_CLOEXEC | O_BINARY);
>         e = errno;
>       }
  
Ben Boeckel June 23, 2023, 2:45 a.m. UTC | #4
On Thu, Jun 22, 2023 at 17:21:42 -0400, Jason Merrill wrote:
> On 1/25/23 16:06, Ben Boeckel wrote:
> > They affect the build, so report them via `-MF` mechanisms.
> 
> Why isn't this covered by the existing code in preprocessed_module?

It appears as though it is neutered in patch 3 where
`write_make_modules_deps` is used in `make_write` (or will use that name
in v7 once I finish up testing). This logic cannot be used for p1689
output because it assumes the location and names of CMI files (`.c++m`)
that will be necessary (it is related to the `CXX_IMPORTS +=` GNU
make/libcody extensions that will, e.g., cause `ninja` to choke if it is
read from `-MF` output as it uses "fancier" Makefile syntax than tools
that are not actually `make` are going to be willing to support). This
codepath is the *actual* filename being read at compile time and is
relevant at all times; it may duplicate what `preprocessed_module` sets
up.

I'm also realizing that this is why I need to pass `-fdeps-format=p1689`
when compiling…there may need to be another, more idiomatic, way to
disable this additional syntax usage in `-MF` output.

--Ben
  
Nathan Sidwell June 23, 2023, 12:12 p.m. UTC | #5
On 6/22/23 22:45, Ben Boeckel wrote:
> On Thu, Jun 22, 2023 at 17:21:42 -0400, Jason Merrill wrote:
>> On 1/25/23 16:06, Ben Boeckel wrote:
>>> They affect the build, so report them via `-MF` mechanisms.
>>
>> Why isn't this covered by the existing code in preprocessed_module?
> 
> It appears as though it is neutered in patch 3 where
> `write_make_modules_deps` is used in `make_write` (or will use that name

Why do you want to record the transitive modules? I would expect just noting the 
ones with imports directly in the TU would suffice (i.e check the 'outermost' arg)

nathan
  
Ben Boeckel June 25, 2023, 4:36 p.m. UTC | #6
On Fri, Jun 23, 2023 at 08:12:41 -0400, Nathan Sidwell wrote:
> On 6/22/23 22:45, Ben Boeckel wrote:
> > On Thu, Jun 22, 2023 at 17:21:42 -0400, Jason Merrill wrote:
> >> On 1/25/23 16:06, Ben Boeckel wrote:
> >>> They affect the build, so report them via `-MF` mechanisms.
> >>
> >> Why isn't this covered by the existing code in preprocessed_module?
> > 
> > It appears as though it is neutered in patch 3 where
> > `write_make_modules_deps` is used in `make_write` (or will use that name
> 
> Why do you want to record the transitive modules? I would expect just noting the 
> ones with imports directly in the TU would suffice (i.e check the 'outermost' arg)

FWIW, only GCC has "fat" modules. MSVC and Clang both require the
transitive closure to be passed. The idea there is to minimize the size
of individual module files.

If GCC only reads the "fat" modules, then only those should be recorded.
If it reads other modules, they should be recorded as well.

--Ben
  
Jason Merrill July 18, 2023, 8:52 p.m. UTC | #7
On 6/25/23 12:36, Ben Boeckel wrote:
> On Fri, Jun 23, 2023 at 08:12:41 -0400, Nathan Sidwell wrote:
>> On 6/22/23 22:45, Ben Boeckel wrote:
>>> On Thu, Jun 22, 2023 at 17:21:42 -0400, Jason Merrill wrote:
>>>> On 1/25/23 16:06, Ben Boeckel wrote:
>>>>> They affect the build, so report them via `-MF` mechanisms.
>>>>
>>>> Why isn't this covered by the existing code in preprocessed_module?
>>>
>>> It appears as though it is neutered in patch 3 where
>>> `write_make_modules_deps` is used in `make_write` (or will use that name
>>
>> Why do you want to record the transitive modules? I would expect just noting the
>> ones with imports directly in the TU would suffice (i.e check the 'outermost' arg)
> 
> FWIW, only GCC has "fat" modules. MSVC and Clang both require the
> transitive closure to be passed. The idea there is to minimize the size
> of individual module files.
> 
> If GCC only reads the "fat" modules, then only those should be recorded.
> If it reads other modules, they should be recorded as well.

But wouldn't the transitive modules be dependencies of the direct 
imports, so (re)building the direct imports would first require building 
the transitive modules anyway?  Expressing the transitive closure of 
dependencies for each importer seems redundant when it can be easily 
derived from the direct dependencies of each module.

Jason
  
Nathan Sidwell July 18, 2023, 9:12 p.m. UTC | #8
On 7/18/23 16:52, Jason Merrill wrote:
> On 6/25/23 12:36, Ben Boeckel wrote:
>> On Fri, Jun 23, 2023 at 08:12:41 -0400, Nathan Sidwell wrote:
>>> On 6/22/23 22:45, Ben Boeckel wrote:
>>>> On Thu, Jun 22, 2023 at 17:21:42 -0400, Jason Merrill wrote:
>>>>> On 1/25/23 16:06, Ben Boeckel wrote:
>>>>>> They affect the build, so report them via `-MF` mechanisms.
>>>>>
>>>>> Why isn't this covered by the existing code in preprocessed_module?
>>>>
>>>> It appears as though it is neutered in patch 3 where
>>>> `write_make_modules_deps` is used in `make_write` (or will use that name
>>>
>>> Why do you want to record the transitive modules? I would expect just noting the
>>> ones with imports directly in the TU would suffice (i.e check the 'outermost' 
>>> arg)
>>
>> FWIW, only GCC has "fat" modules. MSVC and Clang both require the
>> transitive closure to be passed. The idea there is to minimize the size
>> of individual module files.
>>
>> If GCC only reads the "fat" modules, then only those should be recorded.
>> If it reads other modules, they should be recorded as well.

Please explain what you mean by fat modules.  There seems to be confusion.

> 
> But wouldn't the transitive modules be dependencies of the direct imports, so 
> (re)building the direct imports would first require building the transitive 
> modules anyway?  Expressing the transitive closure of dependencies for each 
> importer seems redundant when it can be easily derived from the direct 
> dependencies of each module.
> 
> Jason
>
  
Ben Boeckel July 19, 2023, 12:01 a.m. UTC | #9
On Tue, Jul 18, 2023 at 16:52:44 -0400, Jason Merrill wrote:
> On 6/25/23 12:36, Ben Boeckel wrote:
> > On Fri, Jun 23, 2023 at 08:12:41 -0400, Nathan Sidwell wrote:
> >> On 6/22/23 22:45, Ben Boeckel wrote:
> >>> On Thu, Jun 22, 2023 at 17:21:42 -0400, Jason Merrill wrote:
> >>>> On 1/25/23 16:06, Ben Boeckel wrote:
> >>>>> They affect the build, so report them via `-MF` mechanisms.
> >>>>
> >>>> Why isn't this covered by the existing code in preprocessed_module?
> >>>
> >>> It appears as though it is neutered in patch 3 where
> >>> `write_make_modules_deps` is used in `make_write` (or will use that name
> >>
> >> Why do you want to record the transitive modules? I would expect just noting the
> >> ones with imports directly in the TU would suffice (i.e check the 'outermost' arg)
> > 
> > FWIW, only GCC has "fat" modules. MSVC and Clang both require the
> > transitive closure to be passed. The idea there is to minimize the size
> > of individual module files.
> > 
> > If GCC only reads the "fat" modules, then only those should be recorded.
> > If it reads other modules, they should be recorded as well.

For clarification, given:

* a.cppm
```
export module a;
```

* b.cppm
```
export module b;
import a;
```

* use.cppm
```
import b;
```

in a "fat" module setup, `use.cppm` only needs to be told about
`b.cmi` because it contains everything that an importer needs to know
about the `a` module (reachable types, re-exported bits, whatever). With
the "thin" modules, `a.cmi` must be specified when compiling `use.cppm`
to satisfy anything that may be required transitively (e.g., a return
type of some `b` symbol is from `a`). MSVC and Clang (17+) use this
model. I don't know MSVC's rationale, but Clang's is to make CMIs
relocatable by not embedding paths to dependent modules in CMIs. This
should help caching and network transfer sizes in distributed builds.
LLVM/Clang discussion:

    https://discourse.llvm.org/t/c-20-modules-should-the-bmis-contain-paths-to-their-dependent-bmis/70422
    https://github.com/llvm/llvm-project/issues/62707

Maybe I'm missing how this *actually* works in GCC as I've really only
interacted with it through the command line, but I've not needed to
mention `a.cmi` when compiling `use.cppm`. Is `a.cmi` referenced and
read through some embedded information in `b.cmi` or does `b.cmi`
include enough information to not need to read it at all? If the former,
distributed builds are going to have a problem knowing what files to
send just from the command line (I'll call this "implicit thin"). If the
latter, that is the "fat" CMI that I'm thinking of.

> But wouldn't the transitive modules be dependencies of the direct 
> imports, so (re)building the direct imports would first require building 
> the transitive modules anyway?  Expressing the transitive closure of 
> dependencies for each importer seems redundant when it can be easily 
> derived from the direct dependencies of each module.

I'm not concerned whether it is transitive or not, really. If a file is
read, it should be reported here regardless of the reason. Note that
caching mechanisms may skip actually *doing* the reading, but the
dependencies should still be reported from the cached results as-if the
real work had been performed.

--Ben
  
Nathan Sidwell July 19, 2023, 9:11 p.m. UTC | #10
On 7/18/23 20:01, Ben Boeckel wrote:
> On Tue, Jul 18, 2023 at 16:52:44 -0400, Jason Merrill wrote:
>> On 6/25/23 12:36, Ben Boeckel wrote:
>>> On Fri, Jun 23, 2023 at 08:12:41 -0400, Nathan Sidwell wrote:
>>>> On 6/22/23 22:45, Ben Boeckel wrote:
>>>>> On Thu, Jun 22, 2023 at 17:21:42 -0400, Jason Merrill wrote:
>>>>>> On 1/25/23 16:06, Ben Boeckel wrote:
>>>>>>> They affect the build, so report them via `-MF` mechanisms.
>>>>>>
>>>>>> Why isn't this covered by the existing code in preprocessed_module?
>>>>>
>>>>> It appears as though it is neutered in patch 3 where
>>>>> `write_make_modules_deps` is used in `make_write` (or will use that name
>>>>
>>>> Why do you want to record the transitive modules? I would expect just noting the
>>>> ones with imports directly in the TU would suffice (i.e check the 'outermost' arg)
>>>
>>> FWIW, only GCC has "fat" modules. MSVC and Clang both require the
>>> transitive closure to be passed. The idea there is to minimize the size
>>> of individual module files.
>>>
>>> If GCC only reads the "fat" modules, then only those should be recorded.
>>> If it reads other modules, they should be recorded as well.
> 
> For clarification, given:
> 
> * a.cppm
> ```
> export module a;
> ```
> 
> * b.cppm
> ```
> export module b;
> import a;
> ```
> 
> * use.cppm
> ```
> import b;
> ```
> 
> in a "fat" module setup, `use.cppm` only needs to be told about
> `b.cmi` because it contains everything that an importer needs to know
> about the `a` module (reachable types, re-exported bits, whateve > With
> the "thin" modules, `a.cmi` must be specified when compiling `use.cppm`
> to satisfy anything that may be required transitively (e.g., a return

GCC is neither of these descriptions.  a CMI does not contain the transitive 
closure of its imports.  It contains an import table.  That table lists the 
transitive closure of its imports (it needs that closure to do remapping), and 
that table contains the CMI pathnames of the direct imports.  Those pathnames 
are absolute, if the mapper provded an absolute pathm or relative to the CMI repo.

The rationale here is that if you're building a CMI, Foo, which imports a bunch 
of modules, those imported CMIs will have the same (relative) location in this 
compilation and in compilations importing Foo (why would you move them?) Note 
this is NOT inhibiting relocatable builds, because of the CMI repo.


> Maybe I'm missing how this *actually* works in GCC as I've really only
> interacted with it through the command line, but I've not needed to
> mention `a.cmi` when compiling `use.cppm`. Is `a.cmi` referenced and
> read through some embedded information in `b.cmi` or does `b.cmi`
> include enough information to not need to read it at all? If the former,
> distributed builds are going to have a problem knowing what files to
> send just from the command line (I'll call this "implicit thin"). If the
> latter, that is the "fat" CMI that I'm thinking of.

please don't use perjorative terms like 'fat' and 'thin'.

> 
>> But wouldn't the transitive modules be dependencies of the direct
>> imports, so (re)building the direct imports would first require building
>> the transitive modules anyway?  Expressing the transitive closure of
>> dependencies for each importer seems redundant when it can be easily
>> derived from the direct dependencies of each module.
> 
> I'm not concerned whether it is transitive or not, really. If a file is
> read, it should be reported here regardless of the reason. Note that
> caching mechanisms may skip actually *doing* the reading, but the
> dependencies should still be reported from the cached results as-if the
> real work had been performed.
> 
> --Ben
  
Ben Boeckel July 20, 2023, 12:47 a.m. UTC | #11
On Wed, Jul 19, 2023 at 17:11:08 -0400, Nathan Sidwell wrote:
> GCC is neither of these descriptions.  a CMI does not contain the transitive 
> closure of its imports.  It contains an import table.  That table lists the 
> transitive closure of its imports (it needs that closure to do remapping), and 
> that table contains the CMI pathnames of the direct imports.  Those pathnames 
> are absolute, if the mapper provded an absolute pathm or relative to the CMI repo.
> 
> The rationale here is that if you're building a CMI, Foo, which imports a bunch 
> of modules, those imported CMIs will have the same (relative) location in this 
> compilation and in compilations importing Foo (why would you move them?) Note 
> this is NOT inhibiting relocatable builds, because of the CMI repo.

But it is inhibiting distributed builds because the distributing tool
would need to know:

- what CMIs are actually imported (here, "read the module mapper file"
  (in CMake's case, this is only the modules that are needed; a single
  massive mapper file for an entire project would have extra entries) or
  "act as a proxy for the socket/program specified" for other
  approaches);
- read the CMIs as it sends to the remote side to gather any other CMIs
  that may be needed (recursively);

Contrast this with the MSVC and Clang (17+) mechanism where the command
line contains everything that is needed and a single bolus can be sent.

And relocatable is probably fine. How does it interact with reproducible
builds? Or are GCC CMIs not really something anyone should consider for
installation (even as a "here, maybe this can help consumers"
mechanism)?

> On 7/18/23 20:01, Ben Boeckel wrote:
> > Maybe I'm missing how this *actually* works in GCC as I've really only
> > interacted with it through the command line, but I've not needed to
> > mention `a.cmi` when compiling `use.cppm`. Is `a.cmi` referenced and
> > read through some embedded information in `b.cmi` or does `b.cmi`
> > include enough information to not need to read it at all? If the former,
> > distributed builds are going to have a problem knowing what files to
> > send just from the command line (I'll call this "implicit thin"). If the
> > latter, that is the "fat" CMI that I'm thinking of.
> 
> please don't use perjorative terms like 'fat' and 'thin'.

Sorry, I was internally analogizing to "thinLTO".

--Ben
  
Nathan Sidwell July 20, 2023, 9 p.m. UTC | #12
On 7/19/23 20:47, Ben Boeckel wrote:
> On Wed, Jul 19, 2023 at 17:11:08 -0400, Nathan Sidwell wrote:
>> GCC is neither of these descriptions.  a CMI does not contain the transitive
>> closure of its imports.  It contains an import table.  That table lists the
>> transitive closure of its imports (it needs that closure to do remapping), and
>> that table contains the CMI pathnames of the direct imports.  Those pathnames
>> are absolute, if the mapper provded an absolute pathm or relative to the CMI repo.
>>
>> The rationale here is that if you're building a CMI, Foo, which imports a bunch
>> of modules, those imported CMIs will have the same (relative) location in this
>> compilation and in compilations importing Foo (why would you move them?) Note
>> this is NOT inhibiting relocatable builds, because of the CMI repo.
> 
> But it is inhibiting distributed builds because the distributing tool
> would need to know:
> 
> - what CMIs are actually imported (here, "read the module mapper file"
>    (in CMake's case, this is only the modules that are needed; a single
>    massive mapper file for an entire project would have extra entries) or
>    "act as a proxy for the socket/program specified" for other
>    approaches);

This information is in the machine (& human) README section of the CMI.

> - read the CMIs as it sends to the remote side to gather any other CMIs
>    that may be needed (recursively);
> 
> Contrast this with the MSVC and Clang (17+) mechanism where the command
> line contains everything that is needed and a single bolus can be sent.

um, the build system needs to create that command line? Where does the build 
system get that information?  IIUC it'll need to read some file(s) to do that.


> And relocatable is probably fine. How does it interact with reproducible
> builds? Or are GCC CMIs not really something anyone should consider for
> installation (even as a "here, maybe this can help consumers"
> mechanism)?

Module CMIs should be considered a cacheable artifact.  They are neither object 
files nor source files.

> 
>> On 7/18/23 20:01, Ben Boeckel wrote:
>>> Maybe I'm missing how this *actually* works in GCC as I've really only
>>> interacted with it through the command line, but I've not needed to
>>> mention `a.cmi` when compiling `use.cppm`. Is `a.cmi` referenced and
>>> read through some embedded information in `b.cmi` or does `b.cmi`
>>> include enough information to not need to read it at all? If the former,
>>> distributed builds are going to have a problem knowing what files to
>>> send just from the command line (I'll call this "implicit thin"). If the
>>> latter, that is the "fat" CMI that I'm thinking of.
>>
>> please don't use perjorative terms like 'fat' and 'thin'.
> 
> Sorry, I was internally analogizing to "thinLTO".
> 
> --Ben
  
Ben Boeckel July 21, 2023, 2:57 p.m. UTC | #13
On Thu, Jul 20, 2023 at 17:00:32 -0400, Nathan Sidwell wrote:
> On 7/19/23 20:47, Ben Boeckel wrote:
> > But it is inhibiting distributed builds because the distributing tool
> > would need to know:
> > 
> > - what CMIs are actually imported (here, "read the module mapper file"
> >    (in CMake's case, this is only the modules that are needed; a single
> >    massive mapper file for an entire project would have extra entries) or
> >    "act as a proxy for the socket/program specified" for other
> >    approaches);
> 
> This information is in the machine (& human) README section of the CMI.

Ok. That leaves it up to distributing build tools to figure out at
least.

> > - read the CMIs as it sends to the remote side to gather any other CMIs
> >    that may be needed (recursively);
> > 
> > Contrast this with the MSVC and Clang (17+) mechanism where the command
> > line contains everything that is needed and a single bolus can be sent.
> 
> um, the build system needs to create that command line? Where does the build 
> system get that information?  IIUC it'll need to read some file(s) to do that.

It's chained through the P1689 information in the collator as needed. No
extra files need to be read (at least with CMake's approach); certainly
not CMI files.

> > And relocatable is probably fine. How does it interact with reproducible
> > builds? Or are GCC CMIs not really something anyone should consider for
> > installation (even as a "here, maybe this can help consumers"
> > mechanism)?
> 
> Module CMIs should be considered a cacheable artifact.  They are neither object 
> files nor source files.

Sure, cachable sounds fine. What about the installation?

--Ben
  
Nathan Sidwell July 21, 2023, 8:23 p.m. UTC | #14
On 7/21/23 10:57, Ben Boeckel wrote:
> On Thu, Jul 20, 2023 at 17:00:32 -0400, Nathan Sidwell wrote:
>> On 7/19/23 20:47, Ben Boeckel wrote:
>>> But it is inhibiting distributed builds because the distributing tool
>>> would need to know:
>>>
>>> - what CMIs are actually imported (here, "read the module mapper file"
>>>     (in CMake's case, this is only the modules that are needed; a single
>>>     massive mapper file for an entire project would have extra entries) or
>>>     "act as a proxy for the socket/program specified" for other
>>>     approaches);
>>
>> This information is in the machine (& human) README section of the CMI.
> 
> Ok. That leaves it up to distributing build tools to figure out at
> least.
> 
>>> - read the CMIs as it sends to the remote side to gather any other CMIs
>>>     that may be needed (recursively);
>>>
>>> Contrast this with the MSVC and Clang (17+) mechanism where the command
>>> line contains everything that is needed and a single bolus can be sent.
>>
>> um, the build system needs to create that command line? Where does the build
>> system get that information?  IIUC it'll need to read some file(s) to do that.
> 
> It's chained through the P1689 information in the collator as needed. No
> extra files need to be read (at least with CMake's approach); certainly
> not CMI files.

It occurs to me that the model I am envisioning is similar to CMake's object 
libraries.  Object libraries are a convenient name for a bunch of object files. 
IIUC they're linked by naming the individual object files (or I think the could 
be implemented as a static lib linked with --whole-archive path/to/libfoo.a 
-no-whole-archive.  But for this conversation consider them a bunch of separate 
object files with a convenient group name.

Consider also that object libraries could themselves contain object libraries (I 
don't know of they can, but it seems like a useful concept).  Then one could 
create an object library from a collection of object files and object libraries 
(recursively).  CMake would handle the transitive gtaph.

Now, allow an object library to itself have some kind of tangible, on-disk 
representation.  *BUT* not like a static library -- it doesn't include the 
object files.


Now that immediately maps onto modules.

CMI: Object library
Direct imports: Direct object libraries of an object library

This is why I don't understand the need explicitly indicate the indirect imports 
of a CMI.  CMake knows them, because it knows the graph.

> 
>>> And relocatable is probably fine. How does it interact with reproducible
>>> builds? Or are GCC CMIs not really something anyone should consider for
>>> installation (even as a "here, maybe this can help consumers"
>>> mechanism)?
>>
>> Module CMIs should be considered a cacheable artifact.  They are neither object
>> files nor source files.
> 
> Sure, cachable sounds fine. What about the installation?
> 
> --Ben
  
Ben Boeckel July 24, 2023, 12:26 a.m. UTC | #15
On Fri, Jul 21, 2023 at 16:23:07 -0400, Nathan Sidwell wrote:
> It occurs to me that the model I am envisioning is similar to CMake's object 
> libraries.  Object libraries are a convenient name for a bunch of object files. 
> IIUC they're linked by naming the individual object files (or I think the could 
> be implemented as a static lib linked with --whole-archive path/to/libfoo.a 
> -no-whole-archive.  But for this conversation consider them a bunch of separate 
> object files with a convenient group name.

Yes, `--whole-archive` would work great if it had any kind of
portability across CMake's platform set.

> Consider also that object libraries could themselves contain object libraries (I 
> don't know of they can, but it seems like a useful concept).  Then one could 
> create an object library from a collection of object files and object libraries 
> (recursively).  CMake would handle the transitive gtaph.

I think this detail is relevant, but you can use
`$<TARGET_OBJECT:subobjlib>` as an `INTERFACE` sources and it would act
like that, but it is an explicit thing. Instead, `OBJECT` libraries
*only* provide their objects to targets that *directly* link them. If
not, given this:

    A (OBJECT library)
    B (library of some kind; links PUBLIC to A)
    C (links to B)

If `A` has things like linker flags (or, more likely, libraries) as part
of its usage requirements, C will get them on is link line. However, if
OBJECT files are transitive in the same way, the linker (on most
platforms at least) chokes because it now has duplicates of all of A's
symbols: those from the B library and those from A's objects on the link
line.

> Now, allow an object library to itself have some kind of tangible, on-disk 
> representation.  *BUT* not like a static library -- it doesn't include the 
> object files.
> 
> 
> Now that immediately maps onto modules.
> 
> CMI: Object library
> Direct imports: Direct object libraries of an object library
> 
> This is why I don't understand the need explicitly indicate the indirect imports 
> of a CMI.  CMake knows them, because it knows the graph.

Sure, *CMake* knows them, but the *build tool* needs to be told
(typically `make` or `ninja`) because it is what is actually executing
the build graph. The way this is communicated is via `-MF` files and
that's what I'm providing in this patch. Note that `ninja` does not
allow rules to specify such dependencies for other rules than the one it
is reading the file for.

--Ben
  
Jason Merrill July 28, 2023, 1:13 a.m. UTC | #16
On 7/23/23 20:26, Ben Boeckel wrote:
> On Fri, Jul 21, 2023 at 16:23:07 -0400, Nathan Sidwell wrote:
>> It occurs to me that the model I am envisioning is similar to CMake's object
>> libraries.  Object libraries are a convenient name for a bunch of object files.
>> IIUC they're linked by naming the individual object files (or I think the could
>> be implemented as a static lib linked with --whole-archive path/to/libfoo.a
>> -no-whole-archive.  But for this conversation consider them a bunch of separate
>> object files with a convenient group name.
> 
> Yes, `--whole-archive` would work great if it had any kind of
> portability across CMake's platform set.
> 
>> Consider also that object libraries could themselves contain object libraries (I
>> don't know of they can, but it seems like a useful concept).  Then one could
>> create an object library from a collection of object files and object libraries
>> (recursively).  CMake would handle the transitive gtaph.
> 
> I think this detail is relevant, but you can use
> `$<TARGET_OBJECT:subobjlib>` as an `INTERFACE` sources and it would act
> like that, but it is an explicit thing. Instead, `OBJECT` libraries
> *only* provide their objects to targets that *directly* link them. If
> not, given this:
> 
>      A (OBJECT library)
>      B (library of some kind; links PUBLIC to A)
>      C (links to B)
> 
> If `A` has things like linker flags (or, more likely, libraries) as part
> of its usage requirements, C will get them on is link line. However, if
> OBJECT files are transitive in the same way, the linker (on most
> platforms at least) chokes because it now has duplicates of all of A's
> symbols: those from the B library and those from A's objects on the link
> line.
> 
>> Now, allow an object library to itself have some kind of tangible, on-disk
>> representation.  *BUT* not like a static library -- it doesn't include the
>> object files.
>>
>>
>> Now that immediately maps onto modules.
>>
>> CMI: Object library
>> Direct imports: Direct object libraries of an object library
>>
>> This is why I don't understand the need explicitly indicate the indirect imports
>> of a CMI.  CMake knows them, because it knows the graph.
> 
> Sure, *CMake* knows them, but the *build tool* needs to be told
> (typically `make` or `ninja`) because it is what is actually executing
> the build graph. The way this is communicated is via `-MF` files and
> that's what I'm providing in this patch. Note that `ninja` does not
> allow rules to specify such dependencies for other rules than the one it
> is reading the file for.

But since the direct imports need to be rebuilt themselves if the 
transitive imports change, the build graph should be the same whether or 
not the transitive imports are repeated?  Either way, if a transitive 
import changes you need to rebuild the direct import and then the importer.

I guess it shouldn't hurt to have the transitive imports in the -MF 
file, as long as they aren't also in the p1689 file, so I'm not 
particularly opposed to this change, but I don't see how it makes a 
practical difference.

Jason
  
Ben Boeckel July 29, 2023, 2:25 p.m. UTC | #17
On Thu, Jul 27, 2023 at 18:13:48 -0700, Jason Merrill wrote:
> On 7/23/23 20:26, Ben Boeckel wrote:
> > Sure, *CMake* knows them, but the *build tool* needs to be told
> > (typically `make` or `ninja`) because it is what is actually executing
> > the build graph. The way this is communicated is via `-MF` files and
> > that's what I'm providing in this patch. Note that `ninja` does not
> > allow rules to specify such dependencies for other rules than the one it
> > is reading the file for.
> 
> But since the direct imports need to be rebuilt themselves if the 
> transitive imports change, the build graph should be the same whether or 
> not the transitive imports are repeated?  Either way, if a transitive 
> import changes you need to rebuild the direct import and then the importer.

I suppose I have seen enough bad build systems that don't do everything
correctly that I'm interested in creating "pits of success" rather than
"well, you didn't get thing X 100% correct, so you're screwed here too".

The case that I think is most likely here is that someone has a
"superbuild" with 3 projects A, B, and C where C uses B and B uses A. At
the top-level the superbuild exposes just "make projectA
projectB projectC"-granularity (rather than a combined build graph; they
may use different build systems) and then users go into some projectC
directly and forget to update projectB after updating projectA (known to
all use the same compiler/flags so that CMI sharing is possible). The
build it still broken, but ideally they get notified in some useful way
when rebuilding the TU rather than…whatever ends up catching the problem
incidentally.

> I guess it shouldn't hurt to have the transitive imports in the -MF 
> file, as long as they aren't also in the p1689 file, so I'm not 
> particularly opposed to this change, but I don't see how it makes a 
> practical difference.

Correct. The P1689 shouldn't even know about transitive imports (well,
maybe from header units?) as it just records "I saw an `import`
statement" and should never look up CMI files (indeed, we would need
another scanning step to know what CMI files to create for the P1689
scan if they were necessary…).

--Ben
  

Patch

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index ebd30f63d81..dbd1b721616 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -18966,6 +18966,8 @@  module_state::do_import (cpp_reader *reader, bool outermost)
       dump () && dump ("CMI is %s", file);
       if (note_module_cmi_yes || inform_cmi_p)
 	inform (loc, "reading CMI %qs", file);
+      /* Add the CMI file to the dependency tracking. */
+      deps_add_dep (cpp_get_deps (reader), file);
       fd = open (file, O_RDONLY | O_CLOEXEC | O_BINARY);
       e = errno;
     }