gdb/testsuite/rocm: Dynamically determine if precise-memory is supported
Checks
| Context | Check | Description |
| linaro-tcwg-bot/tcwg_gdb_build--master-aarch64 | success | Build passed |
| linaro-tcwg-bot/tcwg_gdb_build--master-arm | success | Build passed |
| linaro-tcwg-bot/tcwg_gdb_check--master-aarch64 | success | Test passed |
| linaro-tcwg-bot/tcwg_gdb_check--master-arm | success | Test passed |
Commit Message
Currently, in order to know if precise-memory is supported on the
system, we compare the available GPUs against the list of devices known
to not support this feature. This patch changes it and instead probes
for the device capabilities as advertised by the driver, and checks if
all devices support precise memory.
This is to accommodate a pending change in the driver:
https://lists.freedesktop.org/archives/amd-gfx/2025-March/121355.html
Tested on gfx1030 and gfx1100.
Change-Id: I9290c677c01015dfb509ca45895d8a99165b4b27
---
gdb/testsuite/lib/rocm.exp | 42 +++++++++++++++++++++++++++++---------
1 file changed, 32 insertions(+), 10 deletions(-)
base-commit: a7e5d97c123b5164460a604a154a239fbcfadd86
Comments
Hi,
On 2025-03-21 14:46, Lancelot SIX wrote:
> Hi,
>
> This is a V2 replacing
> https://sourceware.org/pipermail/gdb-patches/2025-March/216420.html.
>
> This series changes how the testsuite looks for
> "precise-memory" support on AMDGPU devices. Before this series, GDB's
> testsuite had a list of architectures known to support, or not support,
> this functionality. However, a recent driver[1] changed the configuration
> for some devices, making that static assumption invalid.
>
> This series removes this static information and instead adapts
> the test based on what GDB reports.
>
> Changes Since V1:
> - Do not read the capability from sysfs.
>
Thanks for doing this. This version is OK, with the missing escape in the other patch fixed.
Approved-by: Pedro Alves <pedro@palves.net>
> Thanks for doing this. This version is OK, with the missing escape in the other patch fixed.
>
> Approved-by: Pedro Alves <pedro@palves.net>
>
Thanks,
I fixed the missing escape and pushed the series as
- 64f6e72d4eb
- da72ce7ff1b
- efcfb26ae61
Best,
Lancelot.
@@ -177,21 +177,43 @@ proc hip_devices_support_debug_multi_process {} {
return 1
}
-# Return true if all the devices on the host support precise memory.
+# Helper function which returns true if all ROCm devices in the system
+# have the CAP capability. Capabilities are defined in
+# /usr/include/linux/kfd_sysfs.h.
-proc hip_devices_support_precise_memory {} {
- set unsupported_targets \
- {gfx900 gfx906 gfx908 gfx1010 gfx1011 gfx1012 gfx1030 gfx1031 gfx1032}
+proc hip_devices_have_cap {cap} {
+ set prop_files \
+[glob -nocomplain /sys/devices/virtual/kfd/kfd/topology/nodes/*/properties]
- set targets [find_amdgpu_devices]
- if { [llength $targets] == 0 } {
+ if {[llength $prop_files] == 0} {
return 0
}
- foreach target $targets {
- if { [lsearch -exact $unsupported_targets $target] != -1 } {
- return 0
+ set gpu_with_support 0
+ foreach f $prop_files {
+ set fd [open $f]
+ while { [gets $fd line] >= 0 } {
+ set x [split $line " "]
+ set prop_name [lindex $x 0]
+ set val [lindex $x 1]
+ if {$prop_name eq "capability"} {
+if {$val & $cap} {
+ incr gpu_with_support
+ }
+ }
}
+ close $fd
}
- return 1
+
+return [expr {[llength $prop_files] == $gpu_with_support}]
+}
+
+# Return true if all the devices on the host support precise memory.
+
+proc hip_devices_support_precise_memory {} {
+ # From /usr/include/linux/kfd_sysfs.h
+ set HSA_CAP_TRAP_DEBUG_PRECISE_MEMORY_OPERATIONS_SUPPORTED 0x00040000
+
+ return [hip_devices_have_cap \
+ $HSA_CAP_TRAP_DEBUG_PRECISE_MEMORY_OPERATIONS_SUPPORTED]
}
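
For illustration only, the capability test performed by the hunk above can be tried without any GPU by feeding it the text of a properties file directly. This is a minimal standalone sketch, not part of the patch: the helper name `props_have_cap` and the sample property values are hypothetical, and it assumes each line has the form `<name> <value>`, as the loop in the patch does.

```tcl
# Hypothetical helper (not part of the patch): return 1 if the
# "capability" line in PROPS_TEXT has every bit of CAP set, else 0.
proc props_have_cap {props_text cap} {
    foreach line [split $props_text "\n"] {
        # Expect "<name> <value>"; skip empty or malformed lines.
        if {[scan $line "%s %s" name val] != 2} {
            continue
        }
        if {$name eq "capability"} {
            return [expr {($val & $cap) == $cap}]
        }
    }
    return 0
}

# Made-up file contents, for illustration only.
set with_cap "simd_count 80\ncapability 311296\n"
set without_cap "simd_count 80\ncapability 49152\n"

# Value used by the patch for
# HSA_CAP_TRAP_DEBUG_PRECISE_MEMORY_OPERATIONS_SUPPORTED.
set cap 0x00040000

puts [props_have_cap $with_cap $cap]
puts [props_have_cap $without_cap $cap]
```

Run with `tclsh`, this prints 1 for the first sample (311296 is 0x4C000, which includes bit 0x40000) and 0 for the second (49152 is 0xC000). Bracing the `expr` argument avoids a second round of substitution on the values read from the file.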