patch for hp-timing.h

Message ID PAWPR08MB89825143A8C77510EB33C9D783D79@PAWPR08MB8982.eurprd08.prod.outlook.com
State Superseded
Headers
Series patch for hp-timing.h |

Checks

Context Check Description
dj/TryBot-apply_patch fail Patch failed to apply to master at the time it was sent
dj/TryBot-32bit fail Patch series failed to apply

Commit Message

Wilco Dijkstra Feb. 3, 2023, 1:29 p.m. UTC
  Hi Jun,

> hp-timing is only used on benchtest today, but it can be used on other
> measurements in the future.

It can't as proposed below since this would overflow in ~16 seconds.
We could improve this by removing the zero bits from the multiply
and doing some shifts, maybe that will make the interval long enough
for integer multiply. Otherwise I'd suggest the floating point version since
it's not like the benchtests don't already use floating point.

Cheers,
Wilco
  

Comments

Adhemerval Zanella Feb. 3, 2023, 1:38 p.m. UTC | #1
On 03/02/23 10:29, Wilco Dijkstra via Libc-alpha wrote:
> Hi Jun,
> 
>> hp-timing is only used on benchtest today, but it can be used on other
>> measurements in the future.
> 
> It can't as proposed below since this would overflow in ~16 seconds.
> We could improve this by removing the zero bits from the multiply
> and doing some shifts, maybe that will make the interval long enough
> for integer multiply. Otherwise I'd suggest the floating point version since
> it's not like the benchtests don't already use floating point.
> 
> Cheers,
> Wilco

Do we really need to handle more than ~16 seconds for the hp-timing? It
is usually used along with a accumulator and I would expect that once you
need to measure more than 16s using clock_gettime works as expected.

> 
> diff --git a/benchtests/Makefile b/benchtests/Makefile
> index 292976b26b..a624614207 100644
> --- a/benchtests/Makefile
> +++ b/benchtests/Makefile
> @@ -499,4 +499,5 @@ $(objpfx)bench-%.c: %-inputs $(bench-deps)
>  	  cat $($*-INCLUDE); \
>  	fi; \
>  	$(PYTHON) scripts/bench.py $(patsubst %-inputs,%,$<); } > $@-tmp
> +	cp -f $@-tmp $@-bak
> 
> Unintended change I guess?
> 
>  	mv -f $@-tmp $@
> diff --git a/sysdeps/aarch64/hp-timing.h b/sysdeps/aarch64/hp-timing.h
> index f7f7ac7cae..c699effe6a 100644
> --- a/sysdeps/aarch64/hp-timing.h
> +++ b/sysdeps/aarch64/hp-timing.h
> @@ -41,7 +41,7 @@ typedef uint64_t hp_timing_t;
>  #define HP_TIMING_DIFF(Diff, Start, End)			\
>  ({  hp_timing_t freq;						\
>      __asm__ __volatile__ ("mrs %0, cntfrq_el0" : "=r" (freq));	\
> -   (Diff) = ((End) - (Start)) * (UINT64_C(1000000000) / freq);	\
> +   (Diff) = (((End) - (Start)) * UINT64_C(1000000000)) / freq; \
>  })
>  
>  #endif	/* hp-timing.h */
  
develop--- via Libc-alpha Feb. 3, 2023, 2:02 p.m. UTC | #2
Wilco,

Removing zero bits from multiply and making shifts afterwards can extend the time before overflow.  Does it also reduce the resolution of the timer?

Regards,

Jun

-----Original Message-----
From: Wilco Dijkstra <Wilco.Dijkstra@arm.com> 
Sent: Friday, February 3, 2023 7:30 AM
To: Tang, Jun <juntangc@amazon.com>
Cc: 'GNU C Library' <libc-alpha@sourceware.org>
Subject: [EXTERNAL] patch for hp-timing.h

CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.



Hi Jun,

> hp-timing is only used on benchtest today, but it can be used on other 
> measurements in the future.

It can't as proposed below since this would overflow in ~16 seconds.
We could improve this by removing the zero bits from the multiply and doing some shifts, maybe that will make the interval long enough for integer multiply. Otherwise I'd suggest the floating point version since it's not like the benchtests don't already use floating point.

Cheers,
Wilco

diff --git a/benchtests/Makefile b/benchtests/Makefile index 292976b26b..a624614207 100644
--- a/benchtests/Makefile
+++ b/benchtests/Makefile
@@ -499,4 +499,5 @@ $(objpfx)bench-%.c: %-inputs $(bench-deps)
          cat $($*-INCLUDE); \
        fi; \
        $(PYTHON) scripts/bench.py $(patsubst %-inputs,%,$<); } > $@-tmp
+       cp -f $@-tmp $@-bak

Unintended change I guess?

        mv -f $@-tmp $@
diff --git a/sysdeps/aarch64/hp-timing.h b/sysdeps/aarch64/hp-timing.h index f7f7ac7cae..c699effe6a 100644
--- a/sysdeps/aarch64/hp-timing.h
+++ b/sysdeps/aarch64/hp-timing.h
@@ -41,7 +41,7 @@ typedef uint64_t hp_timing_t;
 #define HP_TIMING_DIFF(Diff, Start, End)                       \
 ({  hp_timing_t freq;                                          \
     __asm__ __volatile__ ("mrs %0, cntfrq_el0" : "=r" (freq)); \
-   (Diff) = ((End) - (Start)) * (UINT64_C(1000000000) / freq); \
+   (Diff) = (((End) - (Start)) * UINT64_C(1000000000)) / freq; \
 })

 #endif /* hp-timing.h */
  
Wilco Dijkstra Feb. 6, 2023, 4:53 p.m. UTC | #3
Hi Jun,

> Removing zero bits from multiply and making shifts afterwards can extend the time before
> overflow.  Does it also reduce the resolution of the timer?

It wouldn't if we shift out zero bits. For example 1000000 has 6 zero bits
at the bottom, so assuming every implementation uses a multiple of 1MHz
then there is no loss of accuracy. And 6 extra bits would mean 1024 seconds
before overflow at 1GHz which seems long enough.

Cheers,
Wilco
  
develop--- via Libc-alpha Feb. 6, 2023, 7:34 p.m. UTC | #4
Wilco,

Thanks for the suggestion, 

There are 7 zero bits at the end of freq (1050000000).  So the time can be extended to 2048 second before losing resolution.  Below is the updated patch for 1024 sec -

-   (Diff) = ((End) - (Start)) * (UINT64_C(1000000000) / freq); \
+   (Diff) = (((End) - (Start)) * UINT64_C(1000000000>>6)) / (freq>>6); \
 
Regards,

Jun
 
-----Original Message-----
From: Wilco Dijkstra <Wilco.Dijkstra@arm.com> 
Sent: Monday, February 6, 2023 10:54 AM
To: Tang, Jun <juntangc@amazon.com>
Cc: 'GNU C Library' <libc-alpha@sourceware.org>
Subject: RE: [EXTERNAL]patch for hp-timing.h

CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.



Hi Jun,

> Removing zero bits from multiply and making shifts afterwards can 
> extend the time before overflow.  Does it also reduce the resolution of the timer?

It wouldn't if we shift out zero bits. For example 1000000 has 6 zero bits at the bottom, so assuming every implementation uses a multiple of 1MHz then there is no loss of accuracy. And 6 extra bits would mean 1024 seconds before overflow at 1GHz which seems long enough.

Cheers,
Wilco
  
Wilco Dijkstra Feb. 8, 2023, 2:58 p.m. UTC | #5
Hi Jun,

> There are 7 zero bits at the end of freq (1050000000).  So the time can be extended to
> 2048 second before losing resolution.  Below is the updated patch for 1024 sec -
>
> -   (Diff) = ((End) - (Start)) * (UINT64_C(1000000000) / freq); \
> +   (Diff) = (((End) - (Start)) * UINT64_C(1000000000>>6)) / (freq>>6); \
 
Minor codestyle: there should be a space before/after '>>'.

Apart from that it looks good and works fine.

Can you resend an updated patch with comment header etc in a new email with
subject like "[PATCH v2] AArch64: Fix HP_TIMING_DIFF computation" or similar?

I guess this is your first patch and you don't have commit rights?

Cheers,
Wilco
  

Patch

diff --git a/benchtests/Makefile b/benchtests/Makefile
index 292976b26b..a624614207 100644
--- a/benchtests/Makefile
+++ b/benchtests/Makefile
@@ -499,4 +499,5 @@  $(objpfx)bench-%.c: %-inputs $(bench-deps)
 	  cat $($*-INCLUDE); \
 	fi; \
 	$(PYTHON) scripts/bench.py $(patsubst %-inputs,%,$<); } > $@-tmp
+	cp -f $@-tmp $@-bak

Unintended change I guess?

 	mv -f $@-tmp $@
diff --git a/sysdeps/aarch64/hp-timing.h b/sysdeps/aarch64/hp-timing.h
index f7f7ac7cae..c699effe6a 100644
--- a/sysdeps/aarch64/hp-timing.h
+++ b/sysdeps/aarch64/hp-timing.h
@@ -41,7 +41,7 @@  typedef uint64_t hp_timing_t;
 #define HP_TIMING_DIFF(Diff, Start, End)			\
 ({  hp_timing_t freq;						\
     __asm__ __volatile__ ("mrs %0, cntfrq_el0" : "=r" (freq));	\
-   (Diff) = ((End) - (Start)) * (UINT64_C(1000000000) / freq);	\
+   (Diff) = (((End) - (Start)) * UINT64_C(1000000000)) / freq; \
 })
 
 #endif	/* hp-timing.h */