[v2,1/3,sim/riscv] Add basic semi-hosting support

Message ID 20231030130042.1472535-2-jaydeep.patil@imgtec.com
State New
Headers
Series sim: riscv: Compressed instruction simulation and semi-hosting support |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gdb_build--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gdb_build--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_gdb_check--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gdb_check--master-aarch64 success Testing passed

Commit Message

Jaydeep Patil Oct. 30, 2023, 1 p.m. UTC
  From: Jaydeep Patil <jaydeep.patil@imgtec.com>

Added support for basic semi-hosting calls OPEN, EXIT and GET_CMDLINE.
Added gdb.arch/riscv-exit-getcmd.c to test it.
---
 gdb/testsuite/gdb.arch/riscv-exit-getcmd.c   |  26 +++
 gdb/testsuite/gdb.arch/riscv-exit-getcmd.exp |  27 +++
 sim/riscv/riscv-sim.h                        |  32 +++
 sim/riscv/sim-main.c                         | 230 ++++++++++++++++++-
 4 files changed, 310 insertions(+), 5 deletions(-)
 create mode 100644 gdb/testsuite/gdb.arch/riscv-exit-getcmd.c
 create mode 100644 gdb/testsuite/gdb.arch/riscv-exit-getcmd.exp
  

Comments

Mike Frysinger Nov. 29, 2023, 7:57 a.m. UTC | #1
On 30 Oct 2023 13:00, jaydeep.patil@imgtec.com wrote:
> Added support for basic semi-hosting calls OPEN, EXIT and GET_CMDLINE.

what host environment are you implementing ?  none of the syscalls you've
defined match what have long been in the riscv libgloss port.  those are
the only ones i'd really expect at this point to be wired up.

> Added gdb.arch/riscv-exit-getcmd.c to test it.
> ---
>  gdb/testsuite/gdb.arch/riscv-exit-getcmd.c   |  26 +++
>  gdb/testsuite/gdb.arch/riscv-exit-getcmd.exp |  27 +++
>  sim/riscv/riscv-sim.h                        |  32 +++
>  sim/riscv/sim-main.c                         | 230 ++++++++++++++++++-

you're missing sim/ tests.  gdb tests are great, but not sufficient.

> @@ -990,6 +1209,7 @@ execute_one (SIM_CPU *cpu, unsigned_word iw, const struct riscv_opcode *op)
>      case INSN_CLASS_A:
>        return execute_a (cpu, iw, op);
>      case INSN_CLASS_I:
> +    case INSN_CLASS_ZICSR:
>        return execute_i (cpu, iw, op);

seems unrelated to this patch
-mike
  
Andrew Burgess Dec. 12, 2023, 5:24 p.m. UTC | #2
Mike Frysinger <vapier@gentoo.org> writes:

> On 30 Oct 2023 13:00, jaydeep.patil@imgtec.com wrote:
>> Added support for basic semi-hosting calls OPEN, EXIT and GET_CMDLINE.
>
> what host environment are you implementing ?  none of the syscalls you've
> defined match what have long been in the riscv libgloss port.  those are
> the only ones i'd really expect at this point to be wired up.

This would be the RISC-V semihosting target, which is included in
newlib (since 2020), but is separate to libgloss.

Where libgloss syscalls use ecall, the semihosting uses ebreak with two
specific nop instructions (one before, one after the ebreak).

Do you see any reason why we can't support both of these syscall
libraries?  Potentially we could have a switch to select between them,
but I'm inclined to say we should just support both until someone comes
with a use-case where supporting both is a bad idea...  But maybe you
have some deeper insights here.

>
>> Added gdb.arch/riscv-exit-getcmd.c to test it.
>> ---
>>  gdb/testsuite/gdb.arch/riscv-exit-getcmd.c   |  26 +++
>>  gdb/testsuite/gdb.arch/riscv-exit-getcmd.exp |  27 +++
>>  sim/riscv/riscv-sim.h                        |  32 +++
>>  sim/riscv/sim-main.c                         | 230 ++++++++++++++++++-
>
> you're missing sim/ tests.  gdb tests are great, but not sufficient.

100% agree with this.

Thanks,
Andrew

>
>> @@ -990,6 +1209,7 @@ execute_one (SIM_CPU *cpu, unsigned_word iw, const struct riscv_opcode *op)
>>      case INSN_CLASS_A:
>>        return execute_a (cpu, iw, op);
>>      case INSN_CLASS_I:
>> +    case INSN_CLASS_ZICSR:
>>        return execute_i (cpu, iw, op);
>
> seems unrelated to this patch
> -mike
  
Andrew Burgess Dec. 12, 2023, 5:57 p.m. UTC | #3
<jaydeep.patil@imgtec.com> writes:

> From: Jaydeep Patil <jaydeep.patil@imgtec.com>
>
> Added support for basic semi-hosting calls OPEN, EXIT and GET_CMDLINE.
> Added gdb.arch/riscv-exit-getcmd.c to test it.
> ---
>  gdb/testsuite/gdb.arch/riscv-exit-getcmd.c   |  26 +++
>  gdb/testsuite/gdb.arch/riscv-exit-getcmd.exp |  27 +++
>  sim/riscv/riscv-sim.h                        |  32 +++
>  sim/riscv/sim-main.c                         | 230 ++++++++++++++++++-
>  4 files changed, 310 insertions(+), 5 deletions(-)
>  create mode 100644 gdb/testsuite/gdb.arch/riscv-exit-getcmd.c
>  create mode 100644 gdb/testsuite/gdb.arch/riscv-exit-getcmd.exp
>
> diff --git a/gdb/testsuite/gdb.arch/riscv-exit-getcmd.c b/gdb/testsuite/gdb.arch/riscv-exit-getcmd.c
> new file mode 100644
> index 00000000000..02a97da8643
> --- /dev/null
> +++ b/gdb/testsuite/gdb.arch/riscv-exit-getcmd.c
> @@ -0,0 +1,26 @@
> +/* This file is part of GDB, the GNU debugger.
> +
> +   Copyright 2023 Free Software Foundation, Inc.
> +
> +   This program is free software; you can redistribute it and/or modify
> +   it under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3 of the License, or
> +   (at your option) any later version.
> +
> +   This program is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +   GNU General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
> +
> +/* Test basic semi-hosting calls SYS_GET_CMDLINE and SYS_EXIT.  */
> +
> +int
> +main (int argc, char **argv)
> +{
> +  if (argc != 4)
> +    return 1;
> +  return 0;
> +}
> diff --git a/gdb/testsuite/gdb.arch/riscv-exit-getcmd.exp b/gdb/testsuite/gdb.arch/riscv-exit-getcmd.exp
> new file mode 100644
> index 00000000000..7f5670200c2
> --- /dev/null
> +++ b/gdb/testsuite/gdb.arch/riscv-exit-getcmd.exp
> @@ -0,0 +1,27 @@
> +# Copyright 2023 Free Software Foundation, Inc.
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +# Test basic semi-hosting calls SYS_GET_CMDLINE and SYS_EXIT.
> +
> +require {istarget "riscv*-*-*"}
> +
> +standard_testfile
> +
> +if { [prepare_for_testing "failed to prepare" ${testfile} ${srcfile} \
> +	  {debug quiet}] } {
> +    return -1
> +}
> +
> +gdb_test "run 1 2 3" ".*Inferior.*process.*exited normally.*"
> diff --git a/sim/riscv/riscv-sim.h b/sim/riscv/riscv-sim.h
> index 1bc9aa12156..bb296fa6e62 100644
> --- a/sim/riscv/riscv-sim.h
> +++ b/sim/riscv/riscv-sim.h
> @@ -75,4 +75,36 @@ extern void initialize_env (SIM_DESC, const char * const *argv,
>  
>  #define RISCV_XLEN(cpu) MACH_WORD_BITSIZE (CPU_MACH (cpu))
>  
> +#define APPLICATION_EXIT 0x20026
> +
> +#define SYS_OPEN 0x01
> +#define SYS_GET_CMDLINE 0x15
> +#define SYS_EXIT 0x18

I wonder if we should avoid the SYS_ prefix here?  And instead go with
something that binds these more tightly to semi-hosting?  SEMIHOST_
maybe? e.g. SEMIHOST_OPEN

> +
> +#define GDB_O_RDONLY 0x000
> +#define GDB_O_WRONLY 0x001
> +#define GDB_O_RDWR 0x002
> +#define GDB_O_APPEND 0x008
> +#define GDB_O_CREAT 0x200
> +#define GDB_O_TRUNC 0x400
> +#define GDB_O_BINARY 0

The GDB_ prefix seems weird.  I haven't tried to figure this out.  Is
this in some way GDB specific?

> +
> +#define SEMI_HOSTING_SLLI 0x01f01013
> +#define SEMI_HOSTING_SRAI 0x40705013
> +
> +static int gdb_open_modeflags[12] = {
> +  GDB_O_RDONLY,
> +  GDB_O_RDONLY | GDB_O_BINARY,
> +  GDB_O_RDWR,
> +  GDB_O_RDWR | GDB_O_BINARY,
> +  GDB_O_WRONLY | GDB_O_CREAT | GDB_O_TRUNC,
> +  GDB_O_WRONLY | GDB_O_CREAT | GDB_O_TRUNC | GDB_O_BINARY,
> +  GDB_O_RDWR | GDB_O_CREAT | GDB_O_TRUNC,
> +  GDB_O_RDWR | GDB_O_CREAT | GDB_O_TRUNC | GDB_O_BINARY,
> +  GDB_O_WRONLY | GDB_O_CREAT | GDB_O_APPEND,
> +  GDB_O_WRONLY | GDB_O_CREAT | GDB_O_APPEND | GDB_O_BINARY,
> +  GDB_O_RDWR | GDB_O_CREAT | GDB_O_APPEND,
> +  GDB_O_RDWR | GDB_O_CREAT | GDB_O_APPEND | GDB_O_BINARY
> +};

So, my instinct would be that all of these should be moved into
sim-main.c.  That's the only place they are used, and is likely the only
place they are going to be used ... so I see no reason why they need to
live in the header file.

Additionally, I think we need to comment some/all of these defines; e.g.
APPLICATION_EXIT needs a comment, SEMI_HOSTING_{SLLI,SRAI} need
comments, and gdb_open_modeflags would need comments.  You should
describe where these values come from, what they mean, where should I go
to read/understand more (e.g. name the spec or reference implementation).

> +
>  #endif
> diff --git a/sim/riscv/sim-main.c b/sim/riscv/sim-main.c
> index afdfcf50656..8e08eb9fc4e 100644
> --- a/sim/riscv/sim-main.c
> +++ b/sim/riscv/sim-main.c
> @@ -136,6 +136,176 @@ store_csr (SIM_CPU *cpu, const char *name, int csr, unsigned_word *reg,
>    TRACE_REGISTER (cpu, "wrote CSR %s = %#" PRIxTW, name, val);
>  }
>  
> +/* Read 4 or 8 bytes of data from the core memory.  ADDR and (INDEX * XLEN)
> +   form the base address.  */
> +static uintptr_t

I would have thought unsigned_8 would be a better return type.  The
requirement for this function is that it return something unsigned, and
at least 8-bytes in length.

Also the comment should say that 4-byte values are zero extended.

> +get_core_data (SIM_CPU *cpu, unsigned_word addr, unsigned_word index)
> +{
> +  uintptr_t param;
> +  int xlen = RISCV_XLEN (cpu);
> +  struct riscv_sim_cpu *riscv_cpu = RISCV_SIM_CPU (cpu);
> +
> +  if (xlen == 64)
> +    param = sim_core_read_unaligned_8 (cpu, riscv_cpu->pc, read_map,
> +		addr + (index * 8));
> +  else
> +    param = sim_core_read_unaligned_4 (cpu, riscv_cpu->pc, read_map,
> +		addr + (index * 4));
> +
> +  return param;
> +}
> +
> +/* Write string in HOST_BUF at CORE_ADDR.  The length of string is LEN.  */
> +static void
> +set_core_string (SIM_CPU *cpu, unsigned_word core_addr, char *host_buf,
> +		 int len)
> +{
> +  struct riscv_sim_cpu *riscv_cpu = RISCV_SIM_CPU (cpu);
> +  for (int i = 0; i < len; i++)
> +    {
> +      sim_core_write_unaligned_1 (cpu, riscv_cpu->pc, write_map,
> +				  core_addr + i, host_buf[i]);
> +    }

GNU style it to drop the { ... } around single statement blocks.

> +}
> +
> +/* Read string of length LEN at address ADDR.  */
> +static char *
> +get_core_string_with_len (SIM_CPU *cpu, unsigned_word addr,
> +			  unsigned_word len)
> +{
> +  struct riscv_sim_cpu *riscv_cpu = RISCV_SIM_CPU (cpu);
> +  char * str;
> +  str = (char *) malloc (len + 1);

Use xmalloc.  It will handle (abort) for failure to allocate memory.

> +
> +  for (int i = 0; i < len; i++)
> +    {
> +      str[i] = sim_core_read_unaligned_1 (cpu, riscv_cpu->pc, read_map,
> +					  addr + i);
> +    }

Same with the single statement block.

> +  str[len] = 0;

Better I think to write:

  str[len] = '\0';

> +
> +  return str;
> +}
> +
> +/* SYS_OPEN
> +   Register a1 points to a buffer containing:
> +   Index 0: Pointer: Address of the file name string.
> +   Index 1: Integer: File open flags.
> +   Index 2: Integer: Length of the file name string.
> +   File handle is returned through register a0.  */
> +static void
> +semihosting_open (SIM_CPU *cpu)
> +{
> +  uintptr_t fname_addr;
> +  uintptr_t flags;
> +  uintptr_t fname_len;
> +  char *name;
> +  struct riscv_sim_cpu *riscv_cpu = RISCV_SIM_CPU (cpu);
> +
> +  fname_addr = (uintptr_t) get_core_data (cpu, riscv_cpu->a1, 0);
> +  flags = (uintptr_t) get_core_data (cpu, riscv_cpu->a1, 1);
> +  fname_len = (uintptr_t) get_core_data (cpu, riscv_cpu->a1, 2);

These casts all seem redundant given the return type of get_core_data.

> +
> +  if (fname_len <= 0)
> +    {
> +      riscv_cpu->a0 = -1;
> +      return;
> +    }

I wonder if we should place some (arbitrary) cap on fname_len.  A badly
behaved application could cause the simulator to (try and) allocate 2^64
bytes of memory.

> +
> +  name = get_core_string_with_len (cpu, fname_addr, fname_len);
> +  riscv_cpu->a0 = sim_io_open (CPU_STATE (cpu), name,
> +			       gdb_open_modeflags[flags]);
> +  free (name);
> +}
> +
> +/* SYS_EXIT.  In case of RV32, register a1 holds the application stop reason
> +   and the exit code is decided based on it.  Otherwise, register a1 points to
> +   a buffer containing:
> +   Index 0: Integer: Application stop reason (ignored)
> +   Index 1: Integer: Exit code.  */
> +static void
> +semihosting_exit (SIM_CPU *cpu)
> +{
> +  uintptr_t app_code, exit_code;
> +  SIM_DESC sd = CPU_STATE (cpu);
> +  struct riscv_sim_cpu *riscv_cpu = RISCV_SIM_CPU (cpu);
> +  if (RISCV_XLEN (cpu) == 32)
> +    {
> +      app_code = riscv_cpu->a1;
> +      if (app_code == APPLICATION_EXIT)
> +	exit_code = 0;
> +      else
> +	exit_code = 1;
> +    }
> +  else
> +    exit_code = (uintptr_t) get_core_data (cpu, riscv_cpu->a1, 1);
> +  riscv_cpu->a0 = exit_code;
> +  sim_engine_halt (sd, cpu, NULL, riscv_cpu->pc, sim_exited, exit_code);

I wonder if we should try to marshal these calls through sim_syscall
where possible?  I know we're going to need some syscall specific stubs
for each syscall as fetching the arguments is not as simple as just
reading the registers as for the ecall syscall case.  But it feels like,
once we've got the arguments we should try to get back onto the "normal"
syscall handling path if possible.  Thoughts?

> +}
> +
> +/* SYS_GET_CMDLINE.  Write command line arguments to a buffer.  Arguments
> +   passed to "run" command are stored in STATE_PROG_ARGV member of SIM_DESC.
> +   Register a1 points to a buffer containing:
> +   Index 0: Pointer: Buffer to store command line arguments
> +   Index 1: Integer: Maximum length of the buffer.  */
> +static void
> +semihosting_get_cmdline (SIM_CPU *cpu)
> +{

So this one, I think, doesn't have a "normal" syscall equivalent (or
maybe I missed it?), so this is going to have to be handled separately,
but that's OK I think.

> +  int i = 0, len = 0, total_len = 0;
> +  uintptr_t buf_addr, max_buf_len;
> +  SIM_DESC sd = CPU_STATE (cpu);
> +  char *space = " ";
> +  struct riscv_sim_cpu *riscv_cpu = RISCV_SIM_CPU (cpu);
> +
> +  char **prog_argv = STATE_PROG_ARGV (sd);
> +  if (prog_argv == NULL)
> +    {
> +      riscv_cpu->a0 = 1; /* Return non-zero to indicate error.  */

Comments before the line of code please, not at the end of the line.

> +      return;
> +    }
> +
> +  buf_addr = (uintptr_t) get_core_data (cpu, riscv_cpu->a1, 0);
> +  max_buf_len = (uintptr_t) get_core_data (cpu, riscv_cpu->a1, 1);

Unnecessary casts again.

> +
> +  while (prog_argv[i])
> +    {
> +      len = strlen (prog_argv[i]);
> +      if ((total_len + len) > max_buf_len)
> +	break;
> +      set_core_string (cpu, buf_addr, prog_argv[i], len);
> +      set_core_string (cpu, buf_addr + len, space, 1);
> +      len++;	/* Terminate it with space.  */

Comment before the line of code.  And I don't see how that line is
adding a space, so the comment doesn't help me understand ... it's just
a clue that I need to go and investigate this code.

> +      buf_addr += len;
> +      total_len += len;
> +      i++;
> +    }
> +  riscv_cpu->a0 = 0;	/* No error.  */
> +}
> +
> +/* Handle semi-hosting calls.  Register a0 contains semi-hosting call number
> +   and a1 contains pointer to a buffer containing additional arguments.  */
> +static int
> +do_semihosting (SIM_CPU *cpu)
> +{
> +  struct riscv_sim_cpu *riscv_cpu = RISCV_SIM_CPU (cpu);
> +  switch (riscv_cpu->a0)
> +    {
> +    case SYS_OPEN:
> +      semihosting_open (cpu);
> +      break;
> +    case SYS_GET_CMDLINE:
> +      semihosting_get_cmdline (cpu);
> +      break;
> +    case SYS_EXIT:
> +      semihosting_exit (cpu);
> +      break;
> +    default:
> +      return -1;	/* Semi-hosting call not supported.  */
> +    }
> +
> +  return 0;
> +}
> +
>  static inline unsigned_word
>  ashiftrt (unsigned_word val, unsigned_word shift)
>  {
> @@ -623,11 +793,60 @@ execute_i (SIM_CPU *cpu, unsigned_word iw, const struct riscv_opcode *op)
>        TRACE_INSN (cpu, "fence.i;");
>        break;
>      case MATCH_EBREAK:
> -      TRACE_INSN (cpu, "ebreak;");
> -      /* GDB expects us to step over EBREAK.  */
> -      sim_engine_halt (sd, cpu, NULL, riscv_cpu->pc + 4, sim_stopped,
> -		       SIM_SIGTRAP);
> -      break;
> +	{

I think this whole semi-hosting logic should be moved out of this switch
statement into a separate function.  While reviewing I played around
with this, and ended up with this code in the switch statement:

    case MATCH_EBREAK:
      if (check_and_handle_riscv_semihosting (cpu, &pc))
	break;

      /* Not a RISC-V semihosting syscall.  */
      TRACE_INSN (cpu, "ebreak;");
      sim_engine_halt (sd, cpu, NULL, riscv_cpu->pc, sim_stopped,
		       SIM_SIGTRAP);
      break;

And then check_and_handle_riscv_semihosting was declared as:

  /* The instruction at *PC_PTR is assumed to be an ebreak instruction.
     Check if this ebreak is surrounded by the required slli/srai
     instructions that indicate this is a semihosting call.
  
     If this is a semihosting call then extract the argument data and
     perform the syscall, return 1 (true) to indicate a call was handled.  In
     this case, *PC_PTR is updated to point at the next instruction to
     execute, this will be the instruction after the srai.
  
     If no semihosting call was handled then return 0 (false).  */
  
  static int
  check_and_handle_riscv_semihosting (SIM_CPU *cpu, sim_cia *pc_ptr)
  { ... }

The implementation was what you wrote, just moved to the function.  I
think this makes it much easier, when scanning the original switch
statement, to understand what happens for an EBREAK.


> +	  /* If ebreak is at PC 0 then do not check for semi-hosting.  */
> +	  if (riscv_cpu->pc != 0)
> +	    {
> +	      /* RISC-V semi-hosting call is flagged using these three
> +		 instructions

I think this comment should explain that the random 32-bit hex numbers
in the following are the instruction encodings.  Also, I'd be very
tempted to move the SEMI_HOSTING_{SLLI,SRAI} definitions to somewhere
just after this comment.  They are only needed in this function, this
comment explains exactly what's going on ... feels like the perfect
location.

> +		 slli zero, zero, 0x1f	0x01f01013
> +		 ebreak			0x00100073
> +		 srai zero, zero, 0x7	0x40705013
> +		 Register a0 holds the system call number and a1 holds the
> +		 pointer to parameter buffer.  Do not read 4 bytes in one go
> +		 as we might read malformed 4 byte instruction.  */

This whole "Do not read 4 bytes in one go ..." business, is this just
an optimisation.  I don't see the problem with reading 4 bytes in one
go, and then checking the instruction length, and discarding if it is
wrong.  It feels like the double read is more complicated than the naive
approach....

> +	      int iw_len;
> +	      sim_cia pre_pc = riscv_cpu->pc - 4;
> +	      unsigned_word pre_iw;
> +	      pre_iw = sim_core_read_aligned_2 (cpu, pre_pc, exec_map, pre_pc);
> +	      iw_len = riscv_insn_length (pre_iw);
> +	      if (iw_len == 4)
> +		pre_iw |= ((unsigned_word) sim_core_read_aligned_2
> +			   (cpu, pre_pc, exec_map, pre_pc + 2) << 16);
> +
> +	      if (pre_iw == SEMI_HOSTING_SLLI)
> +		{
> +		  sim_cia post_pc = riscv_cpu->pc + 4;
> +		  unsigned_word post_iw;
> +		  post_iw = sim_core_read_aligned_2 (cpu, post_pc, exec_map,
> +						     post_pc);
> +		  iw_len = riscv_insn_length (post_iw);
> +		  if (iw_len == 4)
> +		    post_iw |= ((unsigned_word) sim_core_read_aligned_2
> +				(cpu, post_pc, exec_map, post_pc + 2) << 16);

Same double read that I don't understand the need for.

> +
> +		  if (post_iw == SEMI_HOSTING_SRAI)
> +		    {
> +		      TRACE_INSN (cpu, "semi-hosting a0=%lx,a1=%lx;",
> +				  riscv_cpu->a0, riscv_cpu->a1);
> +		      if (do_semihosting (cpu))
> +			{
> +			  /* Invalid semi-hosting call.  */
> +			  TRACE_INSN (cpu, "ebreak;");
> +			  sim_engine_halt (sd, cpu, NULL, riscv_cpu->pc,
> +					   sim_stopped, SIM_SIGTRAP);
> +			}
> +		      else
> +			pc = pc + 4;	/* post srai.  */
> +		      break;
> +		    }
> +		}
> +	    }
> +	  TRACE_INSN (cpu, "ebreak;");
> +	  sim_engine_halt (sd, cpu, NULL, riscv_cpu->pc, sim_stopped,
> +			   SIM_SIGTRAP);
> +	  break;
> +	}
>      case MATCH_ECALL:
>        TRACE_INSN (cpu, "ecall;");
>        riscv_cpu->a0 = sim_syscall (cpu, riscv_cpu->a7, riscv_cpu->a0,
> @@ -990,6 +1209,7 @@ execute_one (SIM_CPU *cpu, unsigned_word iw, const struct riscv_opcode *op)
>      case INSN_CLASS_A:
>        return execute_a (cpu, iw, op);
>      case INSN_CLASS_I:
> +    case INSN_CLASS_ZICSR:

Revert this change?

Thanks,
Andrew

>        return execute_i (cpu, iw, op);
>      case INSN_CLASS_M:
>      case INSN_CLASS_ZMMUL:
> -- 
> 2.25.1
  
Mike Frysinger Dec. 13, 2023, 3:43 a.m. UTC | #4
On 12 Dec 2023 17:24, Andrew Burgess wrote:
> Mike Frysinger <vapier@gentoo.org> writes:
> > On 30 Oct 2023 13:00, jaydeep.patil@imgtec.com wrote:
> >> Added support for basic semi-hosting calls OPEN, EXIT and GET_CMDLINE.
> >
> > what host environment are you implementing ?  none of the syscalls you've
> > defined match what have long been in the riscv libgloss port.  those are
> > the only ones i'd really expect at this point to be wired up.
> 
> This would be the RISC-V semihosting target, which is included in
> newlib (since 2020), but is separate to libgloss.

included where exactly ?  newlib/libc/machine/riscv/ has no syscalls afaict.
the word "semi" doesn't really appear anywhere in the codebase.

if you're referring to this commit, it's in libgloss, not newlib.
https://sourceware.org/git/?p=newlib-cygwin.git;a=commit;h=865cd30dcc2f00c81c8b3624a9f3464138cd24a5

looking at libgloss/riscv/machine/syscall.h, i see it defines the two sets
in parallel -- SYS_xxx and SEMIHOST_xxx.

the question is why does libgloss have both ?  if the riscv libgloss SYS_xxx
are only used by libgloss, and no one is implementing that (the sim hasn't),
and SEMIHOST_xxx are used by everyone, why not only use the semihost interface
and drop the SYS_xxx (and rename SEMIHOST_xxx to SYS_xxx).

where exactly is the riscv semihost standard defined such that people are
implementing the same API (source files) & ABI (the stub that processes the
ebreak calls) ?

> Where libgloss syscalls use ecall, the semihosting uses ebreak with two
> specific nop instructions (one before, one after the ebreak).

the calling convention doesn't really matter to the sim.  it can do either.

the question is whether we want to support them simultaneously, or only one
at a time.  what are other stubs doing ?

> Do you see any reason why we can't support both of these syscall
> libraries?  Potentially we could have a switch to select between them,
> but I'm inclined to say we should just support both until someone comes
> with a use-case where supporting both is a bad idea...  But maybe you
> have some deeper insights here.

supporting them both isn't a problem.  the port, as written, isn't following
our existing conventions for integrating with syscalls, but before i went down
that hole, i wanted to understand at a higher level the diff between the two.
the patch def needs a lot of work either way.
-mike
  
Andrew Burgess Dec. 18, 2023, 12:44 p.m. UTC | #5
Mike Frysinger <vapier@gentoo.org> writes:

> On 12 Dec 2023 17:24, Andrew Burgess wrote:
>> Mike Frysinger <vapier@gentoo.org> writes:
>> > On 30 Oct 2023 13:00, jaydeep.patil@imgtec.com wrote:
>> >> Added support for basic semi-hosting calls OPEN, EXIT and GET_CMDLINE.
>> >
>> > what host environment are you implementing ?  none of the syscalls you've
>> > defined match what have long been in the riscv libgloss port.  those are
>> > the only ones i'd really expect at this point to be wired up.
>> 
>> This would be the RISC-V semihosting target, which is included in
>> newlib (since 2020), but is separate to libgloss.
>
> included where exactly ?  newlib/libc/machine/riscv/ has no syscalls afaict.
> the word "semi" doesn't really appear anywhere in the codebase.

I was referring to newlib the project rather than newlib the libc
library.  Which you figured out once you did a grep ...

> if you're referring to this commit, it's in libgloss, not newlib.
> https://sourceware.org/git/?p=newlib-cygwin.git;a=commit;h=865cd30dcc2f00c81c8b3624a9f3464138cd24a5

Yeah, it's a bit "yuck".  The commit lives in the libgloss directory,
but actually adds a parallel set of build rules that create a
libsemihost.a as a separate thing from libgloss.a, hence my "separate to
libgloss" comment.  Which I'll argue is both correct and incorrect at
the same time (correct: it's a separate library, incorrect: it's within
the libgloss part of the newlib project tree).

>
> looking at libgloss/riscv/machine/syscall.h, i see it defines the two sets
> in parallel -- SYS_xxx and SEMIHOST_xxx.
>
> the question is why does libgloss have both ?  if the riscv libgloss SYS_xxx
> are only used by libgloss, and no one is implementing that (the sim hasn't),
> and SEMIHOST_xxx are used by everyone, why not only use the semihost interface
> and drop the SYS_xxx (and rename SEMIHOST_xxx to SYS_xxx).

I don't know why they are both defined at the same time.  I guess that
could be changed.

I also don't know who implements either the semihosting syscalls, or the
libgloss sys_* syscalls.

>
> where exactly is the riscv semihost standard defined such that people are
> implementing the same API (source files) & ABI (the stub that processes the
> ebreak calls) ?

This is where things get even more iffy.  As best I can tell the RISC-V
semihosting spec is here:

  https://github.com/riscv-software-src/riscv-semihosting/blob/main/riscv-semihosting-spec.adoc

But this looks far from complete to me.  The commit that added
semihosting to newlib (the project) can be found here:

  https://inbox.sourceware.org/newlib/694d497b-bc07-a3ba-2643-a7336927e9a7@embecosm.com/

Comments in that thread seem to indicate that the situation is more of a
ad hoc standard, with (maybe) openocd being a first reference
implementation?  However, it is supported by QEMU with commit
a10b9d93ecea0a8 (though I think there are additional follow up commits
relating to this topic) so there are definitely other simulators
supporting this out there.

>
>> Where libgloss syscalls use ecall, the semihosting uses ebreak with two
>> specific nop instructions (one before, one after the ebreak).
>
> the calling convention doesn't really matter to the sim.  it can do either.
>
> the question is whether we want to support them simultaneously, or only one
> at a time.  what are other stubs doing ?

I haven't looked.  I'll leave that as an exercise for Jaydeep.  I'd be
open to supporting both simultaneously as a first cut ... but I guess if
you feel strongly then I'm not against making it a requirement for this
to be switchable... it feels like that should be simple enough.

>
>> Do you see any reason why we can't support both of these syscall
>> libraries?  Potentially we could have a switch to select between them,
>> but I'm inclined to say we should just support both until someone comes
>> with a use-case where supporting both is a bad idea...  But maybe you
>> have some deeper insights here.
>
> supporting them both isn't a problem.  the port, as written, isn't following
> our existing conventions for integrating with syscalls, but before i went down
> that hole, i wanted to understand at a higher level the diff between the two.
> the patch def needs a lot of work either way.

For my own education; the ECALL handling does make use of sim_syscall.
When you say: "the port, as written, isn't following our existing
conventions for integrating with syscalls", do you mean the new
semihosting port?  Or are you saying the (pre-patch) existing code is
also not using the standard sim syscall mechanism?

The problem I see (at a quick glance) with the semihosting API is that
the syscall arguments are read from memory, while the existing syscall
mechanism seems to assume syscall arguments are in registers.  My
initial thoughts were that getting the semihosting support to use the
existing mechanism might not be straight forward.

Thanks,
Andrew
  
Mike Frysinger Dec. 18, 2023, 11:06 p.m. UTC | #6
On 18 Dec 2023 12:44, Andrew Burgess wrote:
> Mike Frysinger <vapier@gentoo.org> writes:
> > On 12 Dec 2023 17:24, Andrew Burgess wrote:
> >> Mike Frysinger <vapier@gentoo.org> writes:
> >> > On 30 Oct 2023 13:00, jaydeep.patil@imgtec.com wrote:
> >> >> Added support for basic semi-hosting calls OPEN, EXIT and GET_CMDLINE.
> >> >
> >> > what host environment are you implementing ?  none of the syscalls you've
> >> > defined match what have long been in the riscv libgloss port.  those are
> >> > the only ones i'd really expect at this point to be wired up.
> >> 
> >> This would be the RISC-V semihosting target, which is included in
> >> newlib (since 2020), but is separate to libgloss.
> >
> > included where exactly ?  newlib/libc/machine/riscv/ has no syscalls afaict.
> > the word "semi" doesn't really appear anywhere in the codebase.
> 
> I was referring to newlib the project rather than newlib the libc
> library.  Which you figured out once you did a grep ...

some ports put their syscalls under newlib/.  it was more common in the past,
but the point of libgloss was to pull such things out.  it's why i was precise
when i said "libgloss" and not generically "newlib".  new ports should not be
putting things like syscalls into newlib, they should be in libgloss.

the exact path is important because we automate importing of syscalls from
libgloss & newlib (see sim/common/gennltvals.py), and handcoding the table in
the sim is not what we want at all.

> > if you're referring to this commit, it's in libgloss, not newlib.
> > https://sourceware.org/git/?p=newlib-cygwin.git;a=commit;h=865cd30dcc2f00c81c8b3624a9f3464138cd24a5
> 
> Yeah, it's a bit "yuck".  The commit lives in the libgloss directory,
> but actually adds a parallel set of build rules that create a
> libsemihost.a as a separate thing from libgloss.a, hence my "separate to
> libgloss" comment.  Which I'll argue is both correct and incorrect at
> the same time (correct: it's a separate library, incorrect: it's within
> the libgloss part of the newlib project tree).

the output of the libgloss/ tree is not standard by any means.  while riscv
creates a literal "libgloss.a" file, that is not the most common behavior.
considering the mess and complete lack of agreement between gcc specs and
the libgloss project outputs, better to be precise here.

> > where exactly is the riscv semihost standard defined such that people are
> > implementing the same API (source files) & ABI (the stub that processes the
> > ebreak calls) ?
> 
> This is where things get even more iffy.  As best I can tell the RISC-V
> semihosting spec is here:
> 
>   https://github.com/riscv-software-src/riscv-semihosting/blob/main/riscv-semihosting-spec.adoc
> 
> But this looks far from complete to me.  The commit that added
> semihosting to newlib (the project) can be found here:
> 
>   https://inbox.sourceware.org/newlib/694d497b-bc07-a3ba-2643-a7336927e9a7@embecosm.com/
> 
> Comments in that thread seem to indicate that the situation is more of a
> ad hoc standard, with (maybe) openocd being a first reference
> implementation?  However, it is supported by QEMU with commit
> a10b9d93ecea0a8 (though I think there are additional follow up commits
> relating to this topic) so there are definitely other simulators
> supporting this out there.

the rsicv-semihosting spec is real rough.  ignoring the opening paragraph:
> This is a Discussion Document: Assume everything can change. This document is not complete yet and was created only for the purpose of conversation outside of the document. See http://riscv.org/spec-state for more details.

it merely defines part of the calling standard.  basically how to trigger the
semihost.  it doesn't cover the actual calling convention (where do extra args
go), the syscall ABI (numbers/arguments/etc...), or how errors are passed back
(in the return register ?  sep location ?  what are the error numbers ?).  or
how are non-basic types defined & passed (e.g. is off_t always 32-bit ?  how
are 64-bit values passed on a 32-bit system ?).  padding ?  alignment ?

if there's disagreements/bugs in implementations, how do we decide which is the
"correct" behavior ?  if someone wants to add another syscall, who decides ?

the libgloss port seems like "written to work against whatever reference
implementation they had available" rather than to/with a spec.

the qemu commit also lacks details beyond the bare min spec you linked.  it
makes it sound like riscv is just reusing the semihost syscall numbers that
arm already has ?  is it just copying & pasting the entire arm ABI then ?
if that's the case, it screams even more "don't put it in riscv/".  we have
arm & aarch64 ports which could also benefit.

to compare, QEMU links to this page for ARM:
https://github.com/ARM-software/abi-aa/blob/main/semihosting/semihosting.rst
that's an actual spec -- 1500 lines of detailed content.

> >> Where libgloss syscalls use ecall, the semihosting uses ebreak with two
> >> specific nop instructions (one before, one after the ebreak).
> >
> > the calling convention doesn't really matter to the sim.  it can do either.
> >
> > the question is whether we want to support them simultaneously, or only one
> > at a time.  what are other stubs doing ?
> 
> I haven't looked.  I'll leave that as an exercise for Jaydeep.  I'd be
> open to supporting both simultaneously as a first cut ... but I guess if
> you feel strongly then I'm not against making it a requirement for this
> to be switchable... it feels like that should be simple enough.

if they're both actually developed/used, i'm fine with both.  i'm not keen on
supporting the treadmill of "new group wants new feature, designs new thing in
isolation, adds another syscall ABI".  we're left holding the bag of dead ABIs
no one cares about.  so when someone shows up with an under-specified new thing,
i'm automatically suspicious.

> >> Do you see any reason why we can't support both of these syscall
> >> libraries?  Potentially we could have a switch to select between them,
> >> but I'm inclined to say we should just support both until someone comes
> >> with a use-case where supporting both is a bad idea...  But maybe you
> >> have some deeper insights here.
> >
> > supporting them both isn't a problem.  the port, as written, isn't following
> > our existing conventions for integrating with syscalls, but before i went down
> > that hole, i wanted to understand at a higher level the diff between the two.
> > the patch def needs a lot of work either way.
> 
> For my own education; the ECALL handling does make use of sim_syscall.
> When you say: "the port, as written, isn't following our existing
> conventions for integrating with syscalls", do you mean the new
> semihosting port?  Or are you saying the (pre-patch) existing code is
> also not using the standard sim syscall mechanism?
> 
> The problem I see (at a quick glance) with the semihosting API is that
> the syscall arguments are read from memory, while the existing syscall
> mechanism seems to assume syscall arguments are in registers.  My
> initial thoughts were that getting the semihosting support to use the
> existing mechanism might not be straight forward.

i'm referring to the new semihost code in this thread.

the sim_syscall is not perfect by any means, but i'd like to focus on improving
the existing core rather than each port & syscall ABI doing it themselves.  we
end up with each port working diff depending on the host OS, and having to copy
and paste fixes/improvements between them.  i'm a bit guilty of this myself with
the Blackfin port, but in my defense, i wasn't as familiar with the codebase.

sim_syscall itself is a convenience if you have arguments in registers.  if they
are in memory, then you can unpack before calling sim_syscall_multi yourself.

other problems at a glance (as i haven't gone too deep yet):
* isn't respecting --environment framework (existing code needs fixing too)
* hardcoding syscall numbers instead of using our framework to extract
* no error handling afaict
* casting from target addresses/ints to host uintptr_t looks fundamentally wrong
-mike
  
Jaydeep Patil Dec. 19, 2023, 6:13 a.m. UTC | #7
Mike Frysinger <vapier@gentoo.org> writes:

> what host environment are you implementing ?  none of the syscalls you've defined match what have long been in the riscv libgloss port.  those are the only ones i'd really expect at this point to be wired up.

The semi-hosting calls generated by newlib (--specs=semihost.specs option) and picolibc libraries.

> you're missing sim/ tests.  gdb tests are great, but not sufficient.

I have added c-ext.s in the sim tests.

> seems unrelated to this patch

Removed it in patch v3

Regards,
Jaydeep
  
Mike Frysinger Dec. 20, 2023, 1:45 a.m. UTC | #8
On 19 Dec 2023 06:13, Jaydeep Patil wrote:
> Mike Frysinger <vapier@gentoo.org> writes:
> > what host environment are you implementing ?  none of the syscalls you've defined match what have long been in the riscv libgloss port.  those are the only ones i'd really expect at this point to be wired up.
> 
> The semi-hosting calls generated by newlib (--specs=semihost.specs option) and picolibc libraries.

what specification are you implementing ?  newlib is an implementation, not a specification.
-mike
  
Jaydeep Patil Dec. 20, 2023, 8:52 a.m. UTC | #9
Mike Frysinger <vapier@gentoo.org> writes:

> what specification are you implementing ?  newlib is an implementation, not a specification.

I am referring to https://github.com/riscv-software-src/riscv-semihosting/blob/main/riscv-semihosting-spec.adoc. This is still a discussion document and not ratified. The document says "RISC-V semihosting borrows from the design of other publicly available and open source semihosting mechanisms to minimize the development effort required. At some stage, the document referred to "Semihosting for AArch32 and AArch64" (https://github.com/riscv-software-src/riscv-semihosting/blob/de2a56ff73439b1d955e361eb4d35c0e8e453a42/riscv-semihosting-spec.adoc). Looking at the implementation of newlib and picolib, they are following the same specification.

Regards,
Jaydeep
  

Patch

diff --git a/gdb/testsuite/gdb.arch/riscv-exit-getcmd.c b/gdb/testsuite/gdb.arch/riscv-exit-getcmd.c
new file mode 100644
index 00000000000..02a97da8643
--- /dev/null
+++ b/gdb/testsuite/gdb.arch/riscv-exit-getcmd.c
@@ -0,0 +1,26 @@ 
+/* This file is part of GDB, the GNU debugger.
+
+   Copyright 2023 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+/* Test basic semi-hosting calls SYS_GET_CMDLINE and SYS_EXIT.  */
+
+int
+main (int argc, char **argv)
+{
+  if (argc != 4)
+    return 1;
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.arch/riscv-exit-getcmd.exp b/gdb/testsuite/gdb.arch/riscv-exit-getcmd.exp
new file mode 100644
index 00000000000..7f5670200c2
--- /dev/null
+++ b/gdb/testsuite/gdb.arch/riscv-exit-getcmd.exp
@@ -0,0 +1,27 @@ 
+# Copyright 2023 Free Software Foundation, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test basic semi-hosting calls SYS_GET_CMDLINE and SYS_EXIT.
+
+require {istarget "riscv*-*-*"}
+
+standard_testfile
+
+if { [prepare_for_testing "failed to prepare" ${testfile} ${srcfile} \
+	  {debug quiet}] } {
+    return -1
+}
+
+gdb_test "run 1 2 3" ".*Inferior.*process.*exited normally.*"
diff --git a/sim/riscv/riscv-sim.h b/sim/riscv/riscv-sim.h
index 1bc9aa12156..bb296fa6e62 100644
--- a/sim/riscv/riscv-sim.h
+++ b/sim/riscv/riscv-sim.h
@@ -75,4 +75,36 @@  extern void initialize_env (SIM_DESC, const char * const *argv,
 
 #define RISCV_XLEN(cpu) MACH_WORD_BITSIZE (CPU_MACH (cpu))
 
+#define APPLICATION_EXIT 0x20026
+
+#define SYS_OPEN 0x01
+#define SYS_GET_CMDLINE 0x15
+#define SYS_EXIT 0x18
+
+#define GDB_O_RDONLY 0x000
+#define GDB_O_WRONLY 0x001
+#define GDB_O_RDWR 0x002
+#define GDB_O_APPEND 0x008
+#define GDB_O_CREAT 0x200
+#define GDB_O_TRUNC 0x400
+#define GDB_O_BINARY 0
+
+#define SEMI_HOSTING_SLLI 0x01f01013
+#define SEMI_HOSTING_SRAI 0x40705013
+
+static int gdb_open_modeflags[12] = {
+  GDB_O_RDONLY,
+  GDB_O_RDONLY | GDB_O_BINARY,
+  GDB_O_RDWR,
+  GDB_O_RDWR | GDB_O_BINARY,
+  GDB_O_WRONLY | GDB_O_CREAT | GDB_O_TRUNC,
+  GDB_O_WRONLY | GDB_O_CREAT | GDB_O_TRUNC | GDB_O_BINARY,
+  GDB_O_RDWR | GDB_O_CREAT | GDB_O_TRUNC,
+  GDB_O_RDWR | GDB_O_CREAT | GDB_O_TRUNC | GDB_O_BINARY,
+  GDB_O_WRONLY | GDB_O_CREAT | GDB_O_APPEND,
+  GDB_O_WRONLY | GDB_O_CREAT | GDB_O_APPEND | GDB_O_BINARY,
+  GDB_O_RDWR | GDB_O_CREAT | GDB_O_APPEND,
+  GDB_O_RDWR | GDB_O_CREAT | GDB_O_APPEND | GDB_O_BINARY
+};
+
 #endif
diff --git a/sim/riscv/sim-main.c b/sim/riscv/sim-main.c
index afdfcf50656..8e08eb9fc4e 100644
--- a/sim/riscv/sim-main.c
+++ b/sim/riscv/sim-main.c
@@ -136,6 +136,176 @@  store_csr (SIM_CPU *cpu, const char *name, int csr, unsigned_word *reg,
   TRACE_REGISTER (cpu, "wrote CSR %s = %#" PRIxTW, name, val);
 }
 
+/* Read 4 or 8 bytes of data from the core memory.  ADDR and (INDEX * XLEN)
+   form the base address.  */
+static uintptr_t
+get_core_data (SIM_CPU *cpu, unsigned_word addr, unsigned_word index)
+{
+  uintptr_t param;
+  int xlen = RISCV_XLEN (cpu);
+  struct riscv_sim_cpu *riscv_cpu = RISCV_SIM_CPU (cpu);
+
+  if (xlen == 64)
+    param = sim_core_read_unaligned_8 (cpu, riscv_cpu->pc, read_map,
+		addr + (index * 8));
+  else
+    param = sim_core_read_unaligned_4 (cpu, riscv_cpu->pc, read_map,
+		addr + (index * 4));
+
+  return param;
+}
+
+/* Write string in HOST_BUF at CORE_ADDR.  The length of string is LEN.  */
+static void
+set_core_string (SIM_CPU *cpu, unsigned_word core_addr, char *host_buf,
+		 int len)
+{
+  struct riscv_sim_cpu *riscv_cpu = RISCV_SIM_CPU (cpu);
+  for (int i = 0; i < len; i++)
+    {
+      sim_core_write_unaligned_1 (cpu, riscv_cpu->pc, write_map,
+				  core_addr + i, host_buf[i]);
+    }
+}
+
+/* Read string of length LEN at address ADDR.  */
+static char *
+get_core_string_with_len (SIM_CPU *cpu, unsigned_word addr,
+			  unsigned_word len)
+{
+  struct riscv_sim_cpu *riscv_cpu = RISCV_SIM_CPU (cpu);
+  char * str;
+  str = (char *) malloc (len + 1);
+
+  for (int i = 0; i < len; i++)
+    {
+      str[i] = sim_core_read_unaligned_1 (cpu, riscv_cpu->pc, read_map,
+					  addr + i);
+    }
+  str[len] = 0;
+
+  return str;
+}
+
+/* SYS_OPEN
+   Register a1 points to a buffer containing:
+   Index 0: Pointer: Address of the file name string.
+   Index 1: Integer: File open flags.
+   Index 2: Integer: Length of the file name string.
+   File handle is returned through register a0.  */
+static void
+semihosting_open (SIM_CPU *cpu)
+{
+  uintptr_t fname_addr;
+  uintptr_t flags;
+  uintptr_t fname_len;
+  char *name;
+  struct riscv_sim_cpu *riscv_cpu = RISCV_SIM_CPU (cpu);
+
+  fname_addr = (uintptr_t) get_core_data (cpu, riscv_cpu->a1, 0);
+  flags = (uintptr_t) get_core_data (cpu, riscv_cpu->a1, 1);
+  fname_len = (uintptr_t) get_core_data (cpu, riscv_cpu->a1, 2);
+
+  if (fname_len <= 0)
+    {
+      riscv_cpu->a0 = -1;
+      return;
+    }
+
+  name = get_core_string_with_len (cpu, fname_addr, fname_len);
+  riscv_cpu->a0 = sim_io_open (CPU_STATE (cpu), name,
+			       gdb_open_modeflags[flags]);
+  free (name);
+}
+
+/* SYS_EXIT.  In case of RV32, register a1 holds the application stop reason
+   and the exit code is decided based on it.  Otherwise, register a1 points to
+   a buffer containing:
+   Index 0: Integer: Application stop reason (ignored)
+   Index 1: Integer: Exit code.  */
+static void
+semihosting_exit (SIM_CPU *cpu)
+{
+  uintptr_t app_code, exit_code;
+  SIM_DESC sd = CPU_STATE (cpu);
+  struct riscv_sim_cpu *riscv_cpu = RISCV_SIM_CPU (cpu);
+  if (RISCV_XLEN (cpu) == 32)
+    {
+      app_code = riscv_cpu->a1;
+      if (app_code == APPLICATION_EXIT)
+	exit_code = 0;
+      else
+	exit_code = 1;
+    }
+  else
+    exit_code = (uintptr_t) get_core_data (cpu, riscv_cpu->a1, 1);
+  riscv_cpu->a0 = exit_code;
+  sim_engine_halt (sd, cpu, NULL, riscv_cpu->pc, sim_exited, exit_code);
+}
+
+/* SYS_GET_CMDLINE.  Write command line arguments to a buffer.  Arguments
+   passed to "run" command are stored in STATE_PROG_ARGV member of SIM_DESC.
+   Register a1 points to a buffer containing:
+   Index 0: Pointer: Buffer to store command line arguments
+   Index 1: Integer: Maximum length of the buffer.  */
+static void
+semihosting_get_cmdline (SIM_CPU *cpu)
+{
+  int i = 0, len = 0, total_len = 0;
+  uintptr_t buf_addr, max_buf_len;
+  SIM_DESC sd = CPU_STATE (cpu);
+  char *space = " ";
+  struct riscv_sim_cpu *riscv_cpu = RISCV_SIM_CPU (cpu);
+
+  char **prog_argv = STATE_PROG_ARGV (sd);
+  if (prog_argv == NULL)
+    {
+      riscv_cpu->a0 = 1; /* Return non-zero to indicate error.  */
+      return;
+    }
+
+  buf_addr = (uintptr_t) get_core_data (cpu, riscv_cpu->a1, 0);
+  max_buf_len = (uintptr_t) get_core_data (cpu, riscv_cpu->a1, 1);
+
+  while (prog_argv[i])
+    {
+      len = strlen (prog_argv[i]);
+      if ((total_len + len) > max_buf_len)
+	break;
+      set_core_string (cpu, buf_addr, prog_argv[i], len);
+      set_core_string (cpu, buf_addr + len, space, 1);
+      len++;	/* Terminate it with space.  */
+      buf_addr += len;
+      total_len += len;
+      i++;
+    }
+  riscv_cpu->a0 = 0;	/* No error.  */
+}
+
+/* Handle semi-hosting calls.  Register a0 contains semi-hosting call number
+   and a1 contains pointer to a buffer containing additional arguments.  */
+static int
+do_semihosting (SIM_CPU *cpu)
+{
+  struct riscv_sim_cpu *riscv_cpu = RISCV_SIM_CPU (cpu);
+  switch (riscv_cpu->a0)
+    {
+    case SYS_OPEN:
+      semihosting_open (cpu);
+      break;
+    case SYS_GET_CMDLINE:
+      semihosting_get_cmdline (cpu);
+      break;
+    case SYS_EXIT:
+      semihosting_exit (cpu);
+      break;
+    default:
+      return -1;	/* Semi-hosting call not supported.  */
+    }
+
+  return 0;
+}
+
 static inline unsigned_word
 ashiftrt (unsigned_word val, unsigned_word shift)
 {
@@ -623,11 +793,60 @@  execute_i (SIM_CPU *cpu, unsigned_word iw, const struct riscv_opcode *op)
       TRACE_INSN (cpu, "fence.i;");
       break;
     case MATCH_EBREAK:
-      TRACE_INSN (cpu, "ebreak;");
-      /* GDB expects us to step over EBREAK.  */
-      sim_engine_halt (sd, cpu, NULL, riscv_cpu->pc + 4, sim_stopped,
-		       SIM_SIGTRAP);
-      break;
+	{
+	  /* If ebreak is at PC 0 then do not check for semi-hosting.  */
+	  if (riscv_cpu->pc != 0)
+	    {
+	      /* RISC-V semi-hosting call is flagged using these three
+		 instructions
+		 slli zero, zero, 0x1f	0x01f01013
+		 ebreak			0x00100073
+		 srai zero, zero, 0x7	0x40705013
+		 Register a0 holds the system call number and a1 holds the
+		 pointer to parameter buffer.  Do not read 4 bytes in one go
+		 as we might read malformed 4 byte instruction.  */
+	      int iw_len;
+	      sim_cia pre_pc = riscv_cpu->pc - 4;
+	      unsigned_word pre_iw;
+	      pre_iw = sim_core_read_aligned_2 (cpu, pre_pc, exec_map, pre_pc);
+	      iw_len = riscv_insn_length (pre_iw);
+	      if (iw_len == 4)
+		pre_iw |= ((unsigned_word) sim_core_read_aligned_2
+			   (cpu, pre_pc, exec_map, pre_pc + 2) << 16);
+
+	      if (pre_iw == SEMI_HOSTING_SLLI)
+		{
+		  sim_cia post_pc = riscv_cpu->pc + 4;
+		  unsigned_word post_iw;
+		  post_iw = sim_core_read_aligned_2 (cpu, post_pc, exec_map,
+						     post_pc);
+		  iw_len = riscv_insn_length (post_iw);
+		  if (iw_len == 4)
+		    post_iw |= ((unsigned_word) sim_core_read_aligned_2
+				(cpu, post_pc, exec_map, post_pc + 2) << 16);
+
+		  if (post_iw == SEMI_HOSTING_SRAI)
+		    {
+		      TRACE_INSN (cpu, "semi-hosting a0=%lx,a1=%lx;",
+				  riscv_cpu->a0, riscv_cpu->a1);
+		      if (do_semihosting (cpu))
+			{
+			  /* Invalid semi-hosting call.  */
+			  TRACE_INSN (cpu, "ebreak;");
+			  sim_engine_halt (sd, cpu, NULL, riscv_cpu->pc,
+					   sim_stopped, SIM_SIGTRAP);
+			}
+		      else
+			pc = pc + 4;	/* post srai.  */
+		      break;
+		    }
+		}
+	    }
+	  TRACE_INSN (cpu, "ebreak;");
+	  sim_engine_halt (sd, cpu, NULL, riscv_cpu->pc, sim_stopped,
+			   SIM_SIGTRAP);
+	  break;
+	}
     case MATCH_ECALL:
       TRACE_INSN (cpu, "ecall;");
       riscv_cpu->a0 = sim_syscall (cpu, riscv_cpu->a7, riscv_cpu->a0,
@@ -990,6 +1209,7 @@  execute_one (SIM_CPU *cpu, unsigned_word iw, const struct riscv_opcode *op)
     case INSN_CLASS_A:
       return execute_a (cpu, iw, op);
     case INSN_CLASS_I:
+    case INSN_CLASS_ZICSR:
       return execute_i (cpu, iw, op);
     case INSN_CLASS_M:
     case INSN_CLASS_ZMMUL: