From patchwork Thu Jun 18 16:40:44 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nick Clifton X-Patchwork-Id: 7242 Received: (qmail 48202 invoked by alias); 18 Jun 2015 16:40:51 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org Delivered-To: mailing list gdb-patches@sourceware.org Received: (qmail 48182 invoked by uid 89); 18 Jun 2015 16:40:50 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.2 required=5.0 tests=AWL, BAYES_20, KAM_LAZY_DOMAIN_SECURITY, RP_MATCHES_RCVD, SPF_HELO_PASS autolearn=no version=3.3.2 X-Spam-User: qpsmtpd, 2 recipients X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-GCM-SHA384 encrypted) ESMTPS; Thu, 18 Jun 2015 16:40:48 +0000 Received: from int-mx13.intmail.prod.int.phx2.redhat.com (int-mx13.intmail.prod.int.phx2.redhat.com [10.5.11.26]) by mx1.redhat.com (Postfix) with ESMTPS id 4B074C12D6; Thu, 18 Jun 2015 16:40:47 +0000 (UTC) Received: from littlehelper.redhat.com (vpn1-5-36.ams2.redhat.com [10.36.5.36]) by int-mx13.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id t5IGeiOb011726 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Thu, 18 Jun 2015 12:40:46 -0400 From: Nick Clifton To: binutils@sourceware.org Cc: gdb-patches@sourceware.org Subject: RFC: Prevent disassembly beyond symbolic boundaries Date: Thu, 18 Jun 2015 17:40:44 +0100 Message-ID: <87lhfhynoz.fsf@redhat.com> MIME-Version: 1.0 Hi Guys, Currently objdump will disassemble beyond a symbolic boundary if it needs extra bytes to decode an instruction. For example (with x86): .file "foo.c" .text .globl foo .type foo, @function foo: .byte 0x24 .byte 0x2f .byte 0x83 .size foo, .-foo .globl bar .type bar, @function bar: .byte 0x0f .byte 0xba .byte 0xe2 .byte 0x03 .size bar, .-bar This will disassemble as: 0000000000000000 : 0: 24 2f and $0x2f,%al 2: 83 0f ba orl $0xffffffba,(%rdi) 0000000000000003 : 3: 0f ba e2 03 bt $0x3,%edx Note how the instruction decoded at address 0x2 has stolen two bytes from "foo", but these bytes are also decoded (correctly this time) as part of the first instruction of foo. I have a patch (attached) which changes this behaviour, so that the disassembly would be: 0: 24 2f and $0x2f,%al 2: 83 .byte 0x83 00000003 : 3: 0f ba e2 03 bt $0x3,%edx The patch works by adding an extra field to the end of the disassemble_info structure, setting it inside objdump's disassemble_bytes function and then checking it in the opcode library's buffer_read_memory function. This means that other users of the opcodes library will not be affected by the change. The patch makes sure to only enable this feature when disassembling code sections, data sections are unaffected. I have omitted adding a new test for the feature since the gas/i386/x86_64-opcode-inval test already covers this. What do people think ? To me this seems like a good idea, but I willing to consider alternative suggestions if people have them. Cheers Nick PS. The patch only affects objdump. So if you disassemble a binary using GDB for example, the old behaviour will still be seen. Changing GDB's behaviour is possible, although it would be quite a big job as there are lots of different places where the disassembler part of the opcodes library is called and memory is read. diff --git a/include/dis-asm.h b/include/dis-asm.h index ad060ee..de17030 100644 --- a/include/dis-asm.h +++ b/include/dis-asm.h @@ -212,6 +212,14 @@ typedef struct disassemble_info /* Command line options specific to the target disassembler. */ char * disassembler_options; + /* If non-zero then try not disassemble beyond this address, even if + there are values left in the buffer. This address is the address + of the nearest symbol forwards from the start of the disassembly, + and it is assumed that it lies on the boundary between instructions. + If an instruction spans this address then this is an error in the + file being disassembled. */ + bfd_vma stop_vma; + } disassemble_info; diff --git a/binutils/objdump.c b/binutils/objdump.c index f51b6f5..5165fff 100644 --- a/binutils/objdump.c +++ b/binutils/objdump.c @@ -1685,7 +1685,16 @@ disassemble_bytes (struct disassemble_info * inf, } } + if ((section->flags & (SEC_CODE | SEC_HAS_CONTENTS)) + == (SEC_CODE | SEC_HAS_CONTENTS)) + /* Set a stop_vma so that the disassembler will not read + beyond the next symbol. We assume that symbols appear on + the boundaries between instructions. We only do this when + disassembling code of course. */ + inf->stop_vma = section->vma + stop_offset; + octets = (*disassemble_fn) (section->vma + addr_offset, inf); + inf->stop_vma = 0; inf->fprintf_func = (fprintf_ftype) fprintf; inf->stream = stdout; if (insn_width == 0 && inf->bytes_per_line != 0) diff --git a/opcodes/dis-buf.c b/opcodes/dis-buf.c index cc0e3ad..7c5d9ad 100644 --- a/opcodes/dis-buf.c +++ b/opcodes/dis-buf.c @@ -38,7 +38,9 @@ buffer_read_memory (bfd_vma memaddr, if (memaddr < info->buffer_vma || memaddr - info->buffer_vma > max_addr_offset - || memaddr - info->buffer_vma + end_addr_offset > max_addr_offset) + || memaddr - info->buffer_vma + end_addr_offset > max_addr_offset + || (info->stop_vma && (memaddr >= info->stop_vma + || memaddr + end_addr_offset > info->stop_vma))) /* Out of bounds. Use EIO because GDB uses it. */ return EIO; memcpy (myaddr, info->buffer + octets, length); diff --git a/opcodes/bfin-dis.c b/opcodes/bfin-dis.c index cf66b79..bf2052e 100644 --- a/opcodes/bfin-dis.c +++ b/opcodes/bfin-dis.c @@ -4664,7 +4664,7 @@ _print_insn_bfin (bfd_vma pc, disassemble_info *outf) return -1; priv->iw0 = iw0; - if ((iw0 & 0xc000) == 0xc000) + if (((iw0 & 0xc000) == 0xc000) && ((iw0 & 0xff00) != 0xf800)) { /* 32-bit insn. */ if (ifetch (pc + 2, outf, &iw1)) diff --git a/opcodes/mcore-dis.c b/opcodes/mcore-dis.c index dc62099..536f79b 100644 --- a/opcodes/mcore-dis.c +++ b/opcodes/mcore-dis.c @@ -88,9 +88,8 @@ static const char *crname[] = { static const unsigned isiz[] = { 2, 0, 1, 0 }; int -print_insn_mcore (memaddr, info) - bfd_vma memaddr; - struct disassemble_info *info; +print_insn_mcore (bfd_vma memaddr, + struct disassemble_info *info) { unsigned char ibytes[4]; fprintf_ftype print_func = info->fprintf_func; @@ -233,6 +232,9 @@ print_insn_mcore (memaddr, info) val = (memaddr + 2 + ((inst & 0xFF) << 2)) & 0xFFFFFFFC; + /* We are not reading an instruction, so allow + reads to extend beyond the next symbol. */ + info->stop_vma = 0; status = info->read_memory_func (val, ibytes, 4, info); if (status != 0) { @@ -263,6 +265,9 @@ print_insn_mcore (memaddr, info) val = (memaddr + 2 + ((inst & 0xFF) << 2)) & 0xFFFFFFFC; + /* We are not reading an instruction, so allow + reads to extend beyond the next symbol. */ + info->stop_vma = 0; status = info->read_memory_func (val, ibytes, 4, info); if (status != 0) { diff --git a/opcodes/sh-dis.c b/opcodes/sh-dis.c index 74de9f6..a3f645d 100644 --- a/opcodes/sh-dis.c +++ b/opcodes/sh-dis.c @@ -905,6 +905,8 @@ print_insn_sh (bfd_vma memaddr, struct disassemble_info *info) size = 2; else size = 4; + /* Not reading an instruction - disable stop_vma. */ + info->stop_vma = 0; status = info->read_memory_func (disp_pc_addr, bytes, size, info); if (status == 0) { diff --git a/opcodes/tic6x-dis.c b/opcodes/tic6x-dis.c index e027340..498ffe0 100644 --- a/opcodes/tic6x-dis.c +++ b/opcodes/tic6x-dis.c @@ -249,6 +249,9 @@ print_insn_tic6x (bfd_vma addr, struct disassemble_info *info) fp_offset = addr & 0x1f; fp_addr = addr - fp_offset; + /* Read in a block of instructions. Since there might be a + symbol in the middle of this block, disable stop_vma. */ + info->stop_vma = 0; status = info->read_memory_func (fp_addr, fp, 32, info); if (status) { diff --git a/opcodes/vax-dis.c b/opcodes/vax-dis.c index a7a1ccb..da4ba7c 100644 --- a/opcodes/vax-dis.c +++ b/opcodes/vax-dis.c @@ -402,7 +402,8 @@ print_insn_vax (bfd_vma memaddr, disassemble_info *info) argp = NULL; /* Check if the info buffer has more than one byte left since the last opcode might be a single byte with no argument data. */ - if (info->buffer_length - (memaddr - info->buffer_vma) > 1) + if (info->buffer_length - (memaddr - info->buffer_vma) > 1 + && (info->stop_vma == 0 || memaddr < (info->stop_vma - 1))) { FETCH_DATA (info, buffer + 2); } diff --git a/gas/testsuite/gas/arm/backslash-at.d b/gas/testsuite/gas/arm/backslash-at.d index 3397573..49e815a 100644 --- a/gas/testsuite/gas/arm/backslash-at.d +++ b/gas/testsuite/gas/arm/backslash-at.d @@ -5,13 +5,13 @@ Disassembly of section .text: 0+000 <.*>.*(615c|5c61).* -0+002 e3a00000 mov r0, #0 -0+006 e3a00000 mov r0, #0 -0+00a e3a00000 mov r0, #0 -0+00e e3a00001 mov r0, #1 -0+012 e3a00001 mov r0, #1 -0+016 e3a00001 mov r0, #1 -0+01a e3a00002 mov r0, #2 -0+01e e3a00002 mov r0, #2 -0+022 e3a00002 mov r0, #2 +0+004 e3a00000 mov r0, #0 +0+008 e3a00000 mov r0, #0 +0+00c e3a00000 mov r0, #0 +0+010 e3a00001 mov r0, #1 +0+014 e3a00001 mov r0, #1 +0+018 e3a00001 mov r0, #1 +0+01c e3a00002 mov r0, #2 +0+020 e3a00002 mov r0, #2 +0+024 e3a00002 mov r0, #2 #... diff --git a/gas/testsuite/gas/arm/backslash-at.s b/gas/testsuite/gas/arm/backslash-at.s index 4975aea..1357354 100644 --- a/gas/testsuite/gas/arm/backslash-at.s +++ b/gas/testsuite/gas/arm/backslash-at.s @@ -6,9 +6,10 @@ mov r0, #\@ @comment .endm -.byte '\\ -.byte '\a - + .byte '\\ + .byte '\a + .byte 0 + .byte 0 foo: bar bar diff --git a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d index cd503eb..cefd9fd 100644 --- a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d +++ b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d @@ -12,26 +12,26 @@ Disassembly of section .text: 0+1 : [ ]*[a-f0-9]+: d5 \(bad\) -[ ]*[a-f0-9]+: 0a d5 or dl,ch +[ ]*[a-f0-9]+: 0a .byte 0xa 0+3 : [ ]*[a-f0-9]+: d5 \(bad\) -[ ]*[a-f0-9]+: 02 d4 add dl,ah +[ ]*[a-f0-9]+: 02 .byte 0x2 0+5 : [ ]*[a-f0-9]+: d4 \(bad\) -[ ]*[a-f0-9]+: 0a d4 or dl,ah +[ ]*[a-f0-9]+: 0a .byte 0xa 0+7 : [ ]*[a-f0-9]+: d4 \(bad\) -[ ]*[a-f0-9]+: 02 3f add bh,BYTE PTR \[rdi\] +[ ]*[a-f0-9]+: 02 .byte 0x2 0+9 : [ ]*[a-f0-9]+: 3f \(bad\) 0+a : -[ ]*[a-f0-9]+: 62 \(bad\) -[ ]*[a-f0-9]+: 10 27 adc BYTE PTR \[rdi\],ah +[ ]*[a-f0-9]+: 62 .byte 0x62 +[ ]*[a-f0-9]+: 10 .byte 0x10 0+c : [ ]*[a-f0-9]+: 27 \(bad\) diff --git a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d index 35f8137..21ac5de 100644 --- a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d +++ b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d @@ -12,26 +12,26 @@ Disassembly of section .text: 0+1 : [ ]*[a-f0-9]+: d5 \(bad\) -[ ]*[a-f0-9]+: 0a d5 or %ch,%dl +[ ]*[a-f0-9]+: 0a .byte 0xa 0+3 : [ ]*[a-f0-9]+: d5 \(bad\) -[ ]*[a-f0-9]+: 02 d4 add %ah,%dl +[ ]*[a-f0-9]+: 02 .byte 0x2 0+5 : [ ]*[a-f0-9]+: d4 \(bad\) -[ ]*[a-f0-9]+: 0a d4 or %ah,%dl +[ ]*[a-f0-9]+: 0a .byte 0xa 0+7 : [ ]*[a-f0-9]+: d4 \(bad\) -[ ]*[a-f0-9]+: 02 3f add \(%rdi\),%bh +[ ]*[a-f0-9]+: 02 .byte 0x2 0+9 : [ ]*[a-f0-9]+: 3f \(bad\) 0+a : -[ ]*[a-f0-9]+: 62 \(bad\) -[ ]*[a-f0-9]+: 10 27 adc %ah,\(%rdi\) +[ ]*[a-f0-9]+: 62 .byte 0x62 +[ ]*[a-f0-9]+: 10 .byte 0x10 0+c : [ ]*[a-f0-9]+: 27 \(bad\) diff --git a/gas/testsuite/gas/i386/x86-64-opcode-inval-intel.d b/gas/testsuite/gas/i386/x86-64-opcode-inval-intel.d index 7cf0c27..df7c8bd 100644 --- a/gas/testsuite/gas/i386/x86-64-opcode-inval-intel.d +++ b/gas/testsuite/gas/i386/x86-64-opcode-inval-intel.d @@ -12,26 +12,26 @@ Disassembly of section .text: 0+1 : [ ]*[a-f0-9]+: d5 \(bad\) -[ ]*[a-f0-9]+: 0a d5 or dl,ch +[ ]*[a-f0-9]+: 0a .byte 0xa 0+3 : [ ]*[a-f0-9]+: d5 \(bad\) -[ ]*[a-f0-9]+: 02 d4 add dl,ah +[ ]*[a-f0-9]+: 02 .byte 0x2 0+5 : [ ]*[a-f0-9]+: d4 \(bad\) -[ ]*[a-f0-9]+: 0a d4 or dl,ah +[ ]*[a-f0-9]+: 0a .byte 0xa 0+7 : [ ]*[a-f0-9]+: d4 \(bad\) -[ ]*[a-f0-9]+: 02 3f add bh,BYTE PTR \[rdi\] +[ ]*[a-f0-9]+: 02 .byte 0x2 0+9 : [ ]*[a-f0-9]+: 3f \(bad\) 0+a : -[ ]*[a-f0-9]+: 62 \(bad\) -[ ]*[a-f0-9]+: 10 27 adc BYTE PTR \[rdi\],ah +[ ]*[a-f0-9]+: 62 .byte 0x62 +[ ]*[a-f0-9]+: 10 .byte 0x10 0+c : [ ]*[a-f0-9]+: 27 \(bad\) diff --git a/gas/testsuite/gas/i386/x86-64-opcode-inval.d b/gas/testsuite/gas/i386/x86-64-opcode-inval.d index ccb19ac..d0d08cb 100644 --- a/gas/testsuite/gas/i386/x86-64-opcode-inval.d +++ b/gas/testsuite/gas/i386/x86-64-opcode-inval.d @@ -11,26 +11,26 @@ Disassembly of section .text: 0+1 : [ ]*[a-f0-9]+: d5 \(bad\) -[ ]*[a-f0-9]+: 0a d5 or %ch,%dl +[ ]*[a-f0-9]+: 0a .byte 0xa 0+3 : [ ]*[a-f0-9]+: d5 \(bad\) -[ ]*[a-f0-9]+: 02 d4 add %ah,%dl +[ ]*[a-f0-9]+: 02 .byte 0x2 0+5 : [ ]*[a-f0-9]+: d4 \(bad\) -[ ]*[a-f0-9]+: 0a d4 or %ah,%dl +[ ]*[a-f0-9]+: 0a .byte 0xa 0+7 : [ ]*[a-f0-9]+: d4 \(bad\) -[ ]*[a-f0-9]+: 02 3f add \(%rdi\),%bh +[ ]*[a-f0-9]+: 02 .byte 0x2 0+9 : [ ]*[a-f0-9]+: 3f \(bad\) 0+a : -[ ]*[a-f0-9]+: 62 \(bad\) -[ ]*[a-f0-9]+: 10 27 adc %ah,\(%rdi\) +[ ]*[a-f0-9]+: 62 .byte 0x62 +[ ]*[a-f0-9]+: 10 .byte 0x10 0+c : [ ]*[a-f0-9]+: 27 \(bad\)