Lec1_ARM Archi (2)
Hardware
Textbook
• ARM Assembly Language: Fundamentals and Techniques, 2nd edition, W. Hohl and C. Hinds, CRC Press, 2014, ISBN-10: 1482229854 – 90%
• Computers as Components: Principles of Embedded Computing System Design, 4th edition, Marilyn Wolf, October 2016, ISBN-13: 978-0128053874 – 10%
What is ARM Architecture?
• ARM architecture is a family of RISC-based processor architectures
• Well known for its power efficiency
• Hence widely used in mobile devices, such as smartphones and tablets
• Designed by ARM and licensed to a wide ecosystem of chip makers
ARM Processor Families
1. https://www.arm.com/products/processors/cortex-m
ARM Cortex-M Series
• Cortex-M series: Cortex-M0, M0+, M1, M3, M4, M7, M33.
• Energy-efficiency
• Lower energy cost, longer battery life
• Smaller code
• Lower silicon costs
• Ease of use
• Faster software development and reuse
• Embedded applications
• Smart metering, human interface devices, automotive and industrial control systems, white goods, consumer products and medical instrumentation, IoT
• Companies Making ARM Chips
• Apple, AppliedMicro, Microchip (Atmel), Broadcom, Cypress Semiconductor, Nvidia, NXP, Samsung Electronics, STMicroelectronics, and Texas Instruments (http://en.wikipedia.org/wiki/ARM_architecture)
• More: Xilinx (Zynq), ...
Cortex-M Series
Harvard Architecture
CORTEX-M: Core + Peripherals
• Chip
  • Memory
    – FLASH: non-volatile / instruction memory
    – SRAM/DRAM: volatile / data memory
  • Processor Core
    – ALU
    – Control Unit
    – Registers
      • Special-purpose registers (SP, LR, PC)
      • General-purpose registers (R0–R12)
  • Buses
    – Data bus
    – Instruction bus
    – Bus bridge to connect different buses
      • Advanced High-performance Bus (AHB)
      • Advanced Peripheral Bus (APB)
• Peripherals
  – GPIO (General Purpose Input/Output)
  – ADC (Analog-to-Digital Converter)
  – LCD controller
  – SPI (Serial Peripheral Interface), e.g. sensors
  – I2C (Inter-Integrated Circuit), e.g. I/O, A/D, D/A, EEPROM
  – Etc.
ARMv7E-M Cortex-M4 Processor - Introduction
• Cortex-M4 Processor
  • Introduced in 2010
  • Designed with a large variety of highly efficient signal processing features
  • Features extended single-cycle multiply-accumulate instructions, optimized SIMD arithmetic, saturating arithmetic and an optional Floating Point Unit
• High Performance Efficiency
  • 1.25 DMIPS/MHz (Dhrystone MIPS per MHz of clock frequency), with power consumption on the order of microwatts per MHz
• Low Power Consumption
  • Longer battery life, especially critical in mobile products
• Enhanced Determinism
  • Critical tasks and interrupt routines can be served quickly in a known number of cycles (approaching the real-time behaviour of the Cortex-R series)
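The multiply-accumulate and saturating-arithmetic features above can be illustrated with a portable C sketch of a Q15 dot product. This is an illustration under stated assumptions, not ARM's implementation: on a Cortex-M4, a compiler or the CMSIS-DSP library would typically map this pattern onto DSP instructions such as SMLAD and SSAT, whereas here plain C is used so the arithmetic can be checked on any host. The names `dot_q15` and `saturate_q15` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a wide intermediate result to the signed 16-bit (Q15) range:
   saturating arithmetic clamps instead of wrapping around. */
static int16_t saturate_q15(int64_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Hypothetical helper: dot product of two Q15 vectors.
   Each 16x16-bit product is added to a wide accumulator (the MAC step);
   the Q30 sum is scaled back to Q15 (this division truncates toward zero)
   and then saturated. */
static int16_t dot_q15(const int16_t *a, const int16_t *b, int n)
{
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int32_t)a[i] * b[i];       /* multiply-accumulate */
    return saturate_q15(acc / 32768);      /* Q30 -> Q15, then clamp */
}
```

Using a 64-bit accumulator means the intermediate sum cannot overflow for short vectors; saturation is applied once, at the end, so an out-of-range result becomes the largest representable value instead of wrapping to a value of the opposite sign.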
Cortex-M4 Processor Block Diagram With Peripheral Connectivity
[Figure: Cortex-M4 block diagram: processor core with optional FPU; Nested Vectored Interrupt Controller (NVIC); optional Wakeup Interrupt Controller (WIC); optional Memory Protection Unit; optional Embedded Trace Macrocell; optional Serial Wire Viewer; optional Debug Access Port; optional Flash patch; optional data watchpoints; bus matrix with code interface and SRAM and peripheral interface]
Cont..
• Processor core
• Contains internal registers, the ALU, data path, and some control logic
• Registers include sixteen 32-bit registers for both general and special usage
• Nested Vectored Interrupt Controller (NVIC)
• Up to 240 interrupt request signals
• Automatically handles nested interrupts, such as comparing priorities between interrupt
requests and the current priority level
• Wakeup Interrupt Controller (WIC)
• For low-power applications, the microcontroller can enter sleep mode by shutting down
most of the components.
• When an interrupt request is detected, the WIC can inform the power management unit
to power up the system.
• Memory Protection Unit (MPU) - optional
• Used to protect memory content, e.g. make some memory regions read-only or prevent user applications from accessing privileged application data
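The NVIC's priority comparison can be modelled in a few lines of C. This is a simplified, hypothetical model, not the hardware logic: it assumes the Cortex-M convention that a numerically lower priority value is more urgent and that equal priorities are resolved in favour of the lower interrupt number, and it ignores priority grouping (sub-priorities) and masking through PRIMASK, FAULTMASK or BASEPRI. `THREAD_PRIORITY`, `nvic_should_preempt` and `nvic_select` are assumed names.

```c
#include <assert.h>

/* Assumed value meaning "no exception is active": any configurable
   priority (0..255) is more urgent than this. */
#define THREAD_PRIORITY 256

/* A pending interrupt preempts the running code only if its priority
   value is strictly lower (more urgent) than the current priority. */
static int nvic_should_preempt(int pending_priority, int current_priority)
{
    return pending_priority < current_priority;
}

/* Return the pending IRQ with the most urgent priority, or -1 if none is
   pending. Scanning upward with a strict '<' means that, among equal
   priorities, the lowest IRQ number wins. */
static int nvic_select(const int *priority, const int *pending, int n_irqs)
{
    int best = -1;
    for (int irq = 0; irq < n_irqs; irq++) {
        if (pending[irq] && (best < 0 || priority[irq] < priority[best]))
            best = irq;
    }
    return best;
}
```

For example, while an ISR with priority 3 is running, a newly pending priority-1 interrupt nests on top of it, whereas another priority-3 request waits until the running handler returns.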
Cont..
• Bus interconnect
• Allows data transfer to take place on different buses simultaneously
• Provides data transfer management, e.g. a write buffer and bit-oriented operations (bit-band)
• May include bus bridges (e.g. AHB-to-APB bus bridge) to connect different buses into a
network using a single global memory space
• Includes the internal bus system, the data path in the processor core, and the AHB-Lite interface unit
• Debug subsystem
• Handles debug control, program breakpoints, and data watchpoints
• When a debug event occurs, it can put the processor core in a halted state, where
developers can investigate the status of the processor at that point, such as register
values and flags
TM4C123GH6PM MC Block Diagram
• JTAG (Joint Test Action Group): design testing and verification
• SWD (Serial Wire Debug)
• ETM (Embedded Trace Macrocell): low-power debug tool for instruction trace
• NVIC (Nested Vectored Interrupt Controller): used for prioritizing interrupts
• MPU (Memory Protection Unit): memory protection (access control for memory regions)
Tiva C Series
TM4C123G LaunchPad
Cortex-M4 Pipeline
• Processor pipeline stages
  • Three-stage pipeline: fetch, decode, and execute
  • Some instructions may take multiple cycles to execute, in which case the pipeline will be stalled
  • The pipeline will be flushed if a branch instruction is executed
  • Pipelining allows hardware resources to be fully utilized
  • Up to two instructions can be fetched in one transfer: one 32-bit instruction or two 16-bit instructions
[Figure: three-stage pipeline: 1. fetch the instruction at the PC address; 2. decode the instruction; 3. execute the instruction; with a multi-cycle instruction example]
Fetch
– The instruction is fetched from memory and placed in the instruction pipeline
Decode
– The instruction is decoded and the datapath control signals prepared for the next cycle
Execute
– The register bank is read, an operand shifted, the ALU result generated and written back into the destination register
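The effect of stalls and flushes on a three-stage pipeline can be estimated with a small cycle-counting model. This is a hypothetical, simplified model rather than exact Cortex-M4 timing: it assumes two cycles to fill the pipeline, one extra cycle for each additional execute cycle of a multi-cycle instruction, and a fixed two-cycle refill after each taken branch (real Cortex-M4 branch penalties may differ). `Instr` and `pipeline_cycles` are assumed names.

```c
#include <assert.h>

/* Hypothetical instruction descriptor for the model. */
typedef struct {
    int exec_cycles;   /* cycles spent in the execute stage (1 for most instructions) */
    int taken_branch;  /* 1 if the instruction is a taken branch */
} Instr;

/* Count cycles for an executed trace through a 3-stage (fetch, decode,
   execute) pipeline. prog[] lists instructions in the order they execute,
   so the entry after a taken branch is the branch target. */
static int pipeline_cycles(const Instr *prog, int n)
{
    if (n <= 0) return 0;
    int cycles = 2;                      /* fill: first fetch and decode */
    for (int i = 0; i < n; i++) {
        assert(prog[i].exec_cycles >= 1);
        cycles += prog[i].exec_cycles;   /* a multi-cycle instruction stalls the stages behind it */
        if (prog[i].taken_branch && i + 1 < n)
            cycles += 2;                 /* flush: refetch and redecode the target */
    }
    return cycles;
}
```

With three single-cycle instructions the model gives 5 cycles (2 to fill, then one instruction completing per cycle), which is why a full pipeline approaches one instruction per cycle; a stall or a flush adds cycles on top of that.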
SOC Consortium Course Material
Register Bank (ARMv7-M)
• General-purpose registers
  – Low registers: R0–R7
  – High registers: R8–R12
• Stack Pointer (SP): R13 (banked: Main Stack Pointer, MSP, and Process Stack Pointer, PSP)
• Link Register (LR): R14
• Program Counter (PC): R15
• Special registers
  – Program Status Registers (xPSR): Application PSR (APSR), Execution PSR (EPSR), Interrupt PSR (IPSR)
  – Interrupt mask registers: PRIMASK, FAULTMASK, BASEPRI
  – Stack definition: CONTROL
Register Bank (ARM7TDMI)
• ARM7TDMI processor has a total of 37 registers
  – 30 general-purpose registers
  – 6 status registers
  – A Program Counter register
• Register usage by mode (User/System, Supervisor, Abort, Undefined, Interrupt, Fast interrupt):
  – R0–R7, PC and CPSR: shared by all modes
  – R8–R12: shared by all modes except Fast interrupt, which has banked R8_FIQ–R12_FIQ
  – R13 and R14: banked in every exception mode (R13_SVC/R14_SVC, R13_ABORT/R14_ABORT, R13_UNDEF/R14_UNDEF, R13_IRQ/R14_IRQ, R13_FIQ/R14_FIQ)
  – SPSR: one per exception mode (SPSR_SVC, SPSR_ABORT, SPSR_UNDEF, SPSR_IRQ, SPSR_FIQ)
General Purpose Registers
• R0 – R12: general purpose registers
• Low registers (R0–R7): can be accessed by any instruction
• High registers (R8–R12): cannot be accessed by some 16-bit Thumb instructions
Cortex-M4 Registers (cont.)
Stack Pointer (SP) and Link Register (LR)
• R13: Stack Pointer (SP)
  • Records the current address of the stack
  • Used for saving the context of a program while switching between tasks
  • Cortex-M4 has two SPs:
    – Main SP (MSP): used in applications that require privileged access, e.g. the OS kernel, and in exception handlers
    – Process SP (PSP): used in base-level application code (when not running an exception handler)
• R14: Link Register (LR)
  – The LR is used to store the return address of a subroutine or a function call
  – The program counter (PC) will load the value from LR after a function is finished
[Figure: subroutine call and return: 1. save the return address (the address of the instruction after the call) in LR; 2. load PC with the starting address of the subroutine; when the subroutine finishes, load PC with the address in LR to return to the main program]
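The call-and-return sequence and the stack can be simulated in C. This is a hypothetical model with assumed names (`Cpu`, `bl`, `bx_lr`, `push`, `pop`) and an assumed memory window at 0x20000000: it models the full-descending stack used on ARM (SP points at the last item pushed, and a push decrements SP by 4 before storing), a 4-byte BL that saves the address of the next instruction in LR, and a return by copying LR into PC. Real hardware also sets bit 0 of LR to indicate Thumb state, which this model ignores.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 64u              /* hypothetical size of the modelled RAM */
#define MEM_BASE  0x20000000u      /* assumed start address of that RAM */

typedef struct {
    uint32_t sp, lr, pc;           /* R13, R14, R15 */
    uint32_t mem[MEM_WORDS];
} Cpu;

static uint32_t *word_at(Cpu *c, uint32_t addr)
{
    assert(addr >= MEM_BASE && addr < MEM_BASE + 4u * MEM_WORDS && addr % 4u == 0u);
    return &c->mem[(addr - MEM_BASE) / 4u];
}

/* Like PUSH {Rn}: full-descending stack, decrement first, then store. */
static void push(Cpu *c, uint32_t value)
{
    c->sp -= 4u;
    *word_at(c, c->sp) = value;
}

/* Like POP {Rn}: load from SP, then increment. */
static uint32_t pop(Cpu *c)
{
    uint32_t value = *word_at(c, c->sp);
    c->sp += 4u;
    return value;
}

/* BL target: save the return address (instruction after the 4-byte BL), then jump. */
static void bl(Cpu *c, uint32_t target)
{
    c->lr = c->pc + 4u;
    c->pc = target;
}

/* BX LR: return by loading PC from LR. */
static void bx_lr(Cpu *c)
{
    c->pc = c->lr;
}
```

Pushing LR before a nested call is what lets a subroutine return correctly: the second BL overwrites LR, and the copy saved on the stack restores it before the final return.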
Special Purpose Registers (SPR)
• xPSR, combined Program Status Register
  – Application PSR (APSR): provides information about program execution and ALU flags
  – Interrupt PSR (IPSR): holds the number of the exception currently being handled
  – Execution PSR (EPSR): holds execution-state bits, such as the Thumb state bit
• PRIMASK, FAULTMASK and BASEPRI: interrupt mask registers
• CONTROL: stack definition (selects which stack pointer is in use)
[Figure: bit layout of PRIMASK, FAULTMASK, BASEPRI and CONTROL; unused bits are reserved]
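The ALU flags reported in the APSR can be illustrated by computing N, Z, C and V for a 32-bit addition in C. This sketch uses the standard flag definitions for an ADDS-style addition; `Flags` and `adds_flags` are hypothetical names, and the Q and GE bits are not modelled.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int n, z, c, v; } Flags;    /* APSR condition flags */

/* Add two 32-bit values and compute the flags an ADDS instruction would set. */
static Flags adds_flags(uint32_t a, uint32_t b, uint32_t *result)
{
    uint32_t r = a + b;                      /* wraps modulo 2^32 like the hardware */
    Flags f;
    f.n = (int)(r >> 31);                    /* Negative: bit 31 of the result */
    f.z = (r == 0u);                         /* Zero: the result is zero */
    f.c = (r < a);                           /* Carry: unsigned carry out of bit 31 */
    f.v = (int)((~(a ^ b) & (a ^ r)) >> 31); /* oVerflow: operands share a sign the result lacks */
    *result = r;
    return f;
}
```

For instance, 0xFFFFFFFF + 1 sets Z and C (an unsigned wrap-around), while 0x7FFFFFFF + 1 sets N and V (a signed overflow) but not C.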
Note: the Cortex-M4 only executes Thumb instructions, rather than ARM instructions as the ARM7TDMI does.
Memory
• Memory is arranged as a series of "locations"
  • Each location has a unique "address"
  • Each location holds a byte (byte-addressable)
  • e.g. the memory location at address 0x080001B0 contains the byte value 0x70, i.e., 112
• The number of locations in memory is limited
  • e.g. 4 GB of RAM
  • 1 Gigabyte (GB) = 2^30 bytes
  • 2^32 locations → 4,294,967,296 locations!
• Values stored at each location can represent either program data or program instructions
  • e.g. the value 0x70 might be the code used to tell the processor to add two values together
[Figure: memory from address 0x00000000 to 0xFFFFFFFF; addresses 0x080001AC, 0x080001AD, 0x080001AE, 0x080001AF and 0x080001B0 hold the bytes A0, 01, 18, BC and 70]
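Byte addressing can be sketched in C by treating a small window of memory as a byte array. The window reproduces only the five bytes shown in the figure (0x080001AC to 0x080001B0); `WINDOW_BASE`, `read_byte` and `addressable_bytes` are hypothetical names, and a program running on the device would instead read the address through a pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical window onto the five bytes in the figure,
   lowest address first: 0x080001AC .. 0x080001B0. */
#define WINDOW_BASE 0x080001ACu
static const uint8_t window[] = { 0xA0, 0x01, 0x18, 0xBC, 0x70 };

/* Byte-addressable read: each address selects exactly one byte. */
static uint8_t read_byte(uint32_t addr)
{
    assert(addr >= WINDOW_BASE && addr - WINDOW_BASE < sizeof window);
    return window[addr - WINDOW_BASE];
}

/* Number of distinct byte locations reachable with an n-bit address. */
static uint64_t addressable_bytes(unsigned address_bits)
{
    return (uint64_t)1 << address_bits;
}
```

With 32-bit addresses, `addressable_bytes(32)` is 4,294,967,296, i.e. 4 × 2^30 bytes = 4 GB.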
Cortex-M4 Memory Map
• The Cortex-M4 processor has 4 GB of memory address space
• Support for bit-band operation (detailed later)
• The 4 GB memory space is architecturally defined as a number of regions
• Each region has a recommended usage
• This makes it easy for software programmers to port code between different devices
• Nevertheless, despite the default memory map, the actual usage of the memory map can also be flexibly defined by the user, except for some fixed memory regions, such as the internal Private Peripheral Bus (PPB)
Cortex M4 Memory Map
• 0xE0000000 – 0xFFFFFFFF (512MB)
  – 0xE0100000 – 0xFFFFFFFF: Vendor-specific memory, reserved for other purposes
  – 0xE0000000 – 0xE00FFFFF: Private Peripheral Bus (PPB), for private peripherals, e.g. NVIC, SCS
    • External PPB: ROM table, Embedded Trace Macrocell, Trace Port Interface Unit
    • Internal PPB: System Control Space (including the Nested Vectored Interrupt Controller, NVIC), Fetch Patch and Breakpoint unit, Data Watchpoint and Trace unit, Instrumentation Trace Macrocell
• 0xA0000000 – 0xDFFFFFFF: External device (1GB), mainly used for external peripherals, e.g. SD card
• 0x60000000 – 0x9FFFFFFF: External RAM (1GB), mainly used for external memories, e.g. external DDR, FLASH, LCD
• 0x40000000 – 0x5FFFFFFF: Peripherals (512MB), mainly used for on-chip peripherals, e.g. AHB, APB peripherals
• 0x20000000 – 0x3FFFFFFF: SRAM (512MB), mainly used for data memory, e.g. on-chip SRAM, SDRAM
• 0x00000000 – 0x1FFFFFFF: Code (512MB), mainly used for program code, e.g. on-chip FLASH
M4 Memory Map
• Code Region
• Used to store program code
• On-chip memory, such as on-chip FLASH
• SRAM Region
• Used to store data, such as heaps and stacks
• On-chip memory; despite its name “SRAM”, the actual device could be SRAM, SDRAM
or other types
• Peripheral Region
• Used for peripherals, such as Advanced High-performance Bus (AHB) or Advanced Peripheral Bus (APB) peripherals
M4 Memory Map
• External RAM Region
• Primarily used to store large data blocks, or memory caches
• Off-chip memory, slower than on-chip SRAM region
• External Device Region
• Primarily used to map to external devices
• Off-chip devices, such as SD card
• Internal Private Peripheral Bus (PPB)
• Used inside the processor core for internal control
• Within PPB, a special range of memory is defined as System Control Space (SCS)
• The Nested Vectored Interrupt Controller (NVIC) is part of SCS
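The default region boundaries of the Cortex-M4 memory map translate directly into an address-decoding function. This sketch follows the default regions only; it ignores the finer PPB subdivisions and any vendor-specific use of the map, and `m4_region` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Classify a 32-bit address into the default Cortex-M4 memory-map regions. */
static const char *m4_region(uint32_t addr)
{
    if (addr < 0x20000000u) return "Code";                   /* 512MB */
    if (addr < 0x40000000u) return "SRAM";                   /* 512MB */
    if (addr < 0x60000000u) return "Peripherals";            /* 512MB */
    if (addr < 0xA0000000u) return "External RAM";           /* 1GB   */
    if (addr < 0xE0000000u) return "External device";        /* 1GB   */
    if (addr < 0xE0100000u) return "Private Peripheral Bus"; /* 1MB   */
    return "Vendor specific";
}
```

For instance, 0xE000E100 falls in the PPB, which is where the NVIC's registers sit inside the System Control Space, and 0x20000FF0 falls in the SRAM region.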
Loading Code and Data into Memory
Endianness
• Endian refers to the order of bytes stored in memory
  – Little endian: the lowest-addressed byte of a word-size data item is stored in bit 0 to bit 7 (it is the least significant byte)
  – Big endian: the lowest-addressed byte of a word-size data item is stored in bit 24 to bit 31 (it is the most significant byte)
• Cortex-M4 supports both little endian and big endian
• However, endianness only exists at the hardware level
• Instructions: REV reverses the byte order of a register, and RBIT reverses the bit order of a register
[Figure: little-endian 32-bit memory: consecutive words at addresses 0x00000000, 0x00000004 and 0x00000008, each holding Byte3 in bits [31:24], Byte2 in [23:16], Byte1 in [15:8] and Byte0 in [7:0]]
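Little- and big-endian loads, and the effect of REV and RBIT, can be expressed with shifts in portable C. The sample bytes are simply the values from the memory figure shown earlier; `load_le32`, `load_be32`, `rev32` and `rbit32` are hypothetical names (on a Cortex-M4, CMSIS provides the `__REV` and `__RBIT` intrinsics for the real instructions).

```c
#include <assert.h>
#include <stdint.h>

/* Little endian: the byte at the lowest address becomes bits 7:0. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Big endian: the byte at the lowest address becomes bits 31:24. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* What REV does: reverse the byte order of a 32-bit register. */
static uint32_t rev32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* What RBIT does: reverse the bit order of a 32-bit register. */
static uint32_t rbit32(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);   /* move the lowest remaining bit of x into r */
        x >>= 1;
    }
    return r;
}
```

The same four bytes A0 01 18 BC read as 0xBC1801A0 on a little-endian system and 0xA00118BC on a big-endian one, and applying `rev32` to one gives the other, which is why REV is used when exchanging data between the two byte orders.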
Processor Operating Modes
• Read-Modify-Write operation
    LDR   R1, =0x20000000 ;Setup address
    LDR   R0, [R1]        ;Read
    ORR.W R0, #0x8        ;Modify bit
    STR   R0, [R1]        ;Write back
  – Read the real data address (0x20000000)
  – Modify the desired bit (retain other bits unchanged)
  – Write the modified data back
• Bit-band operation
    LDR   R1, =0x2200000C ;Setup address
    MOV   R0, #1          ;Load data
    STR   R0, [R1]        ;Write
  – Directly set the bit by writing '1' to address 0x2200000C, which is the alias address of the fourth bit (bit 3) of the 32-bit data at 0x20000000
  – In effect, this single instruction is mapped to 2 bus transfers: read data from 0x20000000 to the buffer, and then write to 0x20000000 from the buffer with bit [3] set
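The two approaches can be compared in a host-side C simulation. This is a hypothetical model: a small byte array stands in for the start of the SRAM bit-band region, `rmw_set_bit` performs the three software steps, and `bitband_store` imitates what the bus does when a word is written to an alias address (only bit 0 of the written value matters). On real hardware the bit-band store is a single STR; all names and the 1 KB size are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define SRAM_BASE  0x20000000u   /* start of the SRAM bit-band region */
#define ALIAS_BASE 0x22000000u   /* start of its 32MB alias region */
#define SIM_BYTES  1024u         /* hypothetical size of the simulated SRAM */

static uint8_t sram[SIM_BYTES];  /* simulated memory, little-endian */

static uint8_t *byte_ptr(uint32_t addr)
{
    assert(addr >= SRAM_BASE && addr - SRAM_BASE < SIM_BYTES);
    return &sram[addr - SRAM_BASE];
}

static uint32_t read32(uint32_t addr)
{
    uint32_t v = 0;
    for (int i = 3; i >= 0; i--)           /* highest-addressed byte ends up in bits 31:24 */
        v = (v << 8) | *byte_ptr(addr + (uint32_t)i);
    return v;
}

static void write32(uint32_t addr, uint32_t v)
{
    for (uint32_t i = 0; i < 4u; i++)
        *byte_ptr(addr + i) = (uint8_t)(v >> (8u * i));
}

/* Software read-modify-write: three separate steps. */
static void rmw_set_bit(uint32_t addr, unsigned bit)
{
    uint32_t v = read32(addr);   /* read */
    v |= 1u << bit;              /* modify: other bits unchanged */
    write32(addr, v);            /* write back */
}

/* Bit-band store: decode the alias address into (byte, bit) using
   alias = ALIAS_BASE + byte_offset*32 + bit_number*4, then update that bit. */
static void bitband_store(uint32_t alias_addr, uint32_t value)
{
    uint32_t offset = alias_addr - ALIAS_BASE;
    uint8_t *b = byte_ptr(SRAM_BASE + offset / 32u);  /* byte holding the bit */
    unsigned bit = (offset % 32u) / 4u;               /* bit 0..7 in that byte */
    if (value & 1u)
        *b = (uint8_t)(*b | (1u << bit));
    else
        *b = (uint8_t)(*b & ~(1u << bit));
}
```

Both paths leave the word at 0x20000000 equal to 0x8 after setting bit 3; the difference is that the bit-band version needs one store from software, and on the device the read-modify-write is carried out by the bus rather than by separate instructions.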
Bit-band Region Mapping in Bit-band Alias Region
• bit_word_offset = (byte_offset × 32) + (bit_number × 4)
• bit_word_addr = bit_band_base + bit_word_offset
• Where:
  – bit_word_offset is the position of the target bit in the bit-band memory region
  – bit_word_addr is the address of the word in the alias memory region that maps to the targeted bit
  – bit_band_base is the starting address of the alias region
  – byte_offset is the number of the byte in the bit-band region that contains the targeted bit
  – bit_number is the bit position (0–7) of the targeted bit
• SRAM region
  – 32MB memory space (0x22000000 – 0x23FFFFFF) is used as the bit-band alias region for 1MB data (0x20000000 – 0x200FFFFF)
• Peripherals region
  – 32MB memory space (0x42000000 – 0x43FFFFFF) is used as the bit-band alias region for 1MB data (0x40000000 – 0x400FFFFF)
Example
• We want to modify the 13th bit of the memory word stored at address 0x20000FF0 using bit banding. What bit-band memory alias address, from the bit-band alias region, should be used to modify this bit?
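The mapping formulas can be turned into a small C function and used to work the example. This is a sketch under stated assumptions: `bitband_alias` is a hypothetical name, the bit is given as a position 0–31 within a little-endian 32-bit word (split into byte_offset and bit_number as the formulas require), and "the 13th bit" is read as bit number 13 counting from bit 0. Under that reading, the word at 0x20000FF0 gives byte_offset = 0xFF1 and bit_number = 5, so bit_word_addr = 0x22000000 + 0xFF1 × 32 + 5 × 4 = 0x2201FE34; if "13th" is counted from 1 (bit 12), the result is 0x2201FE30.

```c
#include <assert.h>
#include <stdint.h>

/* Return the alias address for bit `bit` (0..31) of the 32-bit word at `addr`,
   or 0 if addr is outside the two 1MB bit-band regions.
   Assumes little-endian memory, so bit n of the word lives in byte addr + n/8. */
static uint32_t bitband_alias(uint32_t addr, unsigned bit)
{
    uint32_t region_base, alias_base;
    assert(bit < 32u);
    if (addr >= 0x20000000u && addr <= 0x200FFFFFu) {
        region_base = 0x20000000u;  alias_base = 0x22000000u;   /* SRAM */
    } else if (addr >= 0x40000000u && addr <= 0x400FFFFFu) {
        region_base = 0x40000000u;  alias_base = 0x42000000u;   /* Peripherals */
    } else {
        return 0u;
    }
    uint32_t byte_offset = (addr - region_base) + bit / 8u;  /* byte holding the bit */
    uint32_t bit_number  = bit % 8u;                          /* position 0..7 in that byte */
    uint32_t bit_word_offset = byte_offset * 32u + bit_number * 4u;
    return alias_base + bit_word_offset;                      /* bit_word_addr */
}
```

Because (bit / 8) × 32 + (bit % 8) × 4 = bit × 4, the same result can be written per word as alias_base + (word_offset × 32) + (bit × 4), which is a handy shortcut when the bit position is given within a 32-bit word.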
[Figure: AMBA bus system: a bridge connects the high-speed side (DRAM controller, high-speed I/O device) to the on-chip peripherals bus (APB) with low-speed I/O devices]
Note: Within the memory map, attempts to read or write addresses in reserved spaces result in a bus fault. In addition, attempts to write addresses in the flash range also result in a bus fault.
• Page 92
Tiva TM4C123GH6PM Microcontroller
• Page 93
Columns: start address, end address, description, datasheet page for details
0x4002.2000 0x4002.2FFF I2C 2 1017
0x4002.3000 0x4002.3FFF I2C 3 1017
0x4002.4000 0x4002.4FFF GPIO Port E 658
0x4002.5000 0x4002.5FFF GPIO Port F 658
0x4002.6000 0x4002.7FFF Reserved -
0x4002.8000 0x4002.8FFF PWM 0 1240
0x4002.9000 0x4002.9FFF PWM 1 1240
0x4002.A000 0x4002.BFFF Reserved -
0x4002.C000 0x4002.CFFF QEI0 1310
0x4002.D000 0x4002.DFFF QEI1 1310
0x4002.E000 0x4002.FFFF Reserved -
0x4003.0000 0x4003.0FFF 16/32-bit Timer 0 725
0x4003.1000 0x4003.1FFF 16/32-bit Timer 1 725
0x4003.2000 0x4003.2FFF 16/32-bit Timer 2 725
0x4003.3000 0x4003.3FFF 16/32-bit Timer 3 725
0x4003.4000 0x4003.4FFF 16/32-bit Timer 4 725
0x4003.5000 0x4003.5FFF 16/32-bit Timer 5 725
0x4003.6000 0x4003.6FFF 32/64-bit Timer 0 725
0x4003.7000 0x4003.7FFF 32/64-bit Timer 1 725
0x4003.8000 0x4003.8FFF ADC0 818
0x4003.9000 0x4003.9FFF ADC1 818
0x4003.A000 0x4003.BFFF Reserved -
0x4003.C000 0x4003.CFFF Analog Comparators 1220
The Cortex-M4F Processor
• Page 94
0x4004.0000 0x4004.0FFF CAN0 Controller 1067
0x4004.1000 0x4004.1FFF CAN1 Controller 1067
0x4004.2000 0x4004.BFFF Reserved -
0x4004.C000 0x4004.CFFF 32/64-bit Timer 2 725
0x4004.D000 0x4004.DFFF 32/64-bit Timer 3 725
0x4004.E000 0x4004.EFFF 32/64-bit Timer 4 725
0x4004.F000 0x4004.FFFF 32/64-bit Timer 5 725
0x4005.0000 0x4005.0FFF USB 1114
0x4005.1000 0x4005.7FFF Reserved -
0x4005.8000 0x4005.8FFF GPIO Port A (AHB aperture) 658
0x4005.9000 0x4005.9FFF GPIO Port B (AHB aperture) 658
0x4005.A000 0x4005.AFFF GPIO Port C (AHB aperture) 658
0x4005.B000 0x4005.BFFF GPIO Port D (AHB aperture) 658
0x4005.C000 0x4005.CFFF GPIO Port E (AHB aperture) 658
0x4005.D000 0x4005.DFFF GPIO Port F (AHB aperture) 658
0x4005.E000 0x400A.EFFF Reserved -
0x400A.F000 0x400A.FFFF EEPROM and Key Locker 540
0x400B.0000 0x400F.8FFF Reserved -
0x400F.9000 0x400F.9FFF System Exception Module 485
0x400F.A000 0x400F.BFFF Reserved -
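Decoding an address against the device's memory-map table is a simple range lookup. The sketch below copies only a few rows from the TM4C123GH6PM table above, written without the datasheet's "." separator (so 0x4002.5000 becomes 0x40025000); `MapEntry` and `tm4c_peripheral` are hypothetical names, and any address not covered by the included rows is reported as unknown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint32_t start, end; const char *name; } MapEntry;

/* A few rows from the TM4C123GH6PM memory-map table; all others omitted. */
static const MapEntry tm4c_map[] = {
    { 0x40024000u, 0x40024FFFu, "GPIO Port E" },
    { 0x40025000u, 0x40025FFFu, "GPIO Port F" },
    { 0x40030000u, 0x40030FFFu, "16/32-bit Timer 0" },
    { 0x40038000u, 0x40038FFFu, "ADC0" },
    { 0x4005D000u, 0x4005DFFFu, "GPIO Port F (AHB aperture)" },
};

/* Return the peripheral whose address block contains addr,
   or NULL if addr is not covered by the (partial) table. */
static const char *tm4c_peripheral(uint32_t addr)
{
    for (size_t i = 0; i < sizeof tm4c_map / sizeof tm4c_map[0]; i++)
        if (addr >= tm4c_map[i].start && addr <= tm4c_map[i].end)
            return tm4c_map[i].name;
    return NULL;
}
```

Note that GPIO Port F appears twice in the table: once in the APB address range and once in the AHB aperture, so the same peripheral can be reached through two different address blocks.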