
Network Processors and IXC1100 Control Plane Processor

Nguyễn Gia Hào

Academic year: 2023


Intel Corporation may have patents or pending patent applications, trademarks, copyrights or other intellectual property rights related to the presented subject matter. The provision of documents and other materials and information does not grant any license, express or implied, by estoppel or otherwise, to any such patents, trademarks, copyrights or other intellectual property rights.

Revision History

  • Introduction
  • About This Document
    • How to Read This Document
  • Other Relevant Documents
  • Terminology and Conventions
    • Number Representation
    • Acronyms and Terminology
  • Overview of Product Line
  • Intel XScale® Microarchitecture Processor
    • Intel XScale® Processor Overview

Each chapter in this document focuses on a specific architectural feature of the Intel® IXP42X product line and IXC1100 control plane processors. Unless otherwise specified, the functional descriptions apply to all IXP42X product line and IXC1100 control plane processors.

Table 1. Acronyms and Terminology

Write Buffer

The Intel XScale processor implements the ARM V5 integer instruction set architecture, but does not provide hardware support for floating-point instructions. Operating systems may require modifications to fit specific hardware features of the IXP42X product line and IXC1100 control plane processors.

Fill Buffer

These audio coding improvements focus on multiply and accumulate operations that speed up many audio filter operations. Note: The power management control feature has not been implemented in the IXP42X product line and the IXC1100 control plane processors.

Instruction Cache

The IXP42X product line and IXC1100 control plane processors are equipped to efficiently handle audio processing by supporting 16-bit data types and 16-bit operations. Several architectural improvements were made to the MAC to support audio coding algorithms, including a 40-bit accumulator and support for 16-bit packed data.

IMMU

The MAC unit supports early termination of multiply/accumulate operations in two cycles and can sustain a throughput of one MAC operation per cycle.
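The packed 16-bit multiply/accumulate described above can be modeled in a few lines. This is a minimal sketch, not the hardware's exact semantics: the function names are hypothetical, and wrapping the 40-bit accumulator on overflow is an assumption made for illustration.

```python
ACC_BITS = 40  # accumulator width stated in the text


def to_signed(value, bits):
    """Interpret the low `bits` of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def packed_mac(acc, rm, rs):
    """Add both signed 16-bit lane products of rm and rs to a 40-bit accumulator.

    Assumption: the accumulator wraps on overflow (hypothetical behavior).
    """
    low = to_signed(rm, 16) * to_signed(rs, 16)
    high = to_signed(rm >> 16, 16) * to_signed(rs >> 16, 16)
    return to_signed(acc + low + high, ACC_BITS)


samples = 0x0003_0002  # lanes: high = 3, low = 2
coeffs = 0xFFFF_0004   # lanes: high = -1, low = 4
acc = packed_mac(0, samples, coeffs)  # 2*4 + 3*(-1)
```

Processing two 16-bit samples per 32-bit register is what makes packed data attractive for audio filter loops.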

DMMU

JTAG Debug

Branch Target Buffer

Data Cache

Data RAM

Power Management

Mini-Data

Performance Monitoring

Memory Management

The Intel XScale processor implements the Memory Management Unit (MMU) architecture specified in the ARM Architecture Reference Manual. The MMU architecture also specifies the caching policy for the instruction cache and data cache.

Instruction Cache

Enable the write buffer to coalesce stores to external memory. See Section 3.1, “Memory Management Unit” on page 44 for more details.

Branch Target Buffer

Data Cache

Intel XScale® Processor Performance Monitoring

Network Processor Engines (NPE)

Internal Bus

MII Interfaces

AHB Queue Manager

The two interrupts, one for queues 0-31 and one for queues 32-63, provide status interrupts to the Intel XScale processor. For more information about the AHB Queue Manager, see Section 21.0, "AHB Queue Manager (AQM)" on page 556.
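The split of the 64 queues across the two status interrupts can be expressed as a small lookup. This is an illustrative sketch only; the function and constant names are hypothetical and do not come from the manual.

```python
QUEUES_PER_INTERRUPT = 32  # queues 0-31 share one interrupt, 32-63 the other
TOTAL_QUEUES = 64


def aqm_interrupt_index(queue):
    """Return 0 for queues 0-31 and 1 for queues 32-63."""
    if not 0 <= queue < TOTAL_QUEUES:
        raise ValueError(f"queue {queue} is outside 0-63")
    return queue // QUEUES_PER_INTERRUPT
```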

UTOPIA 2

USB v1.1

Memory Controller

If an x16 memory chip is used, at least two memory chips will be required to facilitate the 32-bit interface required by the IXP42X product line and IXC1100 control plane processors. The memory controller interfaces internally with the North AHB and South AHB with independent peripherals.

Expansion Bus

This burst size allows the best efficiency/fairness performance between accesses from the North and South AHB.

High-Speed Serial Interfaces

Universal Asynchronous Receiver Transmitter

GPIO

Interrupt Controller

Timers

JTAG

Intel XScale® Processor

Memory Management Unit

  • Memory Attributes
    • Page (P) Attribute Bit
    • Cacheable (C), Bufferable (B), and eXtension (X) Bits
      • Instruction Cache
  • Interaction of the MMU, Instruction Cache, and Data Cache
  • MMU Control
    • Invalidate (Flush) Operation
    • Enabling/Disabling
    • Locking Entries
    • Round-Robin Replacement Algorithm

When the MMU is disabled, all data accesses are non-cacheable and non-bufferable. Therefore, only three of the four combinations of the MMU and data/mini-data cache enablement are valid.

Table 3. Data Cache and Buffer Behavior When X = 0

Instruction Cache

  • Operation When Instruction Cache is Enabled
    • Instruction-Cache ‘Miss’
    • Instruction-Cache Line-Replacement Algorithm
    • Instruction-Cache Coherence

Each set in the instruction cache has a round-robin pointer that keeps track of the next line (in that set) to be replaced. Writing to coprocessor 15, register 9 unlocks all the locked lines in the instruction cache and leaves them valid.

Figure 8. Instruction Cache Organization

Branch Target Buffer

  • Branch Target Buffer (BTB) Operation
    • Reset

The BTB uses bits [8:2] of the current address to read the tag and then compares this tag to bits [31:9,1] of the current instruction address. Before enabling or disabling the BTB, the software must invalidate the BTB (described in the next section).
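The index and tag split described above can be sketched with plain bit operations. The function names are hypothetical; the bit ranges are the ones given in the text (index from bits [8:2], tag from bits [31:9] plus bit 1).

```python
def btb_index(address):
    """Bits [8:2] of the instruction address select the BTB entry."""
    return (address >> 2) & 0x7F


def btb_tag(address):
    """Bits [31:9] concatenated with bit 1 form the tag to compare."""
    upper = (address >> 9) & 0x7FFFFF  # bits [31:9], 23 bits
    bit1 = (address >> 1) & 0x1
    return (upper << 1) | bit1


addr_a = 0x0000_1234
addr_b = addr_a + 0x200  # same index, different tag
```

Two addresses that differ only above bit 8 map to the same entry, so the tag comparison is what distinguishes them.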

Figure 10. BTB Entry

Data Cache

  • Data Cache Overview
  • Cacheability
  • Reconfiguring the Data Cache as Data RAM

Each set in the data cache has a round-robin pointer that keeps track of the next line (in that set) to replace. Individual entries in the data cache and mini-data cache can be invalidated and cleaned through coprocessor 15, register 7.
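A round-robin replacement pointer of the kind described above can be modeled as follows. This is a simplified sketch under stated assumptions: the class name is hypothetical, the two-way set is chosen only to keep the demo short, and line locking is not modeled.

```python
class CacheSet:
    """One cache set with a round-robin replacement pointer (locking not modeled)."""

    def __init__(self, ways):
        self.tags = [None] * ways
        self.next_victim = 0  # round-robin pointer for this set

    def access(self, tag):
        """Return True on a hit; on a miss, fill the pointed-to line and advance."""
        if tag in self.tags:
            return True
        self.tags[self.next_victim] = tag
        self.next_victim = (self.next_victim + 1) % len(self.tags)
        return False


demo_set = CacheSet(ways=2)  # hypothetical small set for illustration
hits = [demo_set.access(tag) for tag in "ABACA"]
```

Note that the pointer advances only on replacement, so a hit on "A" does not protect it from being the next victim.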

Figure 12. Data Cache Organization

Configuration

  • CP15 Registers
    • Register 0: ID and Cache Type Registers
    • Register 1: Control and Auxiliary Control Registers
    • Register 3: Domain Access Control Register
    • Register 4: Reserved
    • Register 5: Fault Status Register
    • Register 6: Fault Address Register
    • Register 7: Cache Functions
    • Register 8: TLB Operations
    • Register 9: Cache Lock Down
    • Register 10: TLB Lock Down
    • Register 11-12: Reserved
    • Register 13: Process ID
    • The PID Register Affect On Addresses
    • Register 14: Breakpoint Registers
    • Register 15: Coprocessor Access Register
  • CP14 Registers
    • Performance Monitoring Registers
    • Clock and Power Management Registers
    • Software Debug Registers

The cache type register is selected when opcode_2=1 and describes the cache configuration of the Intel XScale processor. The mini-data cache configuration must be configured before any data can be accessed that can be stored in the mini-data cache. Bits [31:5] of Rd are used to specify the virtual address of the line to be allocated to the data cache.
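Because only bits [31:5] of Rd carry the line's virtual address, the low five bits are ignored. A minimal sketch of that masking, with a hypothetical function name:

```python
LINE_MASK = 0xFFFF_FFE0  # keeps bits [31:5], clears bits [4:0]


def line_address(rd_value):
    """Return the virtual line address encoded in bits [31:5] of Rd."""
    return rd_value & LINE_MASK
```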

An application can request the use of a shared resource (e.g., the accumulator in CP0) by issuing an access to the resource, which will result in an undefined exception.

Table 7. MRC/MCR Format

Software Debug

  • Definitions
  • Debug Registers
  • Debug Modes
    • Halt Mode
    • Monitor Mode
  • Debug Control and Status Register (DCSR)
    • Global Enable Bit (GE)
    • Halt Mode Bit (H)
    • Vector Trap Bits (TF,TI,TD,TA,TS,TU,TR)
    • Sticky Abort Bit (SA)
    • Method of Entry Bits (MOE)
    • Trace Buffer Mode Bit (M)
    • Trace Buffer Enable Bit (E)
  • Debug Exceptions
    • Halt Mode
    • Monitor Mode
  • HW Breakpoint Resources
    • Instruction Breakpoints
    • Data Breakpoints
  • Software Breakpoints
  • Transmit/Receive Control Register

Fill-once mode: The trace buffer automatically generates a debug exception (trace buffer full break) when it becomes full. The processor automatically clears this bit to disable the trace buffer when a debug exception occurs. When halt mode is active, the processor uses the reset vector as the debug vector.

At a data breakpoint, the processor generates a debug exception and redirects execution to the debug handler before executing the next instruction.

Table 33. Debug Control and Status Register (DCSR) (Sheet 1 of 2)

Transmit/Receive Control Register (TXRXCTRL)

  • RX Register Ready Bit (RR)
  • Overflow Flag (OV)
  • Download Flag (D)
  • TX Register Ready Bit (TR)
  • Conditional Execution Using TXRXCTRL
  • Transmit Register
  • Receive Register
  • Debug JTAG Access
    • SELDCSR JTAG Command
    • SELDCSR JTAG Register

When the RR bit is clear, indicating that the debug handler is ready, the debugger starts downloading. After completing the download, the debugger clears the D bit so that the debug handler can exit the download loop. The debugger and debug handler use the TR bit to synchronize accesses to the TX register.

The debugger and debug handler must poll the TR bit before accessing the TX register.
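The TR-bit handshake can be illustrated with a small software model. This is a sketch, not the hardware interface: the class and method names are hypothetical, and it assumes that TR is set when the debug handler writes TX and cleared when the debugger reads it.

```python
class TxChannel:
    """Software model of the TX register guarded by the TR bit."""

    def __init__(self):
        self.tr = 0      # assumed: 1 means TX holds unread data
        self.tx = None

    def handler_write(self, word):
        """Debug-handler side: write TX only when TR is clear."""
        if self.tr:
            return False  # previous word not yet read; caller polls again
        self.tx = word
        self.tr = 1
        return True

    def debugger_read(self):
        """Debugger side: read TX only when TR is set."""
        if not self.tr:
            return None   # nothing new; caller polls again
        word = self.tx
        self.tr = 0
        return word


channel = TxChannel()
first_write_ok = channel.handler_write(0x11)
second_write_ok = channel.handler_write(0x22)  # rejected until the debugger reads
received = channel.debugger_read()
```

Polling TR on both sides prevents the handler from overwriting a word the debugger has not yet read.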

Table 41. TX Handshaking

TDI TDO

  • DBGTX JTAG Command
  • DBGTX JTAG Register
  • DBGRX JTAG Command
  • DBGRX JTAG Register
  • Debug JTAG Data Register Reset Values
  • Trace Buffer
    • Trace Buffer CP Registers
  • Trace Buffer Entries
    • Message Byte
    • Trace Buffer Usage
  • Downloading Code in ICache
    • LDIC JTAG Command
    • LDIC JTAG Data Register
    • LDIC Cache Functions
    • Loading IC During Reset
    • Dynamically Loading IC After Reset

The debugger does not know the starting address of the oldest entry read from the trace buffer. When any exception occurs, the exception message is placed in the trace buffer. The address placed in the trace buffer will be the address of the target application.

The LDIC JTAG instruction selects the JTAG data register for loading code into the instruction cache.

Figure 17. DBGTX Hardware

Debugger Actions

“Dynamic Code Download Synchronization” on page 124 describes the details for implementing the handshake in the debug handler. Execution of the debug handler starts when the application running on the IXP42X product line and IXC1100 control plane processors generates a debug exception or when the host generates an external debug interrupt. While the DBGTX JTAG instruction is in the JTAG IR (see “DBGTX JTAG Command” on page 105), the host polls DBG_SR[0] and waits for the debug handler to set it.

When the debug handler gets to the point where it is OK to start transferring code, it writes to TX, which automatically sets DBG_SR[0].

Debug Handler Actions

  • Mini-Instruction Cache Overview
  • Halt Mode Software Protocol
    • Starting a Debug Session
    • Implementing a Debug Handler
    • Ending a Debug Session
  • Software Debug Notes and Errata
  • Performance Monitoring
    • Overview
    • Register Description
      • Clock Counter (CCNT)

The host waits for the debug handler to signal that it is ready. Execution is redirected to the debug handler so that the debugger can perform any necessary initialization. The debug handling code does not need to be specially mapped to avoid this problem.

For all three methods, the downloaded code is executed in the context of the debug handler.

Table 51. Debug-Handler Code to Implement Synchronization During Dynamic Code Download

Performance Count Registers (PMN0 - PMN3)

Performance Monitor Control Register (PMNC)

Interrupt Enable Register (INTEN)

Overflow Flag Status Register (FLAG)

Event Select Register (EVTSEL)

Managing the Performance Monitor

An interrupt request will be generated when a counter's overflow flag is set and its associated interrupt enable bit is set in INTEN. The interrupt request will remain asserted until the software clears the overflow flag by writing a one to the flag that is set. Note that the product-specific interrupt device and the CPSR must have enabled the interrupt for software to receive it.

This can be done in the interrupt service routine (ISR), where incrementing a memory location each time the interrupt occurs allows longer performance monitoring times.
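The ISR technique above amounts to keeping the upper bits of a counter in memory. The following is a minimal sketch; the class name is hypothetical, and the 32-bit hardware counter width is an assumption stated here rather than taken from the text above.

```python
COUNTER_BITS = 32  # assumed width of the hardware counter


class ExtendedCounter:
    """Extends a hardware counter by counting its overflow interrupts."""

    def __init__(self):
        self.overflows = 0  # the memory location the ISR increments

    def on_overflow_interrupt(self):
        """Called from the ISR each time the counter's overflow flag fires."""
        self.overflows += 1

    def total(self, hardware_count):
        """Combine the overflow count with the current hardware counter value."""
        mask = (1 << COUNTER_BITS) - 1
        return (self.overflows << COUNTER_BITS) | (hardware_count & mask)


counter = ExtendedCounter()
counter.on_overflow_interrupt()
counter.on_overflow_interrupt()
total_events = counter.total(5)
```

A real ISR would also clear the overflow flag by writing a one to it, as the text describes; that step is omitted from this model.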

Performance Monitoring Events

  • Instruction Cache Efficiency Mode
  • Data Cache Efficiency Mode
  • Instruction Fetch Latency Mode
  • Data/Bus Request Buffer Full Mode
  • Stall/Write-Back Statistics
  • Instruction TLB Efficiency Mode
  • Data TLB Efficiency Mode

PMN0 collects the number of cycles when the instruction cache is unable to deliver an instruction to the IXP42X product line and IXC1100 control plane processors due to an instruction cache miss or instruction TLB miss. The average number of cycles the processor is stuck on a data cache access that may overflow the data cache buffers. PMN1 counts the number of instruction TLB table walks, which occur when there is a TLB miss.

PMN1 counts the number of data TLB table-walks that occur when there is a TLB miss.

Multiple Performance Monitoring Run Statistics

The total number of requests to return data to external memory can only be retrieved with PMN1. PMN0 is the total number of instructions that were executed, which does not include instructions that were translated from the instruction TLB and were never executed. PMN0 is the total number of data memory accesses, which includes cache and non-cache accesses, data mini-cache accesses, and accesses made to locations configured as data RAM.

Note that STM and LDM each count as several accesses to the data TLB, depending on the number of registers specified in the register list.

Examples

This can happen if a branch instruction changes the program flow; the instruction TLB can translate the sequential instructions following the branch before it receives the target address of the branch. The average number of cycles it took to execute an instruction is commonly referred to as cycles-per-instruction (CPI).

; Assume that performance count interrupts are the only IRQ in the system
MRC P14,0,R1,C0,c1,0 ; read the PMNC register

In the example above, the instruction cache had a miss rate of 5% and the CPI was 2.4.
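The two figures quoted above follow from simple ratios of counter values. The sketch below uses hypothetical raw counts chosen only to reproduce a 5% miss rate and a CPI of 2.4; the function names and the counts are not from the manual.

```python
def instruction_cache_miss_rate(cache_misses, instructions_executed):
    """Fraction of executed instructions that missed the instruction cache."""
    return cache_misses / instructions_executed


def cycles_per_instruction(clock_cycles, instructions_executed):
    """Average number of cycles taken to execute one instruction (CPI)."""
    return clock_cycles / instructions_executed


# Hypothetical counter readings for one monitoring run.
instructions = 1_000_000
misses = 50_000
cycles = 2_400_000

miss_rate = instruction_cache_miss_rate(misses, instructions)
cpi = cycles_per_instruction(cycles, instructions)
```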

Programming Model

  • ARM* Architecture Compatibility
  • ARM* Architecture Implementation Options
    • Big Endian versus Little Endian
    • Thumb
    • ARM* DSP-Enhanced Instruction Set
    • Base Register Update
  • Extensions to ARM* Architecture
    • DSP Coprocessor 0 (CP0)
    • New Page Attributes
    • Additions to CP15 Functionality
    • Event Architecture
      • Exception Summary

The Intel XScale processor retains the ARM definitions for the C and B encoding when X = 0, which differs from ARM products. The value set in R14_ABORT (Abort Mode Link Register) is the address of the aborted instruction + 4. A lock abort is a precise data abort; the extended status field of the Fault Status Register is set to 0xb10100.

The Fault Address Register is undefined and R14_ABORT is the address of the aborted instruction + 8.

Table 62. Multiply with Internal Accumulate Format

Performance Considerations

  • Interrupt Latency
  • Branch Prediction
  • Addressing Modes
  • Instruction Latencies
    • Performance Terms
    • Branch Instruction Timings
    • Multiply Instruction Timings
    • Saturated Arithmetic Instructions
    • Status Register Access Instructions
    • Load/Store Instructions
    • Semaphore Instructions
    • Coprocessor Instructions
    • Miscellaneous Instruction Timing
    • Thumb Instructions

Issue latency: the cycle distance from the first issue clock of the current instruction to the issue clock of the next instruction. Result latency: the cycle distance from the first issue clock of the current instruction to the issue clock of the first instruction that can use the result without incurring a resource dependency stall.

Note: If the next instruction must use the result of the data processing for a shift by immediate or as Rn in a QDADD or QDSUB, one extra cycle of result latency is added to the listed number.

Table 76. Branch Latency Penalty

Optimization Guide

  • Introduction
    • About This Section
  • Processors’ Pipeline
    • General Pipeline Characteristics
    • Instruction Flow Through the Pipeline
    • Main Execution Pipeline
    • Memory Pipeline
    • Multiply/Multiply Accumulate (MAC) Pipeline
  • Basic Optimizations
    • Conditional Instructions
    • Bit Field Manipulation
    • Optimizing the Use of Immediate Values
    • Optimizing Integer Multiply and Divide
    • Effective Use of Addressing Modes
  • Cache and Prefetch Optimizations
    • Instruction Cache
    • Data and Mini Cache
    • Cache Considerations
    • Prefetch Considerations
  • Instruction Scheduling
    • Scheduling Loads
    • Scheduling Data Processing Instructions
    • Scheduling Multiply Instructions
    • Scheduling SWP and SWPB Instructions
    • Scheduling the MRA and MAR Instructions (MRRC/MCRR)
    • Scheduling the MIA and MIAPH Instructions
    • Scheduling MRS and MSR Instructions
    • Scheduling CP15 Coprocessor Instructions
  • Optimizing C Libraries
  • Optimizations for Size
    • Space/Performance Trade Off

One of the biggest differences between the IXP42X product line and IXC1100 control plane processors and ARM processors is the pipeline. This section briefly describes the structure and behavior of the pipeline of the IXP42X product line and IXC1100 control plane processors. The instructions of the IXP42X product line and IXC1100 control plane processors can selectively modify the state of the condition codes.

In the case of the IXP42X product line and the IXC1100 control plane processors, a branch misprediction incurs a penalty of four cycles.
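The four-cycle penalty makes it easy to estimate how many cycles a code path loses to mispredictions. A minimal sketch with hypothetical names; the branch count in the example is made up for illustration.

```python
MISPREDICT_PENALTY_CYCLES = 4  # penalty per mispredicted branch, from the text


def misprediction_cycles(mispredicted_branches):
    """Total cycles lost to branch mispredictions."""
    if mispredicted_branches < 0:
        raise ValueError("branch count cannot be negative")
    return mispredicted_branches * MISPREDICT_PENALTY_CYCLES


lost_cycles = misprediction_cycles(250)  # hypothetical count for one loop
```

An estimate like this helps decide whether replacing a hard-to-predict branch with conditional instructions is worthwhile.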

Table 93 gives a brief description of each pipe-stage.

Figures

Table 1. Acronyms and Terminology (Continued)
Figure 1. Intel® IXP425 Network Processor Block Diagram
Figure 2. Intel® IXP423 Network Processor Block Diagram
Figure 3. Intel® IXP422 Network Processor Block Diagram
