NVIDIA Tegra
============

-  .. rubric:: T194
      :name: t194

T194 has eight NVIDIA Carmel CPU cores in a coherent multi-processor
configuration. The Carmel cores support the ARM Architecture version 8.2,
executing both 64-bit AArch64 code and 32-bit AArch32 code. The Carmel
processors are organized as four dual-core clusters, where each cluster has
a dedicated 2 MiB Level-2 unified cache. A high-speed coherency fabric connects
these processor complexes and allows heterogeneous multi-processing with all
eight cores if required.

-  .. rubric:: T186
      :name: t186

The NVIDIA® Parker (T186) series system-on-chip (SoC) delivers a heterogeneous
multi-processing (HMP) solution designed to optimize performance and
efficiency.

T186 has dual NVIDIA Denver 2 ARM® CPU cores plus quad ARM Cortex®-A57 cores
in a coherent multiprocessor configuration. The Denver 2 and Cortex-A57 cores
support ARMv8, executing both 64-bit AArch64 code and 32-bit AArch32 code,
including legacy ARMv7 applications. The Denver 2 processors each have 128 KB
instruction and 64 KB data Level 1 caches and share a 2 MB Level 2 unified
cache. The Cortex-A57 processors each have 48 KB instruction and 32 KB data
Level 1 caches and also share a 2 MB Level 2 unified cache. A high-speed
coherency fabric connects these two processor complexes and allows
heterogeneous multi-processing with all six cores if required.

Denver is NVIDIA's own custom-designed, 64-bit, dual-core CPU which is
fully Armv8-A architecture compatible. Each of the two Denver cores
implements a 7-way superscalar microarchitecture (up to 7 concurrent
micro-ops can be executed per clock), and includes a 128 KB 4-way L1
instruction cache, a 64 KB 4-way L1 data cache, and a 2 MB 16-way L2
cache, which services both cores.

Denver implements a process called Dynamic Code Optimization, which
optimizes frequently used software routines at runtime into dense,
highly tuned microcode-equivalent routines. These are stored in a
dedicated, 128 MB main-memory-based optimization cache. After being read
into the instruction cache, the optimized micro-ops are executed, then
re-fetched and executed from the instruction cache as long as needed and
capacity allows.

Effectively, this reduces the need to re-optimize the software routines.
Instead of using hardware to extract the instruction-level parallelism
(ILP) inherent in the code, Denver extracts the ILP once via software
techniques, and then executes those routines repeatedly, thus amortizing
the cost of ILP extraction over the many execution instances.

Denver also features low-latency power-state transitions, in addition
to extensive power-gating and dynamic voltage and clock scaling based on
workloads.

-  .. rubric:: T210
      :name: t210

T210 has quad Arm® Cortex®-A57 cores in a switched configuration with a
companion set of quad Arm Cortex-A53 cores. The Cortex-A57 and Cortex-A53
cores support Armv8-A, executing both 64-bit AArch64 code and 32-bit AArch32
code, including legacy Armv7-A applications. The Cortex-A57 processors each
have 48 KB instruction and 32 KB data Level 1 caches and share a 2 MB Level 2
unified cache. The Cortex-A53 processors each have 32 KB instruction and
32 KB data Level 1 caches and share a 512 KB Level 2 unified cache.

Directory structure
-------------------

-  plat/nvidia/tegra/common - Common code for all Tegra SoCs
-  plat/nvidia/tegra/soc/txxx - Chip-specific code

Trusted OS dispatcher
---------------------

Tegra supports multiple Trusted OSes.

-  Trusted Little Kernel (TLK): To include the 'tlkd' dispatcher in the
   image, pass 'SPD=tlkd' on the command line while preparing a bl31 image.
-  Trusty: To include the 'trusty' dispatcher in the image, pass
   'SPD=trusty' on the command line while preparing a bl31 image.

Because the dispatcher is selected on the command line, other Trusted OS
vendors can use the upstream code and include their dispatchers in the image
without changing any makefiles.

The Trusted OSes supported on each Tegra platform are:

-  Tegra210: TLK and Trusty
-  Tegra186: Trusty
-  Tegra194: Trusty

Scatter files
-------------

Tegra platforms currently support scatter files and ld.S scripts. The scatter
files allow the ARMLINK linker to generate BL31 binaries. For now, a single
common scatter file, plat/nvidia/tegra/scat/bl31.scat, serves all Tegra SoCs.
The ``LINKER`` build variable must point to the ARMLINK binary for the scatter
file to be used. BL31 image generation with ARMCLANG (compilation) and ARMLINK
(linking) has been verified on the Tegra186 platforms.

Preparing the BL31 image to run on Tegra SoCs
---------------------------------------------

.. code:: shell

    CROSS_COMPILE=<path-to-aarch64-gcc>/bin/aarch64-none-elf- make PLAT=tegra \
    TARGET_SOC=<target-soc e.g. t194|t186|t210> SPD=<dispatcher e.g. trusty|tlkd> \
    bl31

Platforms that need a different ``TZDRAM_BASE`` can add ``TZDRAM_BASE=<value>``
to the build command line.

The Tegra platform code expects the BL2 layer to pass a pointer to the
following platform-specific structure in the 'x1' register. The
bl31\_early\_platform\_setup() handler uses it to extract the TZDRAM carveout
base and size for loading the Trusted OS, and the UART port ID to be used. The
Tegra memory controller driver programs this base/size in order to restrict
non-secure (NS) accesses.

.. code:: c

    typedef struct plat_params_from_bl2 {
        /* TZ memory size */
        uint64_t tzdram_size;
        /* TZ memory base */
        uint64_t tzdram_base;
        /* UART port ID */
        int uart_id;
        /* L2 ECC parity protection disable flag */
        int l2_ecc_parity_prot_dis;
        /* SHMEM base address for storing the boot logs */
        uint64_t boot_profiler_shmem_base;
    } plat_params_from_bl2_t;
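
As a rough illustration of the handoff described above, the sketch below
treats the raw 'x1' value as a pointer to the structure and copies out the
carveout range and UART port ID after two basic sanity checks. The names
``read_plat_params`` and ``tz_carveout_t`` are hypothetical and are not part
of the Tegra platform code. The checks are assumptions about what a careful
consumer might do, not a description of what the real handler does.

```c
#include <stddef.h>
#include <stdint.h>

/* Structure layout as documented above. */
typedef struct plat_params_from_bl2 {
    uint64_t tzdram_size;
    uint64_t tzdram_base;
    int uart_id;
    int l2_ecc_parity_prot_dis;
    uint64_t boot_profiler_shmem_base;
} plat_params_from_bl2_t;

/* Hypothetical summary of the values the early setup handler extracts. */
typedef struct tz_carveout {
    uint64_t base;  /* first byte of the TZDRAM carveout */
    uint64_t end;   /* one past the last byte of the carveout */
    int uart_id;    /* UART port ID requested by BL2 */
} tz_carveout_t;

/*
 * Hypothetical helper (not a TF-A API): interpret the value received in
 * 'x1' as a pointer to plat_params_from_bl2_t and copy out the carveout.
 * Returns 0 on success, -1 if the parameters are unusable.
 */
static int read_plat_params(uint64_t x1, tz_carveout_t *out)
{
    const plat_params_from_bl2_t *params =
        (const plat_params_from_bl2_t *)(uintptr_t)x1;

    if (params == NULL || out == NULL) {
        return -1;
    }

    /* Reject an empty carveout or one whose end would wrap around. */
    if (params->tzdram_size == 0U ||
        params->tzdram_base > UINT64_MAX - params->tzdram_size) {
        return -1;
    }

    out->base = params->tzdram_base;
    out->end = params->tzdram_base + params->tzdram_size;
    out->uart_id = params->uart_id;
    return 0;
}
```

A caller would pass the register value unchanged, for example
``read_plat_params(x1, &carveout)``, and refuse to continue booting on a
non-zero return.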

Power Management
----------------

The PSCI implementation expects each platform to expose the 'power state'
parameter to be used during the 'SYSTEM SUSPEND' call. The state-id field
is implementation defined on Tegra SoCs and is preferably defined by
tegra\_def.h.
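
To show where the state-id field sits inside the 'power state' parameter, the
sketch below packs and unpacks a value using the original (non-extended) PSCI
power_state layout: state ID in bits [15:0], state type in bit 16 and power
level in bits [25:24]. The macro and function names are hypothetical, and the
state ID used in the usage note is an arbitrary illustrative value, not one
taken from tegra\_def.h.

```c
#include <stdint.h>

/* Field positions of the original-format PSCI power_state parameter. */
#define PSTATE_ID_SHIFT       0U
#define PSTATE_ID_MASK        0xFFFFU
#define PSTATE_TYPE_SHIFT     16U
#define PSTATE_TYPE_MASK      0x1U
#define PSTATE_PWR_LVL_SHIFT  24U
#define PSTATE_PWR_LVL_MASK   0x3U

/* Hypothetical helper: build a power_state value from its three fields. */
static uint32_t pstate_make(uint32_t state_id, uint32_t type, uint32_t level)
{
    return ((state_id & PSTATE_ID_MASK) << PSTATE_ID_SHIFT) |
           ((type & PSTATE_TYPE_MASK) << PSTATE_TYPE_SHIFT) |
           ((level & PSTATE_PWR_LVL_MASK) << PSTATE_PWR_LVL_SHIFT);
}

/* Extract the implementation-defined state ID. */
static uint32_t pstate_id(uint32_t power_state)
{
    return (power_state >> PSTATE_ID_SHIFT) & PSTATE_ID_MASK;
}

/* Extract the state type (0 = standby/retention, 1 = power down). */
static uint32_t pstate_type(uint32_t power_state)
{
    return (power_state >> PSTATE_TYPE_SHIFT) & PSTATE_TYPE_MASK;
}

/* Extract the deepest affected power level. */
static uint32_t pstate_pwr_level(uint32_t power_state)
{
    return (power_state >> PSTATE_PWR_LVL_SHIFT) & PSTATE_PWR_LVL_MASK;
}
```

For example, ``pstate_make(0xD, 1, 2)`` yields ``0x0201000D``: a power-down
request at power level 2 carrying the illustrative state ID ``0xD``.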

Tegra configs
-------------

-  'tegra\_enable\_l2\_ecc\_parity\_prot': This flag enables the L2 ECC and Parity
   Protection bit, for Arm Cortex-A57 CPUs, during CPU boot. Tegra SoCs enable
   this flag during 'Cluster power up' or 'System Suspend' exit.
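
The effect of such a flag can be pictured as a conditional bit-set on the
value of the Cortex-A57 L2 control register. The sketch below works on a plain
integer rather than the real system register. The helper name is hypothetical,
and the bit position (bit 21) is an assumption that should be checked against
the Cortex-A57 Technical Reference Manual before use.

```c
#include <stdint.h>

/*
 * Assumed position of the "ECC and parity enable" bit in the Cortex-A57
 * L2 control register. Verify against the TRM; it is not taken from this
 * document.
 */
#define L2_ECC_PARITY_PROT_BIT (UINT64_C(1) << 21)

/*
 * Hypothetical helper: return the register value to write back, given the
 * current value and the platform's enable flag. A real implementation
 * would read and write the system register around this computation.
 */
static uint64_t apply_l2_ecc_parity_flag(uint64_t l2ctlr, int enable_flag)
{
    if (enable_flag != 0) {
        l2ctlr |= L2_ECC_PARITY_PROT_BIT;
    }
    return l2ctlr;
}
```

Other bits of the input value are left untouched, so the helper can be applied
on every cluster power-up or suspend exit without disturbing unrelated
settings.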