Differences
This shows you the differences between two versions of the page.
devwiki:nvidia [2022/09/21 02:55] – [Nvidia Cuda programming] ying | devwiki:nvidia [2022/09/21 09:00] (current) – [Nvidia for AI] ying | ||
---|---|---|---|
Line 40: | Line 40: | ||
* each thread has its own thread ID, which determines which block of data it works on; all threads process their share of the data at the same time. | * each thread has its own thread ID, which determines which block of data it works on; all threads process their share of the data at the same time. | ||
* optimizing how code uses memory can be important: better arrangement and swapping of data between memory blocks fits more into the fixed-size memory. | * optimizing how code uses memory can be important: better arrangement and swapping of data between memory blocks fits more into the fixed-size memory. | ||
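The thread-ID note above can be illustrated with a minimal CPU-only sketch in plain Python. It only emulates the usual CUDA 1D index arithmetic (`blockIdx.x * blockDim.x + threadIdx.x`); the function names `global_thread_id`, `scale_kernel` and `launch` are hypothetical, and the "threads" run one after another here, whereas a GPU would run them in parallel.

```python
# CPU-only emulation of CUDA's 1D thread indexing; no GPU is involved.
# All function names are hypothetical and exist only for this illustration.

def global_thread_id(block_idx: int, block_dim: int, thread_idx: int) -> int:
    """Mirror CUDA's blockIdx.x * blockDim.x + threadIdx.x."""
    return block_idx * block_dim + thread_idx


def scale_kernel(block_idx, block_dim, thread_idx, data, out, factor):
    """Work done by one 'thread': it handles exactly one element."""
    i = global_thread_id(block_idx, block_dim, thread_idx)
    if i < len(data):  # guard: the last block may have spare threads
        out[i] = data[i] * factor


def launch(grid_dim, block_dim, data, factor):
    """Visit every (block, thread) pair in sequence."""
    out = [None] * len(data)
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            scale_kernel(block_idx, block_dim, thread_idx, data, out, factor)
    return out


data = list(range(10))
block_dim = 4
grid_dim = (len(data) + block_dim - 1) // block_dim  # round up so all data is covered
result = launch(grid_dim, block_dim, data, 2)
print(result)
```

The bounds check matters because 10 elements do not divide evenly into blocks of 4, so the last block has two threads with nothing to do.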
+ | |||
+ | ====== CV-CUDA ====== | ||
+ | |||
+ | * CV-CUDA: computer vision with CUDA. | ||
====== Nvidia RTX stack ====== | ====== Nvidia RTX stack ====== | ||
Line 45: | Line 49: | ||
* 1st Gen: VkRay, DXR, DLSS1 | * 1st Gen: VkRay, DXR, DLSS1 | ||
* 2nd Gen: | * 2nd Gen: | ||
- | * real-time denoise | + | * real-time |
* caustics | * caustics | ||
- | * RTXDI | + | * RTXDI: ray-traced direct illumination, |
- | * RTXGI | + | * RTXGI: real-time multiple bounce indirect lighting |
* Reflex | * Reflex | ||
- | * DLSS2 | + | * DLSS2: deep learning super resolution, AI-generated pixels |
* 3rd Gen: | * 3rd Gen: | ||
* Displaced micro-meshes | * Displaced micro-meshes | ||
- | * 2D SGM optical flow, shader execution reordering, real-time path tracing, opacity micro-maps, DLSS3 | + | * 2D SGM optical flow, shader execution reordering, real-time path tracing, opacity micro-maps, |
+ | * DLSS3: deep learning super resolution, AI frame generation | ||
+ | |||
+ | ====== Nvidia GPU architecture ====== | ||
+ | |||
+ | ^ core ^ Turing ^ Ampere ^ Ada ^ | ||
+ | | shader | 16 | 40 | 90 | | ||
+ | | RT | 49 | 78 | 200 | | ||
+ | | tensor | 130 | 320 | 1400 | | ||
+ | | OFA | | 126 | 300 | | ||
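To make the generation-over-generation jumps in the table easier to read, here is a small Python sketch that divides each figure by the previous generation's. The numbers are copied from the table; the source gives no units, so they are treated purely as relative throughput figures, and the names `THROUGHPUT` and `generation_ratios` are hypothetical.

```python
# Figures copied from the table above. The source states no units,
# so they are treated here as relative throughput numbers only.
THROUGHPUT = {
    "shader": {"turing": 16, "ampere": 40, "ada": 90},
    "RT": {"turing": 49, "ampere": 78, "ada": 200},
    "tensor": {"turing": 130, "ampere": 320, "ada": 1400},
    "OFA": {"turing": None, "ampere": 126, "ada": 300},  # no Turing figure listed
}
GENERATIONS = ["turing", "ampere", "ada"]


def generation_ratios(table):
    """Return new/old ratios for each pair of consecutive generations."""
    ratios = {}
    for core, values in table.items():
        ratios[core] = {}
        for old, new in zip(GENERATIONS, GENERATIONS[1:]):
            if values[old] is None or values[new] is None:
                continue  # skip pairs where the table has a blank cell
            ratios[core][f"{old}->{new}"] = values[new] / values[old]
    return ratios


ratios = generation_ratios(THROUGHPUT)
for core, steps in ratios.items():
    for step, ratio in steps.items():
        print(f"{core:7s} {step:15s} x{ratio:.2f}")
```

Read this way, the largest single step in the table is the tensor row from Ampere to Ada (320 to 1400).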
+ | |||
+ | ====== Nvidia for AI ====== | ||
+ | * large language model (LLM): lets a single model handle a variety of tasks, such as text-related and image-related ones, with context-aware output. | ||
+ | * NeMo LLM service: a prompt-learning framework for adapting a pre-trained LLM to a specific task. | ||
+ | * recommender system: e.g. for shopping and social networks