devwiki:nvidia

Differences

This shows you the differences between two versions of the page.

devwiki:nvidia [2022/09/21 02:55] – [Nvidia Cuda programming] ying
devwiki:nvidia [2022/09/21 09:00] (current) – [Nvidia for AI] ying
Line 40: Line 40:
       * each thread has its own thread ID, which determines which block of data it works on; all threads process the data together at the same time.
       * optimizing how code uses memory can be important: better arrangement and swapping of data between memory blocks fits more into the fixed-size memory.
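
The thread-ID idea above can be sketched without a GPU. The following is a minimal plain-Python emulation, not CUDA code: it walks over every (block, thread) pair in a loop, whereas a GPU would run them in parallel. The function names (`global_thread_id`, `scale_kernel`, `launch`) and the block size of 4 are assumptions made for illustration; only the index expression mirrors the common CUDA idiom `blockIdx.x * blockDim.x + threadIdx.x`.

```python
# Plain-Python emulation of CUDA-style thread indexing (no GPU required).
# All function names here are illustrative, not part of any CUDA API.


def global_thread_id(block_idx: int, block_dim: int, thread_idx: int) -> int:
    """Mirror the common CUDA expression blockIdx.x * blockDim.x + threadIdx.x."""
    return block_idx * block_dim + thread_idx


def scale_kernel(data, factor, block_idx, block_dim, thread_idx):
    """Work of one emulated thread: it touches only the element at its own ID."""
    i = global_thread_id(block_idx, block_dim, thread_idx)
    if i < len(data):  # guard: the last block may have more threads than data
        data[i] *= factor


def launch(data, factor, block_dim=4):
    """Emulate a kernel launch by visiting every (block, thread) pair in turn.

    On a GPU these threads would run in parallel; the loop order is arbitrary.
    """
    grid_dim = (len(data) + block_dim - 1) // block_dim  # ceiling division
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            scale_kernel(data, factor, block_idx, block_dim, thread_idx)
    return data


values = launch([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], factor=2)
print(values)
```

With 10 elements and 4 threads per block, 3 blocks are launched and the last 2 threads of the final block do nothing, which is why the bounds check is needed.
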
 +
 +====== CV-CUDA ======
 +
 +  * CV-CUDA: computer vision with CUDA.
  
 ====== Nvidia RTX stack ======
Line 45: Line 49:
   * 1st Gen: VkRay, DXR, DLSS1
   * 2nd Gen:
-    * real-time denoise
+    * real-time denoise: spatial denoising
     * caustics
-    * RTXDI
+    * RTXDI: ray-traced direct illumination, casting shadows from all lights and emissive surfaces
-    * RTXGI
+    * RTXGI: real-time multi-bounce indirect lighting
     * Reflex
-    * DLSS2
+    * DLSS2: deep learning super sampling, AI-generated pixels
   * 3rd Gen:
     * Displaced micro-meshes
-    * 2D SGM optical flow, shader execution reordering, real-time path tracing, opacity micro-maps, DLSS3
+    * 2D SGM optical flow, shader execution reordering, real-time path tracing, opacity micro-maps
 +    * DLSS3: deep learning super sampling, AI frame generation
 + 
 +====== Nvidia GPU architecture ====== 
 + 
 +^ core (peak throughput)              ^ Turing ^ Ampere ^ Ada  ^
 +| shader (TFLOPS)                     | 16     | 40     | 90   |
 +| RT (TFLOPS)                         | 49     | 78     | 200  |
 +| tensor (TFLOPS)                     | 130    | 320    | 1400 |
 +| OFA, optical flow accelerator (TOPS) |        | 126    | 300  |
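
To read the table above as generation-over-generation gains, the ratios can be computed directly. This is a small sketch that only does arithmetic on the figures listed in the table; the dictionary layout and the `speedup` helper are assumptions made for illustration, and the empty Turing OFA cell is kept as `None` rather than guessed.

```python
# Figures copied from the table above; None marks the empty Turing OFA cell.
throughput = {
    "shader": {"turing": 16, "ampere": 40, "ada": 90},
    "RT": {"turing": 49, "ampere": 78, "ada": 200},
    "tensor": {"turing": 130, "ampere": 320, "ada": 1400},
    "OFA": {"turing": None, "ampere": 126, "ada": 300},
}


def speedup(core, newer, older):
    """Return newer/older throughput for one core type, or None if a value is missing."""
    new, old = throughput[core][newer], throughput[core][older]
    if new is None or old is None:
        return None
    return round(new / old, 2)


for core in throughput:
    print(core,
          "ampere/turing:", speedup(core, "ampere", "turing"),
          "ada/ampere:", speedup(core, "ada", "ampere"))
```

For example, the shader row gives 40 / 16 = 2.5 from Turing to Ampere and 90 / 40 = 2.25 from Ampere to Ada.
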
 + 
 +====== Nvidia for AI ======
  
 +  * large language model (LLM): lets a single model handle many different tasks, such as text-related and image-related ones, with context-aware output.
 +  * NeMo LLM service: a prompt-learning framework for adapting a pre-trained LLM to a specific task.
 +  * recommender system: e.g. for shopping and social networks
  • devwiki/nvidia.1663728905.txt.gz
  • Last modified: 2022/09/21 02:55
  • by ying