
Lei Wang

wangkuiyi

helinwang

tonyyang-svail

weixing

abhinavarora

CPPLint Progress

| Directory | Develop | 15-Mar |
|---|---|---|
| fluid/framework | 29 | 227 |
| fluid/framework/details | 0 | 5 |
| fluid/inference | 0 | 15 |
| fluid/inference/tensorrt | 14 | N/A |
| fluid/memory | 0 | 2 |
| fluid/operators | 0 | 303 |
| fluid/operators/reader | 0 | 8 |
| fluid/operators/concurrency | 0 | N/A |
| fluid/operators/math | 328 | 369 |
| fluid/operators/detail | 0 | 29 |
| fluid/operators/nccl | 2 | 2 |
| fluid/platform | 0 | 155 |
| fluid/pybind | 0 | 41 |
| fluid/recordio | 0 | 18 |
| fluid/string | 0 | 7 |

Chenxi

kexinzhao

Qingsheng Li

Xin Pan

luotao

wuyi

Baiyifan

tangwei

fengjiayi

Yu Yang

  • Fix a critical bug in the dynamic loader
    • We use dlsym to extract function pointers from shared libraries (the dynload namespace). We cast each pointer to a type that exactly fits the arguments at the call site, not to the actual function type declared in the header.
      • For example, if we pass (int, int) to a function declared as void(int64_t, int64_t), we cast the symbol to void (*)(int, int) rather than void (*)(int64_t, int64_t). This causes bugs on platforms where sizeof(int) != sizeof(int64_t). A minimal sketch of this failure mode follows this list.
    • https://github.com/PaddlePaddle/Paddle/pull/10191
    • https://github.com/PaddlePaddle/Paddle/pull/10189
  • Found a critical bug in the GPU memory allocator and memcpy
    • We found that we cannot synchronize a stream if we invoke cudaMemcpyAsync on CPU memory that was allocated with malloc instead of cudaMallocHost. It is suggested to allocate CPU memory with cudaMallocHost (pinned memory) when the memory is used for CPU <--> GPU communication; see the sketch after this list.
    • When we changed malloc to cudaMallocHost, we found that many memory copies are not synchronized. This is a critical bug for Paddle and a key reason our training process is unstable.
    • Currently, we add a cudaMemcpySync API to avoid the bug when feeding/fetching data. Resolving this bug thoroughly will take a week or longer.
  • Add a demo for parallel executor + reader to train and test a program
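
Below is a minimal, hypothetical sketch of the dlsym cast bug described in the first item. The library name libfoo.so and the symbol set_dims are made up for illustration; this is not Paddle's actual dynload code.

```cpp
#include <dlfcn.h>

#include <cstdint>
#include <cstdio>

int main() {
  // Hypothetical shared library exporting: void set_dims(int64_t rows, int64_t cols);
  void* lib = dlopen("libfoo.so", RTLD_LAZY);
  if (lib == nullptr) {
    std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
    return 1;
  }
  void* sym = dlsym(lib, "set_dims");

  // Buggy pattern: cast to the types of the arguments at the call site.
  // On a platform where sizeof(int) != sizeof(int64_t), the callee reads
  // 8-byte parameters while the caller only set up 4-byte ones.
  auto bad = reinterpret_cast<void (*)(int, int)>(sym);
  bad(3, 4);  // undefined behavior: signature mismatch

  // Correct pattern: cast to the actual function type from the header, so
  // the int arguments are converted to int64_t at the call.
  auto good = reinterpret_cast<void (*)(int64_t, int64_t)>(sym);
  good(3, 4);

  dlclose(lib);
  return 0;
}
```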
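
The second item concerns pinned versus pageable host memory. A hedged sketch of the two allocation patterns with the CUDA runtime API follows; the buffer size and variable names are illustrative, and this is not the Paddle allocator code itself.

```cpp
#include <cuda_runtime.h>

#include <cstdlib>

int main() {
  const size_t bytes = 1 << 20;
  cudaStream_t stream;
  cudaStreamCreate(&stream);

  void* dev = nullptr;
  cudaMalloc(&dev, bytes);

  // Problematic pattern: pageable host memory from plain malloc.
  // cudaMemcpyAsync may be staged through an internal buffer and does not
  // behave as a truly asynchronous, stream-ordered copy.
  void* pageable = std::malloc(bytes);
  cudaMemcpyAsync(dev, pageable, bytes, cudaMemcpyHostToDevice, stream);

  // Preferred pattern: pinned host memory from cudaMallocHost, so the async
  // copy really is asynchronous and cudaStreamSynchronize orders it.
  void* pinned = nullptr;
  cudaMallocHost(&pinned, bytes);
  cudaMemcpyAsync(dev, pinned, bytes, cudaMemcpyHostToDevice, stream);
  cudaStreamSynchronize(stream);  // wait for the copy before reusing buffers

  cudaFreeHost(pinned);
  std::free(pageable);
  cudaFree(dev);
  cudaStreamDestroy(stream);
  return 0;
}
```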

gongweibao

wanghaoshuang

Dang Qingqing

zhaochengduo

Liu Yiqun

yangyaming

qiaolongfei

Todo

  • Do more benchmarks on async training

Yan Xu

dongzhihong

Yibing Liu

Fluid2onnx converter:

guosheng

Yan Chunwei

daming-lu

cs2be(thuan)

sidgoyal78

jetfuel(Jeff)

Nicky
