Paddle_baseline_KDD2019

PaddlePaddle baseline for the KDD2019 "Context-Aware Multi-Modal Transportation Recommendation" competition (https://dianshi.baidu.com/competition/29/question).

This repository contains the demo code for the KDD2019 "Context-Aware Multi-Modal Transportation Recommendation" competition, built with PaddlePaddle. It is written in Python and uses PaddlePaddle to solve the task. Note that this repository is still under development, and contributions are welcome. The current baseline achieves an online submission score of 0.68 - 0.69; as an example, my own submission based on these PaddlePaddle networks scored 0.6898. This baseline is published to encourage the use of PaddlePaddle and to help build stronger recommendation models with it.

The example code runs on Linux with Python 2.7 on a single machine with CPU. There were some compatibility issues with Python 3 (UPDATE: the code can now be run with Python 3; please refer to the "RUN ON Python3" instructions below). Note that distributed training options are not provided here; if you want to learn more about them, please check the model examples at https://github.com/PaddlePaddle/models. Regarding training speed: with a batch size of 1000, one epoch over all training instances generated from the raw data takes about 8 minutes with the SGD optimizer (and noticeably longer with the Adam optimizer).
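
For reference, the optimizer choice mentioned above is a one-line change in the fluid 1.x API. The snippet below is only a minimal, self-contained sketch; the toy regression loss stands in for the real cost defined in the network configs.

import paddle.fluid as fluid

# Toy network so the snippet runs on its own; the real cost comes from network_confv?.py.
x = fluid.layers.data(name='x', shape=[13], dtype='float32')
y = fluid.layers.data(name='y', shape=[1], dtype='float32')
pred = fluid.layers.fc(input=x, size=1)
loss = fluid.layers.mean(fluid.layers.square_error_cost(input=pred, label=y))

optimizer = fluid.optimizer.SGD(learning_rate=1e-4)     # the ~8 min/epoch figure above uses SGD
# optimizer = fluid.optimizer.Adam(learning_rate=1e-4)  # Adam takes noticeably longer per epoch
optimizer.minimize(loss)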

The configuration and training process of all the networks are kept basic; many optimizations can be built on top of them to achieve better results, e.g. a better cost function, more powerful feature engineering, a carefully designed model-validation scheme, NN optimization tricks, and so on.

The code is rough and comes from my daily use; it will be cleaned up over the coming days.

Install PaddlePaddle

Please visit the official PaddlePaddle installation guide (http://www.paddlepaddle.org/documentation/docs/zh/1.4/beginners_guide/install/index_cn.html).
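
For a quick try on a single CPU machine, the package can usually be installed with pip; pinning a 1.x release that matches the guide above is recommended (the exact version to use is your choice):

pip install paddlepaddle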

preprocess features

python preprocess_dense.py   # preprocess the raw training data; change this for a different feature strategy
python pre_test_dense.py     # preprocess the raw test data
# cd out                     # move to the output directory if that is where the preprocessed files are written
split -a 2 -d -l 200000 normed_train.txt normed_train   # shard the normalized training file into 200000-line pieces (normed_train00, normed_train01, ...)

preprocess.py and preprocess_dense.py are the scripts for preprocessing the raw data; two versions are provided, one for all-sparse features and one for sparse plus dense features. Correspondingly, pre_process_test.py and pre_test_dense.py preprocess the raw test data. The training instances are saved as JSON, which makes it easy to add new features. In this demo, all features are generated from the provided raw data except for the weather feature, which is generated from open weather records. Note that the features generated in this step must match the model's input definition, so make sure you use the matching version. In the demo code, the sparse plus dense features are used with network_confv6.
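
Purely as an illustration, a preprocessed instance could look roughly like the JSON record below; the actual field names and layout are defined by preprocess_dense.py and will differ:

import json

# Hypothetical instance layout -- the real keys come from preprocess_dense.py.
instance = {
    "dense_feature": [0.21, 0.0, 1.37],  # e.g. normalized plan attributes such as distance/price/eta
    "context_ids": [17, 3],              # e.g. sparse ids such as hour-of-day or a weather bucket
    "label": 2                           # the transport mode to recommend
}

# Adding a new feature is just adding another key before the record is dumped.
instance["my_new_feature"] = 0.5
print(json.dumps(instance))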

build the network

The main network logic is in network_confv?.py. The networks are based on FM- and deep-model-related algorithms. I tried several networks and published some of them. There may be some rough edges in the networks, but all of them are functional.
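
To make the "FM plus deep" idea concrete, here is a minimal sketch of such a network in the fluid 1.x API. It is not network_confv6 itself: the field count, dictionary size, embedding size, hidden sizes, class count and variable names are my own assumptions.

import paddle.fluid as fluid

SPARSE_FIELDS = 26        # assumed number of sparse feature slots
DICT_SIZE = 1000001       # assumed hash space for sparse ids
EMB_DIM = 10
DENSE_DIM = 13            # assumed number of dense features

dense = fluid.layers.data(name='dense_input', shape=[DENSE_DIM], dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
sparse = [fluid.layers.data(name='C%d' % i, shape=[1], dtype='int64', lod_level=1)
          for i in range(SPARSE_FIELDS)]

# One pooled embedding vector per sparse field.
embs = []
for field in sparse:
    emb = fluid.layers.embedding(input=field, size=[DICT_SIZE, EMB_DIM], is_sparse=True)
    embs.append(fluid.layers.sequence_pool(input=emb, pool_type='sum'))

# FM second-order term: 0.5 * ((sum e)^2 - sum(e^2)), reduced over the embedding dimension.
sum_emb = fluid.layers.sums(embs)
square_of_sum = fluid.layers.square(sum_emb)
sum_of_square = fluid.layers.sums([fluid.layers.square(e) for e in embs])
fm = fluid.layers.reduce_sum(
    fluid.layers.elementwise_sub(square_of_sum, sum_of_square), dim=1, keep_dim=True)
fm = fluid.layers.scale(fm, scale=0.5)

# Deep part: dense features concatenated with all field embeddings.
deep = fluid.layers.concat(embs + [dense], axis=1)
for size in [400, 400, 400]:
    deep = fluid.layers.fc(input=deep, size=size, act='relu')

predict = fluid.layers.fc(input=[deep, fm], size=12, act='softmax')  # 12 = assumed number of transport modes
avg_cost = fluid.layers.mean(fluid.layers.cross_entropy(input=predict, label=label))
fluid.optimizer.SGD(learning_rate=1e-4).minimize(avg_cost)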

train the network

python local_train.py

In local_train.py and map_reader.py I use the Dataset API, so you need to install the corresponding .whl package or build from the develop branch of PaddlePaddle. The reason for using it is that feeding data this way is much faster. Note that the input format fed into the network is self-defined, so make sure training and test use the same format.
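
A rough sketch of how the Dataset API is typically wired up in fluid 1.x is shown below; it continues from the network sketch in the previous section (dense, sparse, label, avg_cost), and the variable order, pipe command and file list are assumptions -- local_train.py is the authoritative version.

# Assumes map_reader.py emits samples in exactly the order given to set_use_var.
dataset = fluid.DatasetFactory().create_dataset("QueueDataset")
dataset.set_use_var([dense] + sparse + [label])
dataset.set_pipe_command("python map_reader.py")
dataset.set_batch_size(1000)
dataset.set_filelist(["normed_train%02d" % i for i in range(10)])  # the shards produced by split above

place = fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
exe.train_from_dataset(program=fluid.default_main_program(),
                       dataset=dataset,
                       fetch_list=[avg_cost],
                       fetch_info=["loss"],
                       print_period=100)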

test results

python generate_test.py   # run the trained model on the test data
python build_submit.py    # build the submission file

In generate_test.py and build_submit.py, for convenience, I train the network on the whole training set and then run it on the provided unlabeled test data to produce the submission.
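
As a purely illustrative sketch of the last step, predicted per-class probabilities can be turned into a submission file roughly as follows; the column names ("sid", "recommend_mode") and the file name are assumptions, and build_submit.py defines the real format:

import numpy as np

def write_submission(session_ids, probs, path="submit.csv"):
    # probs: one row of per-mode probabilities per test session.
    with open(path, "w") as f:
        f.write("sid,recommend_mode\n")
        for sid, p in zip(session_ids, probs):
            f.write("%s,%d\n" % (sid, int(np.argmax(p))))  # recommend the most probable mode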

RUN ON Python3

To run on Python 3, run the python files with the _py3 postfix below and keep everything else the same as in the Python 2 workflow:

python local_train_py3.py
python generate_test_py3.py
python build_submit_py3.py
