From d0649c8543733ee88a46035757341130636ed81f Mon Sep 17 00:00:00 2001 From: Max Jahn Date: Tue, 29 Nov 2022 17:14:15 +0100 Subject: [PATCH] Convert Documentation from reST to Markdown (#117) Thanks, @mj023 ! --- CHANGES.md | 40 ++++ CHANGES.rst | 42 ----- README.md | 36 ++++ README.rst | 55 ------ docs/source/background/{dag.rst => dag.md} | 39 ++-- ...sign_rationale.rst => design_rationale.md} | 10 +- ...y_structure.rst => directory_structure.md} | 47 ++--- docs/source/background/index.md | 54 ++++++ docs/source/background/index.rst | 58 ------ docs/source/background/pytask.md | 44 +++++ docs/source/background/pytask.rst | 47 ----- ...running_example.rst => running_example.md} | 32 ++-- .../background/{workflow.rst => workflow.md} | 27 ++- docs/source/conf.py | 8 +- docs/source/development/changes.md | 3 + docs/source/development/changes.rst | 1 - docs/source/development/index.md | 12 ++ docs/source/development/index.rst | 11 -- docs/source/faq.md | 168 +++++++++++++++++ docs/source/faq.rst | 177 ------------------ ..._dialogue.rst => cookiecutter_dialogue.md} | 68 ++++--- docs/source/getting_started/index.md | 40 ++++ docs/source/getting_started/index.rst | 43 ----- .../getting_started/preparing_your_system.md | 142 ++++++++++++++ .../getting_started/preparing_your_system.rst | 151 --------------- .../{second_machine.rst => second_machine.md} | 16 +- .../{environments.rst => environments.md} | 112 +++++------ .../{hooks.rst => hooks.md} | 28 ++- docs/source/guides_explanations/index.md | 41 ++++ docs/source/guides_explanations/index.rst | 43 ----- ...roject.rst => porting_existing_project.md} | 14 +- ...m_scratch.rst => starting_from_scratch.md} | 10 +- docs/source/{index.rst => index.md} | 58 +++--- docs/source/programming_languages/index.md | 81 ++++++++ docs/source/programming_languages/index.rst | 93 --------- docs/source/zreferences.md | 5 + docs/source/zreferences.rst | 4 - 37 files changed, 897 insertions(+), 963 deletions(-) create mode 100644 CHANGES.md 
delete mode 100644 CHANGES.rst create mode 100644 README.md delete mode 100644 README.rst rename docs/source/background/{dag.rst => dag.md} (55%) rename docs/source/background/{design_rationale.rst => design_rationale.md} (58%) rename docs/source/background/{directory_structure.rst => directory_structure.md} (59%) create mode 100644 docs/source/background/index.md delete mode 100644 docs/source/background/index.rst create mode 100644 docs/source/background/pytask.md delete mode 100644 docs/source/background/pytask.rst rename docs/source/background/{running_example.rst => running_example.md} (51%) rename docs/source/background/{workflow.rst => workflow.md} (81%) create mode 100644 docs/source/development/changes.md delete mode 100644 docs/source/development/changes.rst create mode 100644 docs/source/development/index.md delete mode 100644 docs/source/development/index.rst create mode 100644 docs/source/faq.md delete mode 100644 docs/source/faq.rst rename docs/source/getting_started/{cookiecutter_dialogue.rst => cookiecutter_dialogue.md} (69%) create mode 100644 docs/source/getting_started/index.md delete mode 100644 docs/source/getting_started/index.rst create mode 100644 docs/source/getting_started/preparing_your_system.md delete mode 100644 docs/source/getting_started/preparing_your_system.rst rename docs/source/getting_started/{second_machine.rst => second_machine.md} (63%) rename docs/source/guides_explanations/{environments.rst => environments.md} (55%) rename docs/source/guides_explanations/{hooks.rst => hooks.md} (50%) create mode 100644 docs/source/guides_explanations/index.md delete mode 100644 docs/source/guides_explanations/index.rst rename docs/source/guides_explanations/{porting_existing_project.rst => porting_existing_project.md} (61%) rename docs/source/guides_explanations/{starting_from_scratch.rst => starting_from_scratch.md} (69%) rename docs/source/{index.rst => index.md} (62%) create mode 100644 docs/source/programming_languages/index.md delete 
mode 100644 docs/source/programming_languages/index.rst create mode 100644 docs/source/zreferences.md delete mode 100644 docs/source/zreferences.rst diff --git a/CHANGES.md b/CHANGES.md new file mode 100644 index 00000000..5e716d55 --- /dev/null +++ b/CHANGES.md @@ -0,0 +1,40 @@ +# Release Notes + +## v0.5 -- January 2022 + +- Move to plotly (#92, @timmens) +- Cleaning up (@hmgaudecker) + +## v0.4 -- January 2021 + +- Move from Waf to Pytask (#86, @tobiasraabe, @hmgaudecker) +- Move to GitHub Actions for CI (@janosg, WIP) + +## v0.3 -- October 2019 + +- Much improved documentation (@raholler) +- Extensive instructions for use on Windows (@raholler) +- Re-use previously-entered data when cookiecutter fails + (@tobiasraabe, @raholler) +- Fix Stata template by setting --shell-escape=1 (#63, @raholler) +- Add pyupgrade to pre-commit hooks (#59) +- Thanks to students at LMU for pointing lots of this out! + +## v0.2 -- September 2019 + +- Full continuous integration testing on the Azure platform +- R example completely working in Miniconda environment out of the + box (@raholler) +- Documentation for Stata / R examples (@raholler) +- Much improved instructions for usage on Windows (@raholler) +- Improved structure of docs + +## v0.1 -- October 2018 + +- First version with cookiecutter (thanks, @tobiasraabe + and @julienschat) +- All the stuff that accumulated over the years with the help of many. + I wish my memory was better so I would be able to list the + contributions separately. Thanks, @PKEuS, @philippmuller, + @julienschat, @janosg, @tdrerup and many more who provided feedback! 
diff --git a/CHANGES.rst b/CHANGES.rst deleted file mode 100644 index 8306b9aa..00000000 --- a/CHANGES.rst +++ /dev/null @@ -1,42 +0,0 @@ -Release Notes -************* - -v0.5 -- January 2022 ------------------------ - -* Move to plotly (#92, @timmens) -* Cleaning up (@hmgaudecker) - -v0.4 -- January 2021 ------------------------ - -* Move from Waf to Pytask (#86, @tobiasraabe, @hmgaudecker) -* Move to GitHub Actions for CI (@janosg, WIP) - - -v0.3 -- October 2019 ------------------------ - -* Much improved documentation (@raholler) -* Extensive instructions for use on Windows (@raholler) -* Re-use previously-entered data when cookiecutter fails (@tobiasraabe, @raholler) -* Fix Stata template by setting `--shell-escape=1` (#63, @raholler) -* Add pyupgrade to pre-commit hooks (#59) -* Thanks to students at LMU for pointing lots of this out! - - -v0.2 -- September 2019 ------------------------ - -* Full continuous integration testing on the Azure platform -* R example completely working in Miniconda environment out of the box (@raholler) -* Documentation for Stata / R examples (@raholler) -* Much improved instructions for usage on Windows (@raholler) -* Improved structure of docs - - -v0.1 -- October 2018 ---------------------- - -* First version with cookiecutter (thanks, @tobiasraabe and @julienschat) -* All the stuff that accumulated over the years with the help of many. I wish my memory was better so I would be able to list the contributions separately. Thanks, @PKEuS, @philippmuller, @julienschat, @janosg, @tdrerup and many more who provided feedback! 
diff --git a/README.md b/README.md new file mode 100644 index 00000000..a184d725 --- /dev/null +++ b/README.md @@ -0,0 +1,36 @@ +# Templates for Reproducible Research Projects in Economics + +![MIT license](https://img.shields.io/github/license/OpenSourceEconomics/econ-project-templates) +[![image](https://zenodo.org/badge/14557543.svg)](https://zenodo.org/badge/latestdoi/14557543) +[![Documentation Status](https://readthedocs.org/projects/econ-project-templates/badge/?version=stable)](https://econ-project-templates.readthedocs.io/en/stable/) +[![image](https://github.com/OpenSourceEconomics/econ-project-templates/actions/workflows/continuous-integration-workflow.yml/badge.svg)](https://github.com/OpenSourceEconomics/econ-project-templates/actions/workflows/continuous-integration-workflow.yml) +[![image](https://codecov.io/gh/OpenSourceEconomics/econ-project-templates/branch/master/graph/badge.svg)](https://codecov.io/gh/OpenSourceEconomics/econ-project-templates) +[![pre-commit.ci status](https://results.pre-commit.ci/badge/github/OpenSourceEconomics/econ-project-templates/master.svg)](https://results.pre-commit.ci/latest/github/OpenSourceEconomics/econ-project-templates/master) +[![image](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) + +This project aims to provide project templates for economists that make it easy to +produce reproducible research using one or more of the most frequently used programming +languages in economics (i.e Python, R, Julia, Stata). + +Users and curious visitors please take a look at the +[documentation](https://econ-project-templates.readthedocs.io/en/stable/). This +repository is for developing the templates rather than using them. + +## Contributing + +We welcome suggestions on anything: improving the documentation, bug reports, feature +requests. Please open an +[issue](https://github.com/OpenSourceEconomics/econ-project-templates/issues) in these +cases. 
+ +If you want to work on a specific feature, we are more than happy to get you started! +Please [get in touch briefly](https://www.wiwi.uni-bonn.de/gaudecker/personal_cv.html), +this is a small team so there is no need for a detailed formal process. + +## Contributors + +@hmgaudecker @timmens @tobiasraabe + +## Former Contributors + +@janosg @PKEuS @philippmuller @julienschat @raholler diff --git a/README.rst b/README.rst deleted file mode 100644 index 73f423db..00000000 --- a/README.rst +++ /dev/null @@ -1,55 +0,0 @@ -Templates for Reproducible Research Projects in Economics -=========================================================== - - -.. image:: https://img.shields.io/github/license/OpenSourceEconomics/econ-project-templates - :alt: MIT license - -.. image:: https://zenodo.org/badge/14557543.svg - :target: https://zenodo.org/badge/latestdoi/14557543 - -.. image:: https://readthedocs.org/projects/econ-project-templates/badge/?version=stable - :target: https://econ-project-templates.readthedocs.io/en/stable/ - :alt: Documentation Status - -.. image:: https://github.com/OpenSourceEconomics/econ-project-templates/actions/workflows/continuous-integration-workflow.yml/badge.svg - :target: https://github.com/OpenSourceEconomics/econ-project-templates/actions/workflows/continuous-integration-workflow.yml - -.. image:: https://codecov.io/gh/OpenSourceEconomics/econ-project-templates/branch/master/graph/badge.svg - :target: https://codecov.io/gh/OpenSourceEconomics/econ-project-templates - -.. image:: https://results.pre-commit.ci/badge/github/OpenSourceEconomics/econ-project-templates/master.svg - :target: https://results.pre-commit.ci/latest/github/OpenSourceEconomics/econ-project-templates/master - :alt: pre-commit.ci status - -.. 
image:: https://img.shields.io/badge/code%20style-black-000000.svg - :target: https://github.com/psf/black - - -This project aims to provide project templates for economists that make it easy to produce reproducible research using one or more of the most frequently used programming languages in economics (i.e Python, R, Julia, Stata). - -Users and curious visitors please take a look at the `documentation `_. This repository is for developing the templates rather than using them. - -Contributing -------------- - -We welcome suggestions on anything: improving the documentation, bug reports, feature requests. Please open an `issue `__ in these cases. - -If you want to work on a specific feature, we are more than happy to get you started! Please `get in touch briefly `__, this is a small team so there is no need for a detailed formal process. - - -Contributors -------------- - -@hmgaudecker -@timmens -@tobiasraabe - -Former Contributors -------------------- - -@janosg -@PKEuS -@philippmuller -@julienschat -@raholler diff --git a/docs/source/background/dag.rst b/docs/source/background/dag.md similarity index 55% rename from docs/source/background/dag.rst rename to docs/source/background/dag.md index 9c731aa0..2bc74828 100644 --- a/docs/source/background/dag.rst +++ b/docs/source/background/dag.md @@ -1,16 +1,18 @@ -The way to specify dependencies between data, code and tasks to perform for a -computer is a directed acyclic graph. A graph is simply a set of nodes (files, -in our case) and edges that connect pairs of nodes (tasks to perform). Directed -means that the order of how we connect a pair of nodes matters, we thus add -arrows to all edges. Acyclic means that there are no directed cycles: When you -traverse a graph in the direction of the arrows, there may not be a way to end -up at the same node again. +The way to specify dependencies between data, code and tasks to perform for a computer +is a directed acyclic graph. 
A graph is simply a set of nodes (files, in our case) and +edges that connect pairs of nodes (tasks to perform). Directed means that the order of +how we connect a pair of nodes matters, we thus add arrows to all edges. Acyclic means +that there are no directed cycles: When you traverse a graph in the direction of the +arrows, there may not be a way to end up at the same node again. This is the dependency graph of the example project (open the image in a different window to zoom in) -.. figure:: ../figures/dag.png - :width: 50em +```{figure} ../figures/dag.png +--- +width: 50em +--- +``` The nodes have different shapes in order to distinguish tasks from files. The rectangles denote targets or dependencies like figures, data sets or stored models. The hexagons @@ -19,19 +21,22 @@ dependency structure can be complex. In a first run, all targets have to be generated, of course. In later runs, a target only needs to be re-generated if one of its direct **dependencies** changes. E.g. when -we alter ``paper/research_pres_30min.tex`` (mid-right) we need to rebuild only the -presentation pdf file. If we alter ``rrt/data_management/data_info.yaml`` (top-right), +we alter `paper/research_pres_30min.tex` (mid-right) we need to rebuild only the +presentation pdf file. If we alter `rrt/data_management/data_info.yaml` (top-right), however, we need to rebuild everything. Note, that the only important thing at this point is to understand the general idea. Of course this is overkill for a simple example -- we could easily keep the code closer together than this. But such a strategy does not scale to serious papers with many different specifications. As a case in point, consider the DAG for an early version of -:cite:`Gaudecker2015`: +{cite}`Gaudecker2015`: -.. figure:: ../figures/pfefficiency.jpg - :width: 50em +```{figure} ../figures/pfefficiency.jpg +--- +width: 50em +--- +``` -Do you want to keep those dependencies in your head? 
Or would it be useful to -specify them once and for all in order to have more time for thinking about -research? The next section shows you how to do that. +Do you want to keep those dependencies in your head? Or would it be useful to specify +them once and for all in order to have more time for thinking about research? The next +section shows you how to do that. diff --git a/docs/source/background/design_rationale.rst b/docs/source/background/design_rationale.md similarity index 58% rename from docs/source/background/design_rationale.rst rename to docs/source/background/design_rationale.md index 1792492c..3ab6f25b 100644 --- a/docs/source/background/design_rationale.rst +++ b/docs/source/background/design_rationale.md @@ -1,12 +1,12 @@ The design of the project templates is guided by the following main thoughts: -#. **Separation of logical chunks:** A minimal requirement for a project to scale. -#. **Only execute required tasks, automatically:** Again required for scalability. It +1. **Separation of logical chunks:** A minimal requirement for a project to scale. +1. **Only execute required tasks, automatically:** Again required for scalability. It means that the machine needs to know what is meant by a "required task". -#. **Re-use of code and data instead of copying and pasting:** Else you will forget the +1. **Re-use of code and data instead of copying and pasting:** Else you will forget the copy & paste step at some point down the road. At best, this leads to errors; at worst, to misinterpreting the results. -#. **Be as language-agnostic as possible:** Make it easy to use the best tool for a +1. **Be as language-agnostic as possible:** Make it easy to use the best tool for a particular task and to mix tools in a project. -#. **Separation of inputs and outputs:** Required to find your way around in a complex +1. **Separation of inputs and outputs:** Required to find your way around in a complex project. 
diff --git a/docs/source/background/directory_structure.rst b/docs/source/background/directory_structure.md similarity index 59% rename from docs/source/background/directory_structure.rst rename to docs/source/background/directory_structure.md index f539bf5b..64287cfd 100644 --- a/docs/source/background/directory_structure.rst +++ b/docs/source/background/directory_structure.md @@ -1,23 +1,23 @@ -The big picture -=============== +### The big picture The following graph shows the contents of the example project root directory after -executing ``pytask``: +executing `pytask`: -.. figure:: ../figures/generated/root_bld_src.png - :width: 45em +```{figure} ../figures/generated/root_bld_src.png +--- +width: 45em +--- +``` Files and directories in yellow are constructed by pytask; those with a bluish background are added directly by the researcher. You immediately see the **separation of inputs** and outputs (one of our guiding principles) at work: -- All source code is in the src directory -- All outputs are constructed in the bld directory - -.. note:: - - The paper and presentation are moved to the root so they can be opened easily +- All source code is in the src directory +- All outputs are constructed in the bld directory +```{note} The paper and presentation are moved to the root so they can be opened easily +``` The contents of both the root/bld and the root/src directories directly follow the steps of the analysis from the workflow section. @@ -27,26 +27,27 @@ specified in root/src/analysis and all its output is placed in root/bld/analysis Some differences: -- Because they are accessed frequently, figures and the like get extra directories in - root/bld - -- The directory root/src contains many more subdirectories and files: +- Because they are accessed frequently, figures and the like get extra directories in + root/bld - - utilities.py provides code that may be used by different steps of the project. 
- Little code snippets for input / output or stuff that is not directly related to - the model would go here. +- The directory root/src contains many more subdirectories and files: + - utilities.py provides code that may be used by different steps of the project. + Little code snippets for input / output or stuff that is not directly related to the + model would go here. -Zooming in -========== +### Zooming in Lets go one step deeper and consider the root/src directory in more detail: -.. figure:: ../figures/generated/src.png - :width: 40em +```{figure} ../figures/generated/src.png +--- +width: 40em +--- +``` It is imperative that you do all the task handling inside the `task_xxx.py`-scripts, -using the `pathlib `_ library. This ensures that +using the [pathlib](https://realpython.com/python-pathlib/) library. This ensures that your project can be used on different machines and it minimises the potential for cross-platform errors. diff --git a/docs/source/background/index.md b/docs/source/background/index.md new file mode 100644 index 00000000..810a3870 --- /dev/null +++ b/docs/source/background/index.md @@ -0,0 +1,54 @@ +(background)= + +# Background + +This section contains explanations on why the project templates look the way they do. +This includes a short explanation of the content of the pre-installed example, the basic +design rationale, discussion of the workflow, the directory structure we chose, and a +little background on directed acyclic graphs and pytask. There is not much reference to +code or a particular programming language here, this is relegated to the next section. + +(running_example)= + +## Running example + +```{include} running_example.md +``` + +(design_rationale)= + +## Design Rationale + +```{include} design_rationale.md + +``` + +(workflow)= + +## How to Organize the Workflow? 
+ +```{include} workflow.md +``` + +(directory_structure)= + +## Directory Structure + +```{include} directory_structure.md + +``` + +(dag)= + +## Directed Acyclic Graphs + +```{include} dag.md + +``` + +(pytask)= + +## Introduction to pytask + +```{include} pytask.md +``` diff --git a/docs/source/background/index.rst b/docs/source/background/index.rst deleted file mode 100644 index 45e51610..00000000 --- a/docs/source/background/index.rst +++ /dev/null @@ -1,58 +0,0 @@ -.. _background: - -Background -########## - -This section contains explanations on why the project templates look the way they do. -This includes a short explanation of the content of the pre-installed example, the basic -design rationale, discussion of the workflow, the directory structure we chose, and a -little background on directed acyclic graphs and pytask. There is not much reference to -code or a particular programming language here, this is relegated to the next section. - - -.. _running_example: - -Running example -*************** - -.. include:: running_example.rst - - -.. _design_rationale: - -Design Rationale -**************** - -.. include:: design_rationale.rst - - -.. _workflow: - -How to Organize the Workflow? -***************************** - -.. include:: workflow.rst - - -.. _directory_structure: - -Directory Structure -******************* - -.. include:: directory_structure.rst - - -.. _dag: - -Directed Acyclic Graphs -*********************** - -.. include:: dag.rst - - -.. _pytask: - -Introduction to pytask -********************** - -.. include:: pytask.rst diff --git a/docs/source/background/pytask.md b/docs/source/background/pytask.md new file mode 100644 index 00000000..f425b2ab --- /dev/null +++ b/docs/source/background/pytask.md @@ -0,0 +1,44 @@ +[pytask](https://pytask-dev.readthedocs.io) is our tool of choice to automate the +dependency tracking via a DAG (directed acyclic graph) structure. 
It has been written by +Uni Bonn alumnus [Tobias Raabe](https://github.com/tobiasraabe) out of frustration with +other tools. + +pytask is inspired by pytest and leverages the same plugin system. If you are familiar +with pytest, getting started with pytask should be a very smooth process. + +pytask will look for Python scripts named `task_[specifier].py` in all subdirectories of +your project. Within those scripts, it will execute functions that start with `task_`. + +Have a look at its excellent [documentation](https://pytask-dev.readthedocs.io). At +present, there are additional plugins to run +[R scripts](https://github.com/pytask-dev/pytask-r), +[Julia scripts](https://github.com/pytask-dev/pytask-julia), +[Stata do-files](https://github.com/pytask-dev/pytask-stata), and to compile +[documents via LaTeX](https://github.com/pytask-dev/pytask-latex). + +We will have more to say about the directory structure in the {ref}`directory_structure` +section. For now, we note that a step towards achieving the goal of clearly separating +inputs and outputs is that we specify a separate build directory. All output files go +there (including intermediate output), it is never kept under version control, and it +can be safely removed -- everything in it will be reconstructed automatically the next +time you run `pytask`. + +### Pytask Overview + +From a high-level perspective, pytask works in the following way: + +1. pytask reads your instructions and sets the build order. + + - Think of a dependency graph here. + - pytask stops when it detects a circular dependency or ambiguous ways to build a + target (e.g., you specify the same target twice). + - Both are major advantages over a *workflow script*, let alone doing the dependency + tracking in your mind. + +1. pytask decides which tasks need to be executed and performs the required actions. + + - Minimal rebuilds are a huge speed gain compared to a *workflow script*. 
+ - These gains are large enough to make projects break or succeed. + +We have just touched upon the tip of the iceberg here; pytask has many more goodies to +offer. Its [documentation](https://pytask-dev.readthedocs.io) is an excellent source. diff --git a/docs/source/background/pytask.rst b/docs/source/background/pytask.rst deleted file mode 100644 index 88e53138..00000000 --- a/docs/source/background/pytask.rst +++ /dev/null @@ -1,47 +0,0 @@ -`pytask `__ is our tool of choice to automate the -dependency tracking via a DAG (directed acyclic graph) structure. It has been written by -Uni Bonn alumnus `Tobias Raabe `_ out of frustration -with other tools. - -pytask is inspired by pytest and leverages the same plugin system. If you are familiar -with pytest, getting started with pytask should be a very smooth process. - -pytask will look for Python scripts named `task_[specifier].py` in all subdirectories of -your project. Within those scripts, it will execute functions that start with `task_`. - -Have a look at its excellent `documentation `_. At -present, there are additional plugins to run `R scripts -`_, `Julia scripts -`_, `Stata do-files -`_, and to compile `documents via LaTeX -`_. - -We will have more to say about the directory structure in the :ref:`directory_structure` -section. For now, we note that a step towards achieving the goal of clearly separating -inputs and outputs is that we specify a separate build directory. All output files go -there (including intermediate output), it is never kept under version control, and it -can be safely removed -- everything in it will be reconstructed automatically the next -time you run `pytask`. - - -Pytask Overview -=============== - -From a high-level perspective, pytask works in the following way: - -#. pytask reads your instructions and sets the build order. - - * Think of a dependency graph here. 
- * pytask stops when it detects a circular dependency or ambiguous ways to build a - target (e.g., you specify the same target twice). - * Both are major advantages over a *workflow script*, let alone doing the dependency - tracking in your mind. - - -#. pytask decides which tasks need to be executed and performs the required actions. - - * Minimal rebuilds are a huge speed gain compared to a *workflow script*. - * These gains are large enough to make projects break or succeed. - -We have just touched upon the tip of the iceberg here; pytask has many more goodies to -offer. Its `documentation `_ is an excellent source. diff --git a/docs/source/background/running_example.rst b/docs/source/background/running_example.md similarity index 51% rename from docs/source/background/running_example.rst rename to docs/source/background/running_example.md index 0d5bac0c..b963e1c8 100644 --- a/docs/source/background/running_example.rst +++ b/docs/source/background/running_example.md @@ -1,28 +1,28 @@ The example project that will be installed with the templates is a simple empirical project. Its abstract might read: - This paper estimates the probability of smoking given age, marital status, and level - of education. We use the stats4schools `Smoking dataset - `_ - and run a logistic regression. Results are presented in this paper; you may also - want to consult the accompanying slides. - +> This paper estimates the probability of smoking given age, marital status, and level +> of education. We use the stats4schools +> [Smoking dataset](https://www.stem.org.uk/resources/elibrary/resource/28452/large-datasets-stats4schools) +> and run a logistic regression. Results are presented in this paper; you may also want +> to consult the accompanying slides. We can translate this into tasks our code needs to perform: -1. Clean the data -2. Estimate a logistic model -3. For each of the categorical variables, predict the smoking propensity over the lifetime -4. Visualize the results -5. 
Create tables with the results -6. Include the results in documents for dissemination +1. Clean the data +1. Estimate a logistic model +1. For each of the categorical variables, predict the smoking propensity over the + lifetime +1. Visualize the results +1. Create tables with the results +1. Include the results in documents for dissemination In these templates, we categorize these tasks into four groups: -* Data Management: task 1 -* Analysis: tasks 2 & 3 -* Final: tasks 4 & 5 -* Paper: task 6 +- Data Management: task 1 +- Analysis: tasks 2 & 3 +- Final: tasks 4 & 5 +- Paper: task 6 Naturally, different projects have different needs. E.g., for a simulation study, you might want to discard the data management part. Doing so is trivial by just deleting the diff --git a/docs/source/background/workflow.rst b/docs/source/background/workflow.md similarity index 81% rename from docs/source/background/workflow.rst rename to docs/source/background/workflow.md index 1d94714b..490ffbf8 100644 --- a/docs/source/background/workflow.rst +++ b/docs/source/background/workflow.md @@ -3,30 +3,39 @@ A naive way to ensure reproducibility is to have a *workflow script* (do-file, m setup would be to have code for each step of the analysis and a loop over both categorical variables within each step: -.. figure:: ../figures/generated/steps_only_full.png - :width: 35em +```{figure} ../figures/generated/steps_only_full.png +--- +width: 35em +--- +``` You will still need to manually keep track of whether you need to run a particular step after making changes, though. Or you run everything at once, all the time. Alternatively, you may have code that runs one step after the other for each variable: -.. figure:: ../figures/generated/model_steps_full.png - :width: 35em +```{figure} ../figures/generated/model_steps_full.png +--- +width: 35em +--- +``` The equivalent comment applies here: Either keep track of which model needs to be run after making changes manually, or run everything at once. 
Ideally though, you want to be even more fine-grained than this and only run individual elements. This is particularly true when your entire computations take some time. In -this case, running all steps every time via the *workflow script* simply is not an option. -All my research projects ended up running for a long time, no matter how simple they -were... +this case, running all steps every time via the *workflow script* simply is not an +option. All my research projects ended up running for a long time, no matter how simple +they were... The figure shows you that even in this simple example, there are now quite a few parts to remember: -.. figure:: ../figures/generated/model_steps_select.png - :width: 35em +```{figure} ../figures/generated/model_steps_select.png +--- +width: 35em +--- +``` This figure assumes that your data management is being done for all models at once, which is usually a good choice for me. Nevertheless, we need to remember 6 ways to start diff --git a/docs/source/conf.py b/docs/source/conf.py index aeda6081..f75a8f20 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -50,7 +50,7 @@ ] # MyST -myst_enable_extensions = ["colon_fence", "deflist", "dollarmath"] +myst_enable_extensions = [] autoapi_dirs = ["../../hooks"] @@ -96,7 +96,11 @@ # The suffix(es) of source filenames. # You can specify multiple suffix as a list of string: # -source_suffix = ".rst" +source_suffix = { + ".rst": "restructuredtext", + ".txt": "restructuredtext", + ".md": "markdown", +} # The master toctree document. master_doc = "index" diff --git a/docs/source/development/changes.md b/docs/source/development/changes.md new file mode 100644 index 00000000..5569a397 --- /dev/null +++ b/docs/source/development/changes.md @@ -0,0 +1,3 @@ +```{eval-rst} +.. 
include:: ../../../CHANGES.md +``` diff --git a/docs/source/development/changes.rst b/docs/source/development/changes.rst deleted file mode 100644 index 525b47cf..00000000 --- a/docs/source/development/changes.rst +++ /dev/null @@ -1 +0,0 @@ -.. include:: ../../../CHANGES.rst diff --git a/docs/source/development/index.md b/docs/source/development/index.md new file mode 100644 index 00000000..cb005096 --- /dev/null +++ b/docs/source/development/index.md @@ -0,0 +1,12 @@ +# Development + +## How to contribute + +```{include} how-to-contribute.md +--- +parser: myst_parser.sphinx_ +--- +``` + +```{include} changes.md +``` diff --git a/docs/source/development/index.rst b/docs/source/development/index.rst deleted file mode 100644 index ed609e11..00000000 --- a/docs/source/development/index.rst +++ /dev/null @@ -1,11 +0,0 @@ -Development -=========== - -How to contribute -***************** - -.. include:: how-to-contribute.md - :parser: myst_parser.sphinx_ - - -.. include:: changes.rst diff --git a/docs/source/faq.md b/docs/source/faq.md new file mode 100644 index 00000000..871ca4a3 --- /dev/null +++ b/docs/source/faq.md @@ -0,0 +1,168 @@ +(faq)= + +# FAQ + +(windows_user)= + +## Tips and Tricks for Windows Users + +**Anaconda Installation Notes for Windows Users** + +Please follow these steps unless you know what you are doing. + +1. Download the [Graphical Installer](https://www.anaconda.com/distribution/#windows) + for Python 3.x. + +1. Start the installer and click yourself through the menu. If you have administrator + privileges on your computer, it is preferable to install Anaconda for all users. + Otherwise, you may run into problems when running python from your powershell. + +1. Make sure to (only) tick the following box: + + - ''Register Anaconda as my default Python 3.x''. Finish installation. + +1. Navigate to the folder containing your Anaconda distribution. This folder contains + multiple subfolders. 
Please add the path to the folder called `condabin` to your + *PATH* environmental variable. This path should end in `Anaconda3/condabin`. You can + add paths to your *PATH* by following these + [instructions](https://www.computerhope.com/issues/ch000549.htm). + +1. Please start Windows Powershell in administrator mode, and execute the following: + + ```bash + $ set-executionpolicy remotesigned + ``` + +1. Now (re-)open Windows Powershell and initialize it for full conda use by running + + ```bash + $ conda init + ``` + +```{warning} If you still run into problems when running conda and python from +powershell, it is advisable to use the built-in Anaconda Prompt instead. +``` + +(git_windows)= + +### Integrating git tab completion in Windows Powershell + +Powershell does not support tab completion for git automatically. However, there is a +nice utility called [posh-git](https://github.com/dahlbyk/posh-git). We advise you to +install this as this makes your life easier. + +(path_windows)= + +### PATH environmental variable in Windows + +In Windows, one has to oftentimes add the programs manually to the *PATH* environmental +variable in the Advanced System Settings. See +[here](https://www.computerhope.com/issues/ch000549.htm) for a detailed explanation of +how to do that. + +(path_mac)= + +### Adding directories to the PATH: MacOS and Linux + +Open the program **Terminal**. You will need to add a line to the file `.bash_profile` +and potentially create the file. This file lives in your home directory, in the Finder +it is hidden from your view by default. + +**Linux users**: For most distributions, everything here applies to the file `.bashrc` +instead of `.bash_profile`. + +I will now provide a step-by-step guide of how to create / adjust this file using the +editor called `code`. If you are familiar with editing text files, just use your editor +of choice. + +1. 
Open a Terminal and type + + ```bash + code ~/.bash_profile + ``` + + If you use an editor other than [VS Code](https://code.visualstudio.com/), replace + `code` by the respective editor. + + If `.bash_profile` already existed, you will see some text at this point. If so, use + the arrow keys to scroll all the way to the bottom of the file. + +1. Add the following line at the end of the file + + ```bash + export PATH="${PATH}:/path/to/program/inside/package" + ``` + + You will need to follow the same steps as before. Example for Stata: + + ```bash + # Stata directory + export PATH="${PATH}:/Applications/Stata/StataMP.app/Contents/MacOS/" + ``` + + In `/Applications/Stata/StataMP.app`, you may need to replace bits and pieces as + appropriate for your installation (e.g. you might not have StataMP but StataSE). + + Similarly for Matlab or the likes. + +1. Press `Return` and then `ctrl+o` (= WriteOut = save) and `Return` once more. + +(cookiecutter_trouble)= + +### When cookiecutter exits with an error + +If cookiecutter fails, you will get a lengthy error message. It is important that you +work through this and try to understand the error (the language used might seem funny, +but it is precise...). + +Then type: + +```bash +$ code ~/.cookiecutter_replay/econ-project-templates-0.5.1.json +``` + +If you are not using VS Code as your editor of choice, adjust the line accordingly. + +This command should open your editor and show you a json file containing your answers to +the previously filled out dialogue. You can fix your faulty settings in this file. If +you have spaces or special characters in your path, you need to adjust your path. 
+ +When done, launch a new shell if necessary and type: + +```bash +$ cookiecutter --replay https://github.com/OpenSourceEconomics/econ-project-templates/archive/v0.5.1.zip +``` + +(stata_failure_check_erase_log_file)= + +### Stata failure: FileNotFoundError + +The following failure: + +``` +FileNotFoundError: No such file or directory: '/Users/xxx/econ/econ project templates/bld/add_variables.log' +``` + +has a simple solution: **Get rid of all spaces in the path to the project.** (i.e., +`econ-project-templates` instead of `econ project templates` in this case). To do so, do +**not** rename your user directory, that will cause havoc. Rather move the project +folder to a different location. + +I have not been able to get Stata working with spaces in the path in batch mode, so this +has nothing to do with Python or pytask. If anybody finds a solution, please let me +know. + +### Stata failure: missing file + +If you see an error like this one: + +``` +-> missing file: '/Users/xxx/econ/econ-project/templates/bld/add_variables.log' +``` + +check that you have a license for the Stata version that is found (the Stata tool just +checks availability top-down, i.e., MP-SE-IC, in case an MP-Version is found and you +just have a license for SE, Stata will silently refuse to start up). + +The solution is to remove all versions of Stata from its executable directory (e.g., +/usr/local/stata) that cost more than your license did. diff --git a/docs/source/faq.rst b/docs/source/faq.rst deleted file mode 100644 index 242b14a1..00000000 --- a/docs/source/faq.rst +++ /dev/null @@ -1,177 +0,0 @@ -.. _faq: - -FAQ -=== - -.. _windows_user: - -Tips and Tricks for Windows Users -********************************* - -**Anaconda Installation Notes for Windows Users** - -Please follow these steps unless you know what you are doing. - -1. Download the `Graphical Installer `_ - for Python 3.x. - -2. Start the installer and click yourself through the menu. 
If you have administrator - privileges on your computer, it is preferable to install Anaconda for all users. - Otherwise, you may run into problems when running python from your powershell. - -3. Make sure to (only) tick the following box: - - - ''Register Anaconda as my default Python 3.x''. Finish installation. - -4. Navigate to the folder containing your Anaconda distribution. This folder contains - multiple subfolders. Please add the path to the folder called `condabin` to your - *PATH* environmental variable. This path should end in `Anaconda3/condabin`. You can - add paths to your *PATH* by following these `instructions - `_. - -5. Please start Windows Powershell in administrator mode, and execute the following: - - .. code-block:: bash - - $ set-executionpolicy remotesigned - -6. Now (re-)open Windows Powershell and initialize it for full conda use by running - - .. code-block:: bash - - $ conda init - -.. warning:: - - If you still run into problems when running conda and python from powershell, it is - advisable to use the built-in Anaconda Prompt instead. - -.. _git_windows: - -Integrating git tab completion in Windows Powershell ----------------------------------------------------- - -Powershell does not support tab completion for git automatically. However, there is a -nice utility called `posh-git `_. We advise you to -install this as this makes your life easier. - -.. _path_windows: - -PATH environmental variable in Windows --------------------------------------- - -In Windows, one has to oftentimes add the programs manually to the *PATH* environmental -variable in the Advanced System Settings. See `here -`_ for a detailed explanation of how -to do that. - -.. _path_mac: - -Adding directories to the PATH: MacOS and Linux ------------------------------------------------ - -Open the program **Terminal**. You will need to add a line to the file ``.bash_profile`` -and potentially create the file. 
This file lives in your home directory, in the Finder -it is hidden from your view by default. - -**Linux users**: For most distributions, everything here applies to the file ``.bashrc`` -instead of ``.bash_profile``. - -I will now provide a step-by-step guide of how to create / adjust this file using the -editor called ``code``. If you are familiar with editing text files, just use your -editor of choice. - -#. Open a Terminal and type - - .. code-block:: bash - - code ~/.bash_profile - - If you use an editor other than `VS Code `_, replace - ``code`` by the respective editor. - - If ``.bash_profile`` already existed, you will see some text at this point. If so, - use the arrow keys to scroll all the way to the bottom of the file. - - -#. Add the following line at the end of the file - - .. code-block:: bash - - export PATH="${PATH}:/path/to/program/inside/package" - - You will need to follow the same steps as before. Example for Stata: - - .. code-block:: bash - - # Stata directory - export PATH="${PATH}:/Applications/Stata/StataMP.app/Contents/MacOS/" - - In ``/Applications/Stata/StataMP.app``, you may need to replace bits and pieces as - appropriate for your installation (e.g. you might not have StataMP but StataSE). - - Similarly for Matlab or the likes. - -#. Press ``Return`` and then ``ctrl+o`` (= WriteOut = save) and ``Return`` once more. - - -.. _cookiecutter_trouble: - -When cookiecutter exits with an error -------------------------------------- - -If cookiecutter fails, you will get a lengthy error message. It is important that you -work through this and try to understand the error (the language used might seem funny, -but it is precise...). - -Then type: - -.. code-block:: bash - - $ code ~/.cookiecutter_replay/econ-project-templates-0.5.1.json - -If you are not using VS Code as your editor of choice, adjust the line accordingly. 
- -This command should open your editor and show you a json file containing your answers to -the previously filled out dialogue. You can fix your faulty settings in this file. If -you have spaces or special characters in your path, you need to adjust your path. - -When done, launch a new shell if necessary and type: - -.. code-block:: bash - - $ cookiecutter --replay https://github.com/OpenSourceEconomics/econ-project-templates/archive/v0.5.1.zip - - -.. _stata_failure_check_erase_log_file: - -Stata failure: FileNotFoundError --------------------------------- - -The following failure:: - - FileNotFoundError: No such file or directory: '/Users/xxx/econ/econ project templates/bld/add_variables.log' - -has a simple solution: **Get rid of all spaces in the path to the project.** (i.e., -``econ-project-templates`` instead of ``econ project templates`` in this case). To do -so, do **not** rename your user directory, that will cause havoc. Rather move the -project folder to a different location. - -I have not been able to get Stata working with spaces in the path in batch mode, so this -has nothing to do with Python or pytask. If anybody finds a solution, please let me -know. - - -Stata failure: missing file ---------------------------- - -If you see an error like this one:: - - -> missing file: '/Users/xxx/econ/econ-project/templates/bld/add_variables.log' - -check that you have a license for the Stata version that is found (the Stata tool just -checks availability top-down, i.e., MP-SE-IC, in case an MP-Version is found and you -just have a license for SE, Stata will silently refuse to start up). - -The solution is to remove all versions of Stata from its executable directory (e.g., -/usr/local/stata) that cost more than your license did. 
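The FAQ entries above come down to two checks that are easy to automate before running cookiecutter or Stata: the project path should contain no spaces or special characters, and every required program should be found on *PATH*. A minimal sketch, assuming Python 3.7 or higher; the helper names `path_problems` and `missing_programs` are hypothetical and not part of the templates:

```python
import shutil
from pathlib import Path


def path_problems(project_path):
    """Return a list of human-readable problems found in ``project_path``."""
    text = str(project_path)
    problems = []
    # Spaces are what triggers the Stata FileNotFoundError described above.
    if " " in text:
        problems.append("contains spaces")
    # Special characters (ä, ü, é, Chinese or Cyrillic characters) are non-ASCII.
    if not text.isascii():
        problems.append("contains non-ASCII characters")
    return problems


def missing_programs(programs):
    """Return the entries of ``programs`` that cannot be found on PATH."""
    return [name for name in programs if shutil.which(name) is None]


if __name__ == "__main__":
    # Illustrative program list; adjust it to what your project needs.
    print("Path problems:", path_problems(Path.cwd()))
    print("Not on PATH:", missing_programs(["git", "conda", "pdflatex"]))
```

Run it from the parent folder of the future project. Two empty lists suggest the path and the program lookup are fine, but they do not guarantee that the programs themselves start up correctly.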
diff --git a/docs/source/getting_started/cookiecutter_dialogue.rst b/docs/source/getting_started/cookiecutter_dialogue.md similarity index 69% rename from docs/source/getting_started/cookiecutter_dialogue.rst rename to docs/source/getting_started/cookiecutter_dialogue.md index bd1770cc..c279f964 100644 --- a/docs/source/getting_started/cookiecutter_dialogue.rst +++ b/docs/source/getting_started/cookiecutter_dialogue.md @@ -3,13 +3,13 @@ Navigate to the parent folder of your future project and type (i.e., copy & paste): - .. code-block:: console + ```console + $ cookiecutter https://github.com/OpenSourceEconomics/econ-project-templates/archive/v0.5.1.zip + ``` - $ cookiecutter https://github.com/OpenSourceEconomics/econ-project-templates/archive/v0.5.1.zip - -2. The dialogue will move you through the installation. **Make sure to keep this page +1. The dialogue will move you through the installation. **Make sure to keep this page side-by-side during the process because if something is invalid, the whole process - will break off** (see :ref:`cookiecutter_trouble` on how to recover from there, but + will break off** (see {ref}`cookiecutter_trouble` on how to recover from there, but no need to push it). *Note that if you don't know how to answer a question, it's usually best to accept the default.* @@ -65,24 +65,24 @@ **add_r_example** -- Whether to create the example project using the r programming language. - .. warning:: - The R example project is currently under construction. Help is appreciated! - Selecting this option only installs R related packages, including pytask-R, to - the environment, and adds R related hooks to .pre-commit-config.yaml. + ```{warning} The R example project is currently under construction. Help is + appreciated! Selecting this option only installs R related packages, including + pytask-R, to the environment, and adds R related hooks to .pre-commit-config.yaml. 
+ ``` **add_julia_example** -- Whether to create the example project using the julia programming language. - .. warning:: - The Julia example project is not implemented yet. Help is appreciated! Selecting - this option only installs pytask-Julia to the environment. + ```{warning} The Julia example project is not implemented yet. Help is appreciated! + Selecting this option only installs pytask-Julia to the environment. + ``` **add_stata_example** -- Whether to create the example project using the stata programming language. - .. warning:: - The Stata example project is not implemented yet. Help is appreciated! Selecting - this option only installs pytask-Stata to the environment. + ```{warning} The Stata example project is not implemented yet. Help is appreciated! + Selecting this option only installs pytask-Stata to the environment. + ``` **conda_environment_name** -- Name of your conda environment. This should not be too long, since you need to type it often. @@ -93,41 +93,39 @@ After successfully answering all the prompts, a folder named according to your project_slug will be created in your current directory. If you run into trouble, - please follow the steps explained :ref:`cookiecutter_trouble` - - -3. **Skip this step if you did not opt for the conda environment.** Type: + please follow the steps explained {ref}`cookiecutter_trouble` - .. code-block:: console +1. **Skip this step if you did not opt for the conda environment.** Type: - $ conda activate + ```console + $ conda activate + ``` This will activate the newly created conda environment. You have to repeat the last step anytime you want to run your project from a new terminal window. -4. Pre-commit hooks have to be installed in order for them to have an effect. This step +1. Pre-commit hooks have to be installed in order for them to have an effect. This step has to be repeated every time you work on your project **on a new machine**. 
To install the pre-commit hooks, navigate to the project's folder in the shell and type: - .. code-block:: console + ```console + $ pre-commit install + ``` - $ pre-commit install - -5. Navigate to the folder in the shell and type the following commands into your command +1. Navigate to the folder in the shell and type the following commands into your command line to see whether the examples are working: - .. code-block:: console - - $ pytask + ```console + $ pytask + ``` - .. - maybe show how it should look if everything works + % maybe show how it should look if everything works All programs used within this project template need to be found on your path, see - above (:ref:`preparing_your_system` and the :ref:`faq`). + above ({ref}`preparing_your_system` and the {ref}`faq`). If all went well, you are now ready to adapt the template to your project. -Depending on what your needs are, move on with the section on :ref:`starting a project -from scratch ` or on :ref:`porting an existing project -`. +Depending on what your needs are, move on with the section on +{ref}`starting a project from scratch ` or on +{ref}`porting an existing project `. diff --git a/docs/source/getting_started/index.md b/docs/source/getting_started/index.md new file mode 100644 index 00000000..4a362dde --- /dev/null +++ b/docs/source/getting_started/index.md @@ -0,0 +1,40 @@ +(getting_started)= + +# Getting Started + +How to get started depends on what your system looks like. Depending on that, you may +want to jump to any of the sections in this part of the documentation: + +- In {ref}`preparing_your_system`, we describe what needs to be installed on your + computer so that you can use the templates. +- Should you have done that already for a different project, you can directly go to + {ref}`cookiecutter_dialogue`, which describes the options you have when moving from + the template to your specific research project. 
+- In case a project has been set up by you or a collaborator and you want to use it on a + different machine as well, you will find the explanations on how to do so in + {ref}`second_machine`. + +Once you are set up in this fashion, you may want to read up on the background of the +{ref}`background`. In case you know those already, have a look in +{ref}`guides_explanations` for guides on starting new projects or porting existing ones. + +(preparing_your_system)= + +## Preparing your system + +```{include} preparing_your_system.md +``` + +(cookiecutter_dialogue)= + +## Customising the template for your needs + +```{include} cookiecutter_dialogue.md +``` + +(second_machine)= + +## How to get started on a second machine + +```{include} second_machine.md +``` diff --git a/docs/source/getting_started/index.rst b/docs/source/getting_started/index.rst deleted file mode 100644 index 97e18308..00000000 --- a/docs/source/getting_started/index.rst +++ /dev/null @@ -1,43 +0,0 @@ -.. _getting_started: - -Getting Started -=============== - -How to get started depends on what your system looks like. Depending on that, you may -want to jump to any of the sections in this part of the documentation: - -- In :ref:`preparing_your_system`, we describe what needs to be installed on your - computer so that you can use the templates. -- Should you have done that already for a different project, you can directly go to - :ref:`cookiecutter_dialogue`, which describes the options you have when moving from - the template to your specific research project. -- In case a project has been set up by you or a collaborator and you want to use it on a - different machine as well, you will find the explanations on how to do so in - :ref:`second_machine`. - -Once you are set up in this fashion, you may want to read up on the background of the -:ref:`background`. In case you know those already, have a look in -:ref:`guides_explanations` for guides on starting new projects or porting existing ones. 
- - -.. _preparing_your_system: - -Preparing your system -********************* - -.. include:: preparing_your_system.rst - - -.. _cookiecutter_dialogue: - -Customising the template for your needs -*************************************** - -.. include:: cookiecutter_dialogue.rst - -.. _second_machine: - -How to get started on a second machine -************************************** - -.. include:: second_machine.rst diff --git a/docs/source/getting_started/preparing_your_system.md b/docs/source/getting_started/preparing_your_system.md new file mode 100644 index 00000000..05c7742c --- /dev/null +++ b/docs/source/getting_started/preparing_your_system.md @@ -0,0 +1,142 @@ +### Program installation + +1. Make sure you have the following programs installed and that these can be found on + your path. This template requires + +- [Miniconda](http://conda.pydata.org/miniconda.html) or Anaconda. Windows users: please + consult {ref}`windows_user` + + ```{note} This template is tested with python 3.7 and higher and conda version 4.7.12 + and higher. Use conda 4.6-4.7.11 at your own risk; conda versions 4.5 and below will + not work under any circumstances. + ``` + +- a modern LaTeX distribution (e.g. [TeXLive](https://tug.org/texlive/), + [MacTex](http://tug.org/mactex), or [MikTex](http://miktex.org)) + +- [Git](https://git-scm.com/downloads), windows users please also consult + {ref}`git_windows` + +- The text editor [VS Code](https://code.visualstudio.com/), unless you know what you + are doing. + +### Validating the installation paths + +2. If you are on Windows, please open the Windows Powershell. On Mac or Linux, open a + terminal. As everything will be started from the Powershell/Terminal, you need to + make sure that all programmes you need in your project (for sure Anaconda Python, + Git, and LaTeX; potentially VS Code, R, Julia, Stata) can be found on your *PATH*. + That is, these need to be accessible from your shell. 
This often requires a bit of + manual work, in particular on Windows. + +- To see which programmes can be found on your path, type (leave out the leading dollar + sign, this is just standard notation for a command line prompt): + + Windows + + ```powershell + $ echo $env:path + ``` + + Mac/Linux + + ```console + $ echo $PATH + ``` + + This gives you a list of directories that are available on your *PATH*. + +- Check that this list contains the path to the programs you want to use in your + project, in particular, Anaconda (this contains your Python distribution), a LaTeX + distribution, the text editor VS Code, Git, and any other program that you need for + your project (R, Julia, Stata). Otherwise add them by looking up their paths on your + computer and follow the steps described here {ref}`path_windows` or {ref}`path_mac`. + +- If you added any directory to *PATH*, you need to close and reopen your shell, so that + this change is implemented. + +- To be on the safe side regarding your paths, you can check directly whether you can + launch the programmes. For Python, type: + + ```console + $ python + >>> exit() + ``` + + This starts python in your shell and exits from it again. The top line should indicate + that you are using a Python distribution provided by Anaconda. Here is an example + output obtained using Windows PowerShell: + + ```text + Python 3.9.9 | packaged by conda-forge | (main, Dec 20 2021, 02:40:17) + [GCC 9.4.0] on linux + Type "help", "copyright", "credits" or "license" for more information. + ``` + + For Git, type: + + ```console + $ git status + ``` + + Unless you are in a location where you expect a Git repository, this should yield the + output: + + ```console + fatal: not a git repository (or any of the parent directories): .git + ``` + + If a Git repository is present, delete it or go to another directory before starting + cookiecutter below. + + To start and exit pdflatex. 
+ + ```console + $ pdflatex + $ X + ``` + + An editor window should open after typing: + + ```console + $ code + ``` + + If required, do the same for R, Julia or Stata — see + {ref}`here ` for the precise commands you may + need. + +### Validating Git + +3. In the Powershell/Terminal, navigate to the parent folder of your future project. + + Now type `pwd`, which prints the absolute path to your present working directory. + **There must not be any spaces or special characters in the path** (for instance ä, + ü, é, Chinese or Cyrillic characters). + + If you have any spaces or special characters on your path, change to a folder that + does not have these special characters (e.g., on Windows, create a directory + `C:\projects`. Do **not** rename your home directory). + + Type `git status` , this should yield the output: + + ```console + fatal: not a git repository (or any of the parent directories): .git + ``` + +### Installing cookiecutter + +4. The template uses [cookiecutter](https://cookiecutter.readthedocs.io/en/latest/) to + enable personalized installations. Before you start, install cookiecutter on your + system. + + ```console + $ pip install cookiecutter + ``` + + All additional dependencies will be installed into a newly created conda environment + upon project creation. + + ```{warning} If you do not opt for the conda environment later on, you need to take + care of these dependencies by yourself. + ``` diff --git a/docs/source/getting_started/preparing_your_system.rst b/docs/source/getting_started/preparing_your_system.rst deleted file mode 100644 index e2753500..00000000 --- a/docs/source/getting_started/preparing_your_system.rst +++ /dev/null @@ -1,151 +0,0 @@ -Program installation --------------------- - -1. Make sure you have the following programs installed and that these can be found on - your path. This template requires - -- `Miniconda `_ or Anaconda. Windows users: - please consult :ref:`windows_user` - - .. 
note:: - - This template is tested with python 3.7 and higher and conda version 4.7.12 - and higher. Use conda 4.6-4.7.11 at your own risk; conda versions 4.5 and - below will not work under any circumstances. - -- a modern LaTeX distribution (e.g. `TeXLive `_, `MacTex - `_, or `MikTex `_) - -- `Git `_, windows users please also consult - :ref:`git_windows` - -- The text editor `VS Code `_, unless you know what - you are doing. - - -Validating the installation paths ---------------------------------- - -2. If you are on Windows, please open the Windows Powershell. On Mac or Linux, open a - terminal. As everything will be started from the Powershell/Terminal, you need to - make sure that all programmes you need in your project (for sure Anaconda Python, - Git, and LaTeX; potentially VS Code, R, Julia, Stata) can be found on your *PATH*. - That is, these need to be accessible from your shell. This often requires a bit of - manual work, in particular on Windows. - -- To see which programmes can be found on your path, type (leave out the leading dollar - sign, this is just standard notation for a command line prompt): - - Windows - - .. code-block:: powershell - - $ echo $env:path - - Mac/Linux - - .. code-block:: console - - $ echo $PATH - - This gives you a list of directories that are available on your *PATH*. - -- Check that this list contains the path to the programs you want to use in your - project, in particular, Anaconda (this contains your Python distribution), a LaTeX - distribution, the text editor VS Code, Git, and any other program that you need for - your project (R, Julia, Stata). Otherwise add them by looking up their paths on your - computer and follow the steps described here :ref:`path_windows` or :ref:`path_mac`. - -- If you added any directory to *PATH*, you need to close and reopen your shell, so - that this change is implemented. - -- To be on the safe side regarding your paths, you can check directly whether you - can launch the programmes. 
For Python, type: - - .. code-block:: console - - $ python - >>> exit() - - This starts python in your shell and exits from it again. The top line should - indicate that you are using a Python distribution provided by Anaconda. Here is an - example output obtained using Windows PowerShell: - - .. code-block:: text - - Python 3.9.9 | packaged by conda-forge | (main, Dec 20 2021, 02:40:17) - [GCC 9.4.0] on linux - Type "help", "copyright", "credits" or "license" for more information. - - For Git, type: - - .. code-block:: console - - $ git status - - Unless you are in a location where you expect a Git repository, this should yield the - output: - - .. code-block:: console - - fatal: not a git repository (or any of the parent directories): .git - - If a Git repository is present, delete it or go to another directory before starting - cookiecutter below. - - To start and exit pdflatex. - - .. code-block:: console - - $ pdflatex - $ X - - An editor window should open after typing: - - .. code-block:: console - - $ code - - If required, do the same for R, Julia or Stata — see :ref:`here - ` for the precise commands you may need. - - -Validating Git --------------- - -3. In the Powershell/Terminal, navigate to the parent folder of your future project. - - Now type ``pwd``, which prints the absolute path to your present working directory. - **There must not be any spaces or special characters in the path** (for instance ä, - ü, é, Chinese or Cyrillic characters). - - If you have any spaces or special characters on your path, change to a folder that - does not have these special characters (e.g., on Windows, create a directory - ``C:\projects``. Do **not** rename your home directory). - - Type ``git status`` , this should yield the output: - - .. code-block:: console - - fatal: not a git repository (or any of the parent directories): .git - - -Installing cookiecutter ------------------------ - - -4. The template uses `cookiecutter `_ - to enable personalized installations. 
Before you start, install cookiecutter on your - system. - - .. code-block:: console - - $ pip install cookiecutter - - All additional dependencies will be installed into a newly created conda environment - upon project creation. - - .. warning:: - - If you do not opt for the conda environment later on, you need to take care of - these dependencies by yourself. diff --git a/docs/source/getting_started/second_machine.rst b/docs/source/getting_started/second_machine.md similarity index 63% rename from docs/source/getting_started/second_machine.rst rename to docs/source/getting_started/second_machine.md index f1b11b3c..0089c311 100644 --- a/docs/source/getting_started/second_machine.rst +++ b/docs/source/getting_started/second_machine.md @@ -5,14 +5,12 @@ need to go through the cookiecutter dialogue etc. On the second machine prepare the system and open a terminal on Max/Linux or the Anaconda prompt on Windows. Then type - -.. code-block:: console - - $ git clone - $ cd - $ conda env create -f environment.yml - $ conda activate - $ pre-commit install - +```console +$ git clone +$ cd +$ conda env create -f environment.yml +$ conda activate +$ pre-commit install +``` Now your're all set! diff --git a/docs/source/guides_explanations/environments.rst b/docs/source/guides_explanations/environments.md similarity index 55% rename from docs/source/guides_explanations/environments.rst rename to docs/source/guides_explanations/environments.md index bffe5814..4d3dd7c4 100644 --- a/docs/source/guides_explanations/environments.rst +++ b/docs/source/guides_explanations/environments.md @@ -3,18 +3,15 @@ time and spending the first {hours, days} updating your code to work with a new of your favourite data analysis library. The same holds for debugging errors that occur only because your coauthor uses a slightly different setup. -The solution is to have isolated environments on a per-project basis. 
`Conda -environments -`_ +The solution is to have isolated environments on a per-project basis. +[Conda environments](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) allow you to do precisely this. This page describes them a little bit and explains their use. The following commands can either be executed in a terminal or the Anaconda prompt (Windows). - -Using the environment ---------------------- +### Using the environment In the installation process of the template a new environment was created if it was not explicitly declined. It took its specification from the environment.yml file in your @@ -22,82 +19,71 @@ project's root folder. To activate it, execute: -.. code:: console - - $ conda activate +```console +$ conda activate +``` Repeat this step every time you want to run your project from a new terminal window. - -Setting up a new environment ----------------------------- +### Setting up a new environment If you want to create a clean environment we recommend specifying it through an environment.yml file. Below we show the contents of an example environment.yml file. A -detailed explanation is given in the `Conda documentation -`_. - -.. code:: yaml - - name: +detailed explanation is given in the +[Conda documentation](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#create-env-file-manually). - channels: - - conda-forge - - defaults +```yaml +name: - dependencies: - - python=3.10 - - numpy - - pandas - - pip - - pip: - - black +channels: + - conda-forge + - defaults +dependencies: + - python=3.10 + - numpy + - pandas + - pip + - pip: + - black +``` If the environment.yml file exists you can create the environment using -.. code:: console - - $ conda env create -f path/to/environment.yml - - -Updating packages ------------------ - -Make sure you activated the environment by ``conda activate ``. Then run +```console +$ conda env create -f path/to/environment.yml +``` -..
code:: console +### Updating packages - $ conda update [package] +Make sure you activated the environment by `conda activate `. Then run +```console +$ conda update [package] +``` -to update a specific ``[package]``, or run - -.. code:: console - - $ conda update --all +to update a specific `[package]`, or run +```console +$ conda update --all +``` to update all packages. - -Installing additional packages ------------------------------- +### Installing additional packages To list installed packages, activate the environment and type -.. code:: console - - $ conda list - +```console +$ conda list +``` If you want to add a package to your environment, add it to the environment.yml file. Once you have edited the environment.yml file, run -.. code:: console - - $ conda env update -f environment.yml - +```console +$ conda env update -f environment.yml +``` **Choosing between conda and pip** @@ -106,18 +92,16 @@ scientific packages. These often are not pure-Python code and pip is built mainl that. For pure-Python packages, sometimes nobody bothered to set up a conda package and we use *pip*. -If you add a package under ``dependencies:`` in the environment.yml file, conda will try -to install its own package. If you add a package under ``pip:``, conda will try to -install the package via pip. +If you add a package under `dependencies:` in the environment.yml file, conda will try +to install its own package. If you add a package under `pip:`, conda will try to install +the package via pip. - -Information about your conda environments ------------------------------------------ +### Information about your conda environments For listing your installed conda environments, type -.. code:: console - - $ conda info --envs +```console +$ conda info --envs +``` The currently activated one will be marked. 
diff --git a/docs/source/guides_explanations/hooks.rst b/docs/source/guides_explanations/hooks.md similarity index 50% rename from docs/source/guides_explanations/hooks.rst rename to docs/source/guides_explanations/hooks.md index 8b7ec02b..2e2f2042 100644 --- a/docs/source/guides_explanations/hooks.rst +++ b/docs/source/guides_explanations/hooks.md @@ -4,26 +4,22 @@ the issues raised by the hooks. Pre-commit hooks are defined in the *.pre-commit-config.yaml*. The example project contains most hooks you will need. Below we present three common hooks. Note that some hooks are programming language agnostic while others work on a specific language. You can find a list of most hooks in the -`pre-commit documentation `_ under Supported hooks. +[pre-commit documentation](https://pre-commit.com/index.html) under Supported hooks. - -- `black `_: Reformats your python code according to a +- [black](https://github.com/psf/black): Reformats your python code according to a universal standard. Blackened code looks the same regardless of the project you're reading. Having black as a hook allows you to focus on the content while writing code and let the formatting be done automatically before each commit. - -- `check-yaml `_: Checks whether all - .yaml and .yml files within your project are valid yaml files. Similarly, having - check-yaml as a hook allows you to focus on the content while writing yaml files. - If you accidentally use a wrong syntax this hook will tell you before you commit. - -- `codespell `_: Fixes common - misspellings in text files. It's designed primarily for checking misspelled words in - source code, but it can be used with other files as well. - +- [check-yaml](https://github.com/pre-commit/pre-commit-hooks): Checks whether all .yaml + and .yml files within your project are valid yaml files. Similarly, having check-yaml + as a hook allows you to focus on the content while writing yaml files. 
If you + accidentally use a wrong syntax this hook will tell you before you commit. +- [codespell](https://github.com/codespell-project/codespell): Fixes common misspellings + in text files. It's designed primarily for checking misspelled words in source code, + but it can be used with other files as well. If you want to skip the pre-commit hooks for a particular commit, you can run: -.. code-block:: console - - $ git commit -am --no-verify +```console +$ git commit -am --no-verify +``` diff --git a/docs/source/guides_explanations/index.md b/docs/source/guides_explanations/index.md new file mode 100644 index 00000000..a5c8f9a5 --- /dev/null +++ b/docs/source/guides_explanations/index.md @@ -0,0 +1,41 @@ +(guides_explanations)= + +# Guides and Explanations + +This section contains some guides that have proven useful before when you are +{ref}`starting a project from scratch ` or +{ref}`porting an existing project `. + +In case you are unsure about the use(fulness) of {ref}`environments` and +{ref}`pre_commit_hooks` you will find concise explanations below. + +(starting_from_scratch)= + +## Starting a new project from scratch + +```{include} starting_from_scratch.md + +``` + +(porting_existing_project)= + +## Porting an existing project + +```{include} porting_existing_project.md + +``` + +(environments)= + +## Conda Environments + +```{include} environments.md + +``` + +(pre_commit_hooks)= + +## Pre-Commit Hooks + +```{include} hooks.md +``` diff --git a/docs/source/guides_explanations/index.rst b/docs/source/guides_explanations/index.rst deleted file mode 100644 index 2b7553f1..00000000 --- a/docs/source/guides_explanations/index.rst +++ /dev/null @@ -1,43 +0,0 @@ -.. _guides_explanations: - -Guides and Explanations -======================= - -This section contains some guides that have proven useful before when you are :ref:`starting a -project from scratch ` or :ref:`porting an existing project -`. 
- -In case you are unsure about the use(fulness) of :ref:`environments` and -:ref:`pre_commit_hooks` you will find concise explanations below. - - -.. _starting_from_scratch: - -Starting a new project from scratch -*********************************** - -.. include:: starting_from_scratch.rst - - -.. _porting_existing_project: - -Porting an existing project -*************************** - -.. include:: porting_existing_project.rst - - -.. _environments: - -Conda Environments -****************** - -.. include:: environments.rst - - -.. _pre_commit_hooks: - -Pre-Commit Hooks -**************** - -.. include:: hooks.rst diff --git a/docs/source/guides_explanations/porting_existing_project.rst b/docs/source/guides_explanations/porting_existing_project.md similarity index 61% rename from docs/source/guides_explanations/porting_existing_project.rst rename to docs/source/guides_explanations/porting_existing_project.md index a1546da1..f4920e2a 100644 --- a/docs/source/guides_explanations/porting_existing_project.rst +++ b/docs/source/guides_explanations/porting_existing_project.md @@ -3,14 +3,14 @@ thinking in computer science / software engineering terms, it will be hard to wr head around all of the things that are going on. So move one bit of code at a time to the template, understand what is happening and why, and move on. -#. Assuming that you use Git, first move all the code in the existing project to a +1. Assuming that you use Git, first move all the code in the existing project to a subdirectory called old_code. Commit. -#. Now set up the templates. -#. Start with the data management code and move your data files to the spot where they +1. Now set up the templates. +1. Start with the data management code and move your data files to the spot where they belong under the new structure. -#. Move (the first steps of) your data management code to the folder under the +1. Move (the first steps of) your data management code to the folder under the templates. 
Modify the `task_xxx` files accordingly or create new ones. -#. Run `pytask`, adjusting the code for the errors you'll likely see. -#. Move on step-by-step like this. -#. Delete the example files and the corresponding sections of the `task_xxx` files / the +1. Run `pytask`, adjusting the code for the errors you'll likely see. +1. Move on step-by-step like this. +1. Delete the example files and the corresponding sections of the `task_xxx` files / the entire files in case you created new ones. diff --git a/docs/source/guides_explanations/starting_from_scratch.rst b/docs/source/guides_explanations/starting_from_scratch.md similarity index 69% rename from docs/source/guides_explanations/starting_from_scratch.rst rename to docs/source/guides_explanations/starting_from_scratch.md index 94bdf93a..70ad4687 100644 --- a/docs/source/guides_explanations/starting_from_scratch.rst +++ b/docs/source/guides_explanations/starting_from_scratch.md @@ -4,12 +4,12 @@ head around all of the things that are going on. So write one bit of code at a t understand what is happening and why, and move on. Assuming you have installed the template for the language(s) of your choice as described -in :ref:`cookiecutter_dialogue`, my recommendation would be as follows. +in {ref}`cookiecutter_dialogue`, my recommendation would be as follows. -#. Leave the examples in place. -#. Now add your own data and code bit by bit. **Append** the `task_xxx` files as +1. Leave the examples in place. +1. Now add your own data and code bit by bit. **Append** the `task_xxx` files as necessary or create new ones. -#. Remove the build directory regularly to make sure you do not rely on outputs from +1. Remove the build directory regularly to make sure you do not rely on outputs from tasks that do not exist any more — this is a frequent source of confusion. -#. Once you got the hang of how things work, remove the examples (both the data files +1. 
Once you got the hang of how things work, remove the examples (both the data files and the code in the `task_xxx` files). Also remove the build directory. diff --git a/docs/source/index.rst b/docs/source/index.md similarity index 62% rename from docs/source/index.rst rename to docs/source/index.md index 40bb91b2..f0f785a1 100644 --- a/docs/source/index.rst +++ b/docs/source/index.md @@ -1,10 +1,8 @@ -Templates for Reproducible Research: Documentation -################################################## +# Templates for Reproducible Research: Documentation -.. _introduction: +(introduction)= -Introduction -============ +## Introduction An empirical or computational research project only becomes a useful building block for science and policy when all steps can be easily repeated and modified by others. @@ -23,43 +21,43 @@ This code base aims to provide two stepping stones to assist you in achieving th structure time and again, which typically happens when incrementally building up a new project. Put differently, instead of starting from scratch, you modify an example for your needs. -2. A pre-configured instance of `pytask `_, which +1. A pre-configured instance of [pytask](https://pytask-dev.readthedocs.io), which facilitates the reproducibility of your research findings from the beginning to the end by letting the computer handle the project's workflow. The first should lure you in quickly. The second should convince you to stick to the tools in the long run – unless you have fought with large research projects before, at this point you may think that all of this is overkill and far more difficult than -necessary. It is not. *[although I am always* `happy to hear -`_ *about easier alternatives]* +necessary. It is not. *\[although I am always* +[happy to hear](https://www.wiwi.uni-bonn.de/gaudecker/) *about easier alternatives\]* The templates support a variety of programming languages already. They can be easily -extended to cover others.
Everything is tied together by `pytask -`_, which is written in `Python -`_. You do not need to know a lot of Python to use these tools, -though. +extended to cover others. Everything is tied together by +[pytask](https://pytask-dev.readthedocs.io), which is written in +[Python](http://www.python.org/). You do not need to know a lot of Python to use these +tools, though. +## Navigating this Documentation -Navigating this Documentation -============================= - +```{eval-rst} .. todo:: Complete +``` If you are a complete novice, you should read carefully through the entire documentation. We -suggest starting with the section :ref:`getting_started`. Once you've finished that we -recommend reading the :ref:`background` section. - - - -.. toctree:: - :maxdepth: 1 - - getting_started/index - background/index - guides_explanations/index - programming_languages/index - faq - development/index - zreferences +suggest starting with the section {ref}`getting_started`. Once you've finished that we +recommend reading the {ref}`background` section. + +```{toctree} +--- +maxdepth: 1 +--- +getting_started/index +background/index +guides_explanations/index +programming_languages/index +faq +development/index +zreferences +``` diff --git a/docs/source/programming_languages/index.md b/docs/source/programming_languages/index.md new file mode 100644 index 00000000..349fd986 --- /dev/null +++ b/docs/source/programming_languages/index.md @@ -0,0 +1,81 @@ +(programming_languages)= + +# Programming Languages + +The templates support a variety of programming languages. + +- Python +- R +- Julia +- Stata + +The base language is Python, which works out-of-the-box. In this section we show you how +to use the other languages and explain some language specific caveats. + +```{note} +- When selecting a language in the cookiecutter {ref}`cookiecutter_dialogue` +we install all the necessary software needed to use that language for you.
+- The usage of pytask with your chosen language should be illustrated in the +example project that was downloaded. At the moment the example project is not +implemented for R, Julia and Stata (but under more or less active development, help +appreciated!). This is why we clarify the basics here. +``` + +```{warning} The use of pytask with Python differs from the other languages. While in +Python you do certain manipulations of your objects inside a task-file, in the other +languages you only specify dependencies and outputs. +``` + +## R + +The following is copied from [pytask-r](https://github.com/pytask-dev/pytask-r). + +To create a task which runs a R script, define a task function with the `@pytask.mark.r` +decorator. The `script` keyword provides an absolute path or path relative to the task +module to the R script. + +```python +import pytask + + +@pytask.mark.r(script="script.r") +@pytask.mark.produces("out.rds") +def task_run_r_script(): + pass +``` + +## Julia + +The following is copied from [pytask-julia](https://github.com/pytask-dev/pytask-julia). + +To create a task which runs a Julia script, define a task function with the +`@pytask.mark.julia` decorator. The `script` keyword provides an absolute path or path +relative to the task module to the Julia script. + +```python +import pytask + + +@pytask.mark.julia(script="script.jl") +@pytask.mark.produces("out.csv") +def task_run_jl_script(): + pass +``` + +## Stata + +The following is copied from [pytask-stata](https://github.com/pytask-dev/pytask-stata). + +To create a task which runs a Stata script, define a task function with the +`@pytask.mark.stata` decorator. The `script` keyword provides an absolute path or path +relative to the task module to the Stata script. 
+ +```python +import pytask + + +@pytask.mark.stata(script="script.do") +@pytask.mark.produces("out.dta") +def task_run_do_script(): + pass +``` diff --git a/docs/source/programming_languages/index.rst b/docs/source/programming_languages/index.rst deleted file mode 100644 index 242d300e..00000000 --- a/docs/source/programming_languages/index.rst +++ /dev/null @@ -1,93 +0,0 @@ -.. _programming_languages: - -Programming Languages -===================== - -The templates support a variety of programming languages. - -- Python -- R -- Julia -- Stata - -The base language is Python, which works out-of-the-box. In this section we show you how -to use the other languages and explain some language specific caveats. - -.. note:: - - When selecting a language in the cookiecutter :ref:`cookiecutter_dialogue` we - install all the necessary software needed to use that language for you. - -.. note:: - - The usage of pytask with your chosen language should be illustrated in the example - project that was downloaded. At the moment the example project is not implemented - for R, Julia and Stata (but under more or less active development, help - appreciated!). This is why we clarify the basics here. - -.. warning:: - - The use of pytask with Python differs from the other languages. While in Python you - do certain manipulations of your objects inside a task-file, in the other languages - you only specify dependencies and outputs. - -R -* - -The following is copied from `pytask-r `_. - -To create a task which runs a R script, define a task function with the `@pytask.mark.r` -decorator. The `script` keyword provides an absolute path or path relative to the task -module to the R script. - -.. code-block:: python - - import pytask - - - @pytask.mark.r(script="script.r") - @pytask.mark.produces("out.rds") - def task_run_r_script(): - pass - - -Julia -***** - -The following is copied from `pytask-julia -`_. 
- -To create a task which runs a Julia script, define a task function with the -`@pytask.mark.julia` decorator. The `script` keyword provides an absolute path or path -relative to the task module to the Julia script. - -.. code-block:: python - - import pytask - - - @pytask.mark.julia(script="script.jl") - @pytask.mark.produces("out.csv") - def task_run_jl_script(): - pass - - -Stata -***** - -The following is copied from `pytask-stata -`_. - -To create a task which runs a Stata script, define a task function with the -`@pytask.mark.stata` decorator. The `script` keyword provides an absolute path or path -relative to the task module to the Stata script. - -.. code-block:: python - - import pytask - - - @pytask.mark.stata(script="script.do") - @pytask.mark.produces("out.dta") - def task_run_do_script(): - pass diff --git a/docs/source/zreferences.md b/docs/source/zreferences.md new file mode 100644 index 00000000..fd77d73a --- /dev/null +++ b/docs/source/zreferences.md @@ -0,0 +1,5 @@ +# References + +```{eval-rst} +.. bibliography:: refs.bib +``` diff --git a/docs/source/zreferences.rst b/docs/source/zreferences.rst deleted file mode 100644 index f2153d73..00000000 --- a/docs/source/zreferences.rst +++ /dev/null @@ -1,4 +0,0 @@ -References -========== - -.. bibliography:: refs.bib