Convert Documentation from reST to Markdown #117

Merged
merged 11 commits into from Nov 29, 2022
40 changes: 40 additions & 0 deletions CHANGES.md
@@ -0,0 +1,40 @@
# Release Notes

## v0.5 -- January 2022

- Move to plotly (#92, @timmens)
- Cleaning up (@hmgaudecker)

## v0.4 -- January 2021

- Move from Waf to Pytask (#86, @tobiasraabe, @hmgaudecker)
- Move to GitHub Actions for CI (@janosg, WIP)

## v0.3 -- October 2019

- Much improved documentation (@raholler)
- Extensive instructions for use on Windows (@raholler)
- Re-use previously-entered data when cookiecutter fails
(@tobiasraabe, @raholler)
- Fix Stata template by setting `--shell-escape=1` (#63, @raholler)
- Add pyupgrade to pre-commit hooks (#59)
- Thanks to students at LMU for pointing lots of this out!

## v0.2 -- September 2019

- Full continuous integration testing on the Azure platform
- R example completely working in Miniconda environment out of the
box (@raholler)
- Documentation for Stata / R examples (@raholler)
- Much improved instructions for usage on Windows (@raholler)
- Improved structure of docs

## v0.1 -- October 2018

- First version with cookiecutter (thanks, @tobiasraabe
and @julienschat)
- All the stuff that accumulated over the years with the help of many.
I wish my memory was better so I would be able to list the
contributions separately. Thanks, @PKEuS, @philippmuller,
@julienschat, @janosg, @tdrerup and many more who provided feedback!
42 changes: 0 additions & 42 deletions CHANGES.rst

This file was deleted.

36 changes: 36 additions & 0 deletions README.md
@@ -0,0 +1,36 @@
# Templates for Reproducible Research Projects in Economics

![MIT license](https://img.shields.io/github/license/OpenSourceEconomics/econ-project-templates)
[![image](https://zenodo.org/badge/14557543.svg)](https://zenodo.org/badge/latestdoi/14557543)
[![Documentation Status](https://readthedocs.org/projects/econ-project-templates/badge/?version=stable)](https://econ-project-templates.readthedocs.io/en/stable/)
[![image](https://github.com/OpenSourceEconomics/econ-project-templates/actions/workflows/continuous-integration-workflow.yml/badge.svg)](https://github.com/OpenSourceEconomics/econ-project-templates/actions/workflows/continuous-integration-workflow.yml)
[![image](https://codecov.io/gh/OpenSourceEconomics/econ-project-templates/branch/master/graph/badge.svg)](https://codecov.io/gh/OpenSourceEconomics/econ-project-templates)
[![pre-commit.ci status](https://results.pre-commit.ci/badge/github/OpenSourceEconomics/econ-project-templates/master.svg)](https://results.pre-commit.ci/latest/github/OpenSourceEconomics/econ-project-templates/master)
[![image](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

This project aims to provide project templates for economists that make it easy to
produce reproducible research using one or more of the most frequently used programming
languages in economics (i.e., Python, R, Julia, Stata).

Users and curious visitors, please take a look at the
[documentation](https://econ-project-templates.readthedocs.io/en/stable/). This
repository is for developing the templates rather than using them.

## Contributing

We welcome suggestions on anything: improving the documentation, bug reports, feature
requests. Please open an
[issue](https://github.com/OpenSourceEconomics/econ-project-templates/issues) in these
cases.

If you want to work on a specific feature, we are more than happy to get you started!
Please [get in touch briefly](https://www.wiwi.uni-bonn.de/gaudecker/personal_cv.html);
this is a small team, so there is no need for a detailed formal process.

## Contributors

@hmgaudecker @timmens @tobiasraabe

## Former Contributors

@janosg @PKEuS @philippmuller @julienschat @raholler
55 changes: 0 additions & 55 deletions README.rst

This file was deleted.

39 changes: 22 additions & 17 deletions docs/source/background/dag.rst → docs/source/background/dag.md
@@ -1,16 +1,18 @@
The way to specify dependencies between data, code and tasks to perform for a
computer is a directed acyclic graph. A graph is simply a set of nodes (files,
in our case) and edges that connect pairs of nodes (tasks to perform). Directed
means that the order of how we connect a pair of nodes matters, we thus add
arrows to all edges. Acyclic means that there are no directed cycles: When you
traverse a graph in the direction of the arrows, there may not be a way to end
up at the same node again.
The way to specify dependencies between data, code and tasks for a computer to perform
is a directed acyclic graph. A graph is simply a set of nodes (files, in our case) and
edges that connect pairs of nodes (tasks to perform). Directed means that the order in
which we connect a pair of nodes matters; we thus add arrows to all edges. Acyclic means
that there are no directed cycles: when you traverse a graph in the direction of the
arrows, there must not be a way to end up at the same node again.
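
As a minimal sketch of this idea (an illustration, not part of the templates), a DAG can
be written as a mapping from each node to the nodes it depends on. Python's
standard-library `graphlib` (Python 3.9 or later) then yields a valid execution order
and rejects graphs that contain a cycle. All file names here are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical nodes: each key depends on the nodes in its value set.
dag = {
    "clean_data.pkl": {"raw_data.csv"},
    "estimates.pkl": {"clean_data.pkl"},
    "figure.png": {"estimates.pkl"},
}

# A valid order lists every node after all of its dependencies.
order = list(TopologicalSorter(dag).static_order())


def has_cycle(graph):
    """Return True if the graph contains a directed cycle."""
    try:
        TopologicalSorter(graph).prepare()
    except CycleError:
        return True
    return False
```

Calling `has_cycle({"a": {"b"}, "b": {"a"}})` returns `True`, because following the
arrows from `a` leads back to `a`.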

This is the dependency graph of the example project (open the image in a different
window to zoom in):

.. figure:: ../figures/dag.png
:width: 50em
```{figure} ../figures/dag.png
---
width: 50em
---
```

The nodes have different shapes in order to distinguish tasks from files. The rectangles
denote targets or dependencies like figures, data sets or stored models. The hexagons
@@ -19,19 +21,22 @@ dependency structure can be complex.

In a first run, all targets have to be generated, of course. In later runs, a target
only needs to be re-generated if one of its direct **dependencies** changes. E.g. when
we alter ``paper/research_pres_30min.tex`` (mid-right) we need to rebuild only the
presentation pdf file. If we alter ``rrt/data_management/data_info.yaml`` (top-right),
we alter `paper/research_pres_30min.tex` (mid-right), we need to rebuild only the
presentation pdf file. If we alter `rrt/data_management/data_info.yaml` (top-right),
however, we need to rebuild everything. Note that the only important thing at this
point is to understand the general idea.
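
The rebuild rule can be sketched in a few lines of Python. This is an illustration with
a made-up dependency mapping, not how pytask implements change detection: starting from
a changed file, every target that depends on it, directly or indirectly, is collected
for rebuilding. The two source paths are taken from the text; all `bld/...` target names
are hypothetical.

```python
# Hypothetical mapping: target -> its direct dependencies.
dependencies = {
    "bld/data.pkl": ["rrt/data_management/data_info.yaml"],
    "bld/estimates.pkl": ["bld/data.pkl"],
    "bld/figure.png": ["bld/estimates.pkl"],
    "bld/research_pres_30min.pdf": [
        "paper/research_pres_30min.tex",
        "bld/figure.png",
    ],
}


def targets_to_rebuild(changed, dependencies):
    """Collect all targets that depend on `changed`, directly or indirectly."""
    outdated = set()
    frontier = [changed]
    while frontier:
        node = frontier.pop()
        for target, deps in dependencies.items():
            # A target is outdated as soon as one direct dependency changed.
            if node in deps and target not in outdated:
                outdated.add(target)
                frontier.append(target)
    return outdated
```

In this toy graph, changing the presentation source marks only the presentation pdf as
outdated, while changing `data_info.yaml` marks every target.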

Of course this is overkill for a simple example -- we could easily keep the code closer
together than this. But such a strategy does not scale to serious papers with many
different specifications. As a case in point, consider the DAG for an early version of
:cite:`Gaudecker2015`:
{cite}`Gaudecker2015`:

.. figure:: ../figures/pfefficiency.jpg
:width: 50em
```{figure} ../figures/pfefficiency.jpg
---
width: 50em
---
```

Do you want to keep those dependencies in your head? Or would it be useful to
specify them once and for all in order to have more time for thinking about
research? The next section shows you how to do that.
Do you want to keep those dependencies in your head? Or would it be useful to specify
them once and for all in order to have more time for thinking about research? The next
section shows you how to do that.
@@ -1,12 +1,12 @@
The design of the project templates is guided by the following main thoughts:

#. **Separation of logical chunks:** A minimal requirement for a project to scale.
#. **Only execute required tasks, automatically:** Again required for scalability. It
1. **Separation of logical chunks:** A minimal requirement for a project to scale.
1. **Only execute required tasks, automatically:** Again required for scalability. It
means that the machine needs to know what is meant by a "required task".
#. **Re-use of code and data instead of copying and pasting:** Else you will forget the
1. **Re-use of code and data instead of copying and pasting:** Else you will forget the
copy & paste step at some point down the road. At best, this leads to errors; at
worst, to misinterpreting the results.
#. **Be as language-agnostic as possible:** Make it easy to use the best tool for a
1. **Be as language-agnostic as possible:** Make it easy to use the best tool for a
particular task and to mix tools in a project.
#. **Separation of inputs and outputs:** Required to find your way around in a complex
1. **Separation of inputs and outputs:** Required to find your way around in a complex
project.
@@ -1,23 +1,23 @@
The big picture
===============
### The big picture

The following graph shows the contents of the example project root directory after
executing ``pytask``:
executing `pytask`:

.. figure:: ../figures/generated/root_bld_src.png
:width: 45em
```{figure} ../figures/generated/root_bld_src.png
---
width: 45em
---
```

Files and directories in yellow are constructed by pytask; those with a bluish
background are added directly by the researcher. You immediately see the **separation of
inputs and outputs** (one of our guiding principles) at work:

- All source code is in the src directory
- All outputs are constructed in the bld directory

.. note::

The paper and presentation are moved to the root so they can be opened easily
- All source code is in the src directory
- All outputs are constructed in the bld directory

```{note} The paper and presentation are moved to the root so they can be opened easily
```

The contents of both the root/bld and the root/src directories directly follow the steps
of the analysis from the workflow section.
@@ -27,26 +27,27 @@ specified in root/src/analysis and all its output is placed in root/bld/analysis

Some differences:

- Because they are accessed frequently, figures and the like get extra directories in
root/bld

- The directory root/src contains many more subdirectories and files:
- Because they are accessed frequently, figures and the like get extra directories in
root/bld

- utilities.py provides code that may be used by different steps of the project.
Little code snippets for input / output or stuff that is not directly related to
the model would go here.
- The directory root/src contains many more subdirectories and files:

- utilities.py provides code that may be used by different steps of the project.
Little code snippets for input / output or stuff that is not directly related to the
model would go here.

Zooming in
==========
### Zooming in

Let's go one step deeper and consider the root/src directory in more detail:

.. figure:: ../figures/generated/src.png
:width: 40em
```{figure} ../figures/generated/src.png
---
width: 40em
---
```

It is imperative that you do all the task handling inside the `task_xxx.py`-scripts,
using the `pathlib <https://realpython.com/python-pathlib/>`_ library. This ensures that
using the [pathlib](https://realpython.com/python-pathlib/) library. This ensures that
your project can be used on different machines and it minimises the potential for
cross-platform errors.
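
A minimal sketch of what path handling with `pathlib` can look like is given here. It
uses plain functions and deliberately does not import pytask; the function names, the
argument names `depends_on` and `produces`, and the directory layout are assumptions
made for illustration only.

```python
from pathlib import Path


def project_paths(root: Path) -> dict:
    """Build the src and bld paths relative to the project root."""
    # The / operator joins path segments portably on every operating system.
    return {"src": root / "src", "bld": root / "bld"}


def task_copy_data(depends_on: Path, produces: Path) -> None:
    """Hypothetical task: read an input file and write it to the build directory."""
    # Create the output directory if it does not exist yet.
    produces.parent.mkdir(parents=True, exist_ok=True)
    produces.write_text(depends_on.read_text())
```

Because no path is assembled from strings with hard-coded separators, the same code
runs unchanged on Windows, macOS and Linux.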

54 changes: 54 additions & 0 deletions docs/source/background/index.md
@@ -0,0 +1,54 @@
(background)=

# Background

This section contains explanations on why the project templates look the way they do.
This includes a short explanation of the content of the pre-installed example, the basic
design rationale, discussion of the workflow, the directory structure we chose, and a
little background on directed acyclic graphs and pytask. There is not much reference to
code or a particular programming language here; this is relegated to the next section.

(running_example)=

## Running example

```{include} running_example.md
```

(design_rationale)=

## Design Rationale

```{include} design_rationale.md

```

(workflow)=

## How to Organize the Workflow?

```{include} workflow.md
```

(directory_structure)=

## Directory Structure

```{include} directory_structure.md

```

(dag)=

## Directed Acyclic Graphs

```{include} dag.md

```

(pytask)=

## Introduction to pytask

```{include} pytask.md
```