
English (US) | Português (BR)

Querido Diário

Within the Querido Diário ecosystem, this repository is responsible for scraping the websites that publish official gazettes.

Find out more about the technologies and the history of the project on the Querido Diário website.

Summary

  • How to contribute
  • Development Environment
  • How to run
  • Troubleshooting
  • Support
  • Thanks
  • Open Knowledge Brazil
  • License

How to contribute


Thank you for considering contributing to Querido Diário! 🎉

You can find out how in CONTRIBUTING-en-US.md!

Also, check the Querido Diário documentation for further help.

Development Environment

You need to have Python (3.0+) and the Scrapy framework installed.

The commands below set it up on a Linux operating system. They create a virtual Python environment, install the requirements listed in requirements-dev.txt, and set up the code standardization tool pre-commit.

python3 -m venv .venv
source .venv/bin/activate
pip install -r data_collection/requirements-dev.txt
pre-commit install
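
To confirm the environment is ready, you can run a quick sanity check with the tools' own version commands (the versions reported will vary with your installation):

scrapy version
pre-commit --version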

Configuration for other operating systems is available at "how to setup the development environment", along with more details for those who want to contribute to the repository.

How to run

To try running a scraper already integrated into the project, or to test one you are developing, follow these steps:

  1. If you haven't already done so, activate the virtual environment in the /querido-diario directory:
source .venv/bin/activate
  2. Go to the data_collection directory:
cd data_collection
  3. Check the list of available scrapers:
scrapy list
  4. Run a listed scraper:
scrapy crawl <scraper_name> # example: scrapy crawl ba_acajutiba
  5. The official gazettes collected by the scraper will be saved in the data_collection/data folder.
  6. When executing step 4, the scraper collects all official gazettes from that municipality's publishing site since its first digital edition. For shorter runs, use flags in the run command (a combined example follows this list):

  • start_date=YYYY-MM-DD: sets the collection start date.
scrapy crawl <scraper_name> -a start_date=<YYYY-MM-DD>
  • end_date=YYYY-MM-DD: sets the collection end date. If omitted, it defaults to the day the command is run.
scrapy crawl <scraper_name> -a end_date=<YYYY-MM-DD>
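
Both flags can be combined to collect a specific period. A minimal sketch, assuming the ba_acajutiba scraper from step 4 and purely illustrative dates, collecting one month of gazettes:

scrapy crawl ba_acajutiba -a start_date=2022-01-01 -a end_date=2022-01-31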

Troubleshooting

Check out the troubleshooting file to resolve the most common issues with setting up the project environment.

Support

Discord Invite

Join our community server to exchange ideas about projects, ask questions, request help with contributions, and talk about civic innovation in general.

Thanks

This project is maintained by Open Knowledge Brazil and made possible thanks to the technical communities, the Ambassadors of Civic Innovation, volunteers and financial donors, as well as partner universities, supporting companies and funders.

Meet those who support Querido Diário.

Open Knowledge Brazil

Twitter Follow Instagram Follow LinkedIn Follow

Open Knowledge Brazil is a non-profit civil society organization whose mission is to use and develop civic tools, projects, public policy analysis and data journalism to promote free knowledge in the various fields of society.

All work produced by OKBR is openly and freely available.

License

Code licensed under the MIT License.