Best-of Generator

🏆  Generates a ranked markdown list of awesome libraries and tools.

Getting Started • Documentation • Support • Report a Bug • Contribution • Changelog

The best-of-generator is a CLI tool that generates a markdown page of ranked open-source projects from a list of projects defined in a yaml file. It integrates with different package managers, such as PyPI, npm, Conda, and Docker Hub, to automatically collect a variety of project metadata and calculate project-quality scores. It also comes with a GitHub Action workflow for a fully automated update process.

🧙‍♂️ Create your own best-of list in just 3 minutes with this guide.

Highlights

  • 📇  Generates a beautiful markdown page from a yaml list.
  • 🔌  Integrates various package managers (npm, pypi, conda ...).
  • 🥇  Calculates a project-quality score based on a variety of metrics.
  • 📈  Identifies trending projects based on collected metrics.
  • 🔄  GitHub Action workflow for automated weekly updates.

Getting Started

🧙‍♂️ If you want to create your own best-of list, we strongly recommend following this guide instead of setting up best-of manually. With the guide, it only takes about 3 minutes to get started. It is already set up to run the best-of generator automatically via our GitHub Action and includes other useful template files. Installing the best-of CLI tool is not required.

  1. Install best-of generator via pip:
    pip install best-of
  2. Create a projects.yaml file based on the documented structure. This file should contain at least one project. For example:
    projects:
       - name: "best-of-ml-python"
         github_id: "ml-tooling/best-of-ml-python"
  3. Run best-of generator via command-line:
    best-of generate -g <GITHUB_API_TOKEN> ./projects.yaml

You can find further information on how to configure the projects.yaml file and additional features in the documentation section below.

Support & Feedback

This project is maintained by Benjamin Räthlein, Lukas Masuch, Jan Kalkan, and Johannes Rieke. Please understand that we won't be able to provide individual support via email. We also believe that help is much more valuable if it's shared publicly so that more people can benefit from it.

Type Channel
🚨  Bug Reports
🎁  Feature Requests
👩‍💻  Usage Questions
📢  Announcements
❓  Other Requests

Documentation

YAML Structure • Projects • Categories • Labels • Configuration • Project Quality Score • Trending Projects • CLI • GitHub Action • Python API

The best-of generator is a CLI tool to generate a markdown page from a list of projects configured in a yaml file. The documentation sections below will provide information on the projects.yaml structure, on its different sections (projects, labels, categories & configuration), on some of the best-of features (e.g. project-quality score & trending projects), and instructions on how to run the markdown generation via the command-line interface or via GitHub Actions.

projects.yaml Structure

The projects.yaml file has the following structure:

  • configuration (optional): Can be used to overwrite the default configuration of the best-of list. More information in the configuration section.
  • categories (required): All used categories should be listed here with at least a descriptive title. More information in the categories section.
  • labels (optional): Used labels can be added here to extend the label with additional aspects (e.g. URL, image, description). More information in the labels section.
  • projects (required): All projects that are supposed to be shown in the generated markdown page should be listed here. More information in the projects section.

The following yaml shows a small example:

# Optional: change the default configuration
configuration:
    markdown_header_file: "config/header.md"
    markdown_footer_file: "config/footer.md"

# Optional: add categories
categories:
  - category: "data-engineering"
    title: "Machine Learning & Data Engineering"
    subtitle: "Best-of lists about machine learning, data engineering, data science, or other topics related to big data."

# Optional: add labels
labels:
  - label: "python"
    image: "https://www.python.org/static/favicon.ico"
    description: "Best-of list with Python projects"

# Required: list of all projects
projects:
  - name: "best-of-ml-python"
    github_id: "ml-tooling/best-of-ml-python"
    labels: ["python"]
    category: "data-engineering"
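
To illustrate how the sections refer to each other, here is a minimal sketch that checks an already-parsed projects.yaml (represented as a plain Python dict) for duplicate project names and for projects that point to a category ID missing from the categories section. The helper name check_references is hypothetical and not part of the best-of package; the generator itself may validate the file differently.

```python
def check_references(data: dict) -> list:
    """Return human-readable problems found in a parsed projects.yaml dict."""
    problems = []
    category_ids = {c.get("category") for c in data.get("categories") or []}
    seen_names = set()

    projects = data.get("projects") or []
    if not projects:
        problems.append("no projects defined")

    for project in projects:
        name = project.get("name")
        if not name:
            problems.append("project without a name")
            continue
        # Project names are required to be unique on the list.
        if name in seen_names:
            problems.append(f"duplicate project name: {name}")
        seen_names.add(name)
        # A missing category is fine (the project falls back to Others),
        # but a category ID that is not listed is likely a typo.
        category = project.get("category")
        if category is not None and category not in category_ids:
            problems.append(f"{name}: unknown category '{category}'")
    return problems


example = {
    "categories": [
        {"category": "data-engineering", "title": "Machine Learning & Data Engineering"}
    ],
    "labels": [{"label": "python"}],
    "projects": [
        {
            "name": "best-of-ml-python",
            "github_id": "ml-tooling/best-of-ml-python",
            "labels": ["python"],
            "category": "data-engineering",
        }
    ],
}
print(check_references(example))
```

The example dict mirrors the yaml above, so the check reports no problems for it.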

Projects

A project is the main component of a best-of list. In most cases, a project is hosted on GitHub and released on different package managers. Such a project should be added with the github_id and the IDs of all the package managers it is released to. However, it is also possible to add projects which are not hosted on GitHub or released on a package manager, as shown in the example below.

Project Examples

projects:
  # Projects with different package managers:
  - name: "Tensorflow"
    github_id: "tensorflow/tensorflow"
    pypi_id: "tensorflow"
    conda_id: "conda-forge/tensorflow"
    dockerhub_id: "tensorflow/tensorflow"
  - name: "Best-of Generator"
    pypi_id: "best-of"
    github_id: "best-of-lists/best-of-generator"
  # Link to another project collection:
  - name: "Best-of Overview"
    homepage: "https://best-of.org"
    resource: True
  # Project that is not on GitHub:
  - name: "Quart"
    pypi_id: "quart"
    homepage: "https://gitlab.com/pgjones/quart"
    description: "Quart is a Python ASGI web microframework with the same API as Flask."
    license: "MIT"
    star_count: 772
    show: True

The example above will be rendered as shown below:

Projects Example

Every project can also be expanded to show additional project information (by clicking on the project), for example:

Project Body Example

Project Properties

| Property | Description |
| --- | --- |
| name | Name of the project. This name must be unique within the best-of list. |

Optional Properties:

| Property | Description |
| --- | --- |
| github_id | GitHub ID of the project based on user or organization and the repository name, e.g. best-of-lists/best-of-generator. If the project is hosted on GitLab, please use the gitlab_id property. |
| category | Category that this project is most related to. You can find all available category IDs in the projects.yaml file. The project will be sorted into the Others category if no category is provided. |
| labels | List of labels that this project is related to. You can find all available label IDs in the projects.yaml file. |
| license | License of the project. If set, license information from GitHub or package managers will be overwritten. Can be a custom URL pointing to more information in case it is not a standard license. `allowed_licenses` must be set to "all" or contain the URL in order to show the project. |
| description | Short description of the project. If set, the description from GitHub or package managers will be overwritten. |
| homepage | Homepage URL of the project. Only use this property if the project homepage is different from the GitHub URL. |
| docs_url | Documentation URL of the project. Only use this property if the project documentation site is different from the GitHub URL. |
| resource | If True, the project will be marked as a resource. Resources are not ranked and will always be shown on top of the category. You can use this to link to another best-of list section or website that contains additional projects. |
| group | If True, the project will be used as the top project for grouping a set of related projects. group_id also needs to be set to the shared group ID. |
| group_id | Group ID that can be used to group this project with other projects. For every group, there needs to be one project with group set to True. |
| show | If True, the project will always be shown, even when it would otherwise be hidden (e.g. dead project, risky license, too few stars...). Only use this property if you are sure that this project needs to be shown. |
| ignore | If True, the project will be ignored. This also means that it will not be included in the hidden projects section. However, the project metadata will still be collected. |

Supported Integrations:

| Property | Description |
| --- | --- |
| pypi_id | Project ID on the Python Package Index (PyPI). |
| conda_id | Project ID on the conda package manager. If the main package is provided on a different channel, prefix the ID with the given channel: e.g. conda-forge/tensorflow |
| npm_id | Project ID on the Node package manager (npm). |
| dockerhub_id | Project ID on the Docker Hub container registry. |
| maven_id | Artifact ID on Maven central, e.g. org.apache.flink:flink-core. |
| github_id | GitHub ID of the project based on user or organization and the repository name, e.g. best-of-lists/best-of-generator. |
| gitlab_id | GitLab ID of the project based on user or organization and the repository name, e.g. best-of-lists/best-of-generator. |

While you can theoretically overwrite all project metadata, we suggest setting only the properties that the best-of generator is not able to find on GitHub or the configured package managers. There are also other undocumented properties, but for most projects those properties should not be overwritten.

Additional undocumented project metadata (click to expand...)
  • created_at
  • update_at
  • github_url
  • github_release_downloads
  • github_dependent_project_count
  • last_commit_pushed_at
  • star_count
  • commit_count
  • dependent_project_count
  • contributor_count
  • fork_count
  • monthly_downloads
  • open_issue_count
  • closed_issue_count
  • release_count
  • latest_stable_release_published_at
  • latest_stable_release_number
  • trending
  • helm_id
  • brew_id
  • apt_id
  • yum_id
  • snap_id
  • maven_id
  • dnf_id
  • yay_id
  • <PACKAGE_MANAGER>_url
  • <PACKAGE_MANAGER>_latest_release_published_at
  • <PACKAGE_MANAGER>_dependent_project_count

Categories

A category adds structure to the best-of list by grouping related projects together. Every project belongs to exactly one category. If no category is provided with the project metadata, the project will be categorized into Others.

Category Example

categories:
  - category: "data-engineering"
    title: "Machine Learning & Data Engineering"
    subtitle: "Best-of lists about machine learning, data engineering, data science, or other topics related to big data."

projects:
  - name: "best-of-ml-python"
    github_id: "ml-tooling/best-of-ml-python"
    category: "data-engineering"

The example above will be rendered as shown below:

Category Example

Category Properties

| Property | Description |
| --- | --- |
| category | ID of the category. This ID should also be used for adding a project to this category. |
| title | Category name used as the header of the category section. |

Optional Properties:

| Property | Description |
| --- | --- |
| subtitle | Short description about the category shown under the title. |
| ignore | If True, the category and all its projects will be ignored. |
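
The category rules above (exactly one category per project, a fallback to Others, and ignored categories) can be sketched roughly as follows. The function name and the fallback ID "others" are assumptions made for illustration; they are not taken from the best-of source code.

```python
from collections import defaultdict

# Assumption: the ID used for the "Others" fallback category.
FALLBACK_CATEGORY = "others"


def group_by_category(projects: list, categories: list) -> dict:
    """Map each category ID to the names of the projects sorted into it."""
    ignored = {c["category"] for c in categories if c.get("ignore")}
    grouped = defaultdict(list)
    for project in projects:
        # A project without a category ends up in the fallback category.
        category = project.get("category") or FALLBACK_CATEGORY
        # Projects of an ignored category are dropped together with it.
        if category in ignored:
            continue
        grouped[category].append(project["name"])
    return dict(grouped)


categories = [
    {"category": "data-engineering", "title": "Machine Learning & Data Engineering"},
    {"category": "archive", "title": "Archive", "ignore": True},
]
projects = [
    {"name": "best-of-ml-python", "category": "data-engineering"},
    {"name": "uncategorized-project"},
    {"name": "old-project", "category": "archive"},
]
print(group_by_category(projects, categories))
```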

Labels

A label highlights similarities or special features shared between projects. In contrast to categories, a project can have any number of labels. Labels are shown as badges attached to the project description. A label can have only an image (favicons are recommended), only a name, or both. We recommend using image labels (or only very short names), since labels shorten the visible description text of a project.

Label Example

labels:
  - label: "python"
    image: "https://www.python.org/static/favicon.ico"
    description: "Best-of list with Python projects"
  - label: "libraries"
    name: "libraries"

projects:
  - name: "best-of-ml-python"
    github_id: "ml-tooling/best-of-ml-python"
    labels: ["libraries", "python"]
    category: "data-engineering"

The example above will be rendered as shown below:

Label Example

Label Properties

| Property | Description |
| --- | --- |
| label | ID of the label. This ID should also be used for adding the label to a project. |

Optional Properties:

| Property | Description |
| --- | --- |
| image | URL to an image. If a valid URL is provided, the image will be shown wherever the label is used. |
| name | Name of the label. If a name is provided, the name will be shown wherever the label is used. |
| description | Short description of the label. If the show_labels_in_legend configuration is True and an image is set, this description will also be shown in the legend (explanations). |
| ignore | If True, the label will not be shown anywhere. |
| url | If url is set, the label will be rendered as a link wherever it is used. |

Configuration

Many aspects of the best-of list can be configured. Since most default values are selected to support the widest range of different lists, changing the default configuration is not required for most cases.

Configuration Example

configuration:
  min_stars: 0
  min_projectrank: 0
  allowed_licenses: ["all"]
  markdown_header_file: "config/header.md"
  markdown_footer_file: "config/footer.md"

The configuration example above changes the default configuration to show all projects regardless of star count (via min_stars), projectrank (via min_projectrank), or license (via allowed_licenses). It also configures header (via markdown_header_file) and footer (via markdown_footer_file) markdown files that will be attached to the generated content.

Configuration Options

| Config | Description | Default |
| --- | --- | --- |
| output_file | The markdown output file. | ./README.md |
| markdown_header_file | Path to a markdown file that will be attached above the generated content. | |
| markdown_footer_file | Path to a markdown file that will be attached below the generated content. | |
| output_generator | Select the markdown generator to use for generating the output markdown page. Currently, only markdown-list is supported. | markdown-list |
| project_inactive_months | Number of months without activity until a project is marked as inactive. | 6 |
| project_dead_months | Number of months without activity until a project is marked as dead. | 12 |
| project_new_months | Number of months since creation to mark a project as newcomer. | 6 |
| min_projectrank | Project will be hidden if it has a smaller projectrank (quality score). | 10 |
| min_stars | Project will be hidden if it has fewer stars on GitHub. | 100 |
| require_license | If True, all projects without a detected license will be hidden. | True |
| require_repo | If True, all projects without a source repository (configured via github_id or gitlab_id) will be hidden. | False |
| min_description_length | The minimum length of the project description. If the description is shorter, the project will not be shown. | 10 |
| max_description_length | The maximum length of the project description. | 55 |
| ascii_description | If True, all non-ASCII characters in the project description will be removed. Useful for filtering out distracting emoji, but harmful for non-English descriptions. (Note: GitHub emoji commands (e.g. :smile:) are always removed.) | True |
| projects_history_folder | The folder used for storing history files (csv files with project metadata). If null, no history files will be created. | ./history |
| generate_install_hints | If False, the install hint code block for the package managers will not be shown. | True |
| generate_toc | If True, generate a table of contents with all categories. | True |
| category_heading | How category headings are generated. If simple, headings will be ## Category, and IDs are set by GitHub. If robust, headings will be <h2 id='category-id'>Category</h2>. (TOC relies on these IDs.) If all of your categories' names are ASCII, use simple. | simple |
| generate_legend | If True, generate a legend containing explanations for the used emojis. | True |
| sort_by | The project property used to sort the projects within a category. | projectrank |
| max_trending_projects | The number of trending projects to show for trending up as well as down. | 5 |
| hide_empty_categories | If True, empty categories will not be shown. | False |
| hide_project_license | If True, the project license badge will not be shown. | False |
| hide_license_risk | If True, the risk indicator for uncommon or risky licenses will not be shown. | False |
| show_labels_in_legend | If True, image labels will be listed in the legend (explanation) if they also have a description. | True |
| allowed_licenses | List of allowed licenses (spdx format). A project with a different license will be hidden. Use ["all"] to allow all licenses. | selection of common open-source licenses |
| extension_script | Path to a python script which is loaded before project collection or markdown generation to allow extensibility. | |
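
To make the interplay of the visibility options more concrete, the following sketch shows one way min_stars, min_projectrank, require_license, allowed_licenses, and the project-level show property could combine. It is an illustration under assumptions, not the generator's actual logic: the function name hidden_reasons is hypothetical, and because the default license selection is not listed here, the sketch skips the license allow-list check unless allowed_licenses is passed explicitly.

```python
from typing import Optional

# Defaults taken from the options table above.
DEFAULTS = {"min_stars": 100, "min_projectrank": 10, "require_license": True}


def hidden_reasons(project: dict, config: Optional[dict] = None) -> list:
    """Return the reasons why a project would be hidden (empty list = shown)."""
    cfg = {**DEFAULTS, **(config or {})}
    # show: True forces the project to be displayed.
    if project.get("show"):
        return []
    reasons = []
    if (project.get("star_count") or 0) < cfg["min_stars"]:
        reasons.append("too few stars")
    if (project.get("projectrank") or 0) < cfg["min_projectrank"]:
        reasons.append("projectrank too low")
    license_id = project.get("license")
    if cfg["require_license"] and not license_id:
        reasons.append("no license detected")
    # Assumption: without an explicit allowed_licenses list, nothing is filtered.
    allowed = cfg.get("allowed_licenses")
    if license_id and allowed and "all" not in allowed and license_id not in allowed:
        reasons.append("license not allowed")
    return reasons


project = {"name": "small-lib", "star_count": 50, "projectrank": 12, "license": "MIT"}
print(hidden_reasons(project))                    # hidden with the defaults
print(hidden_reasons(project, {"min_stars": 0}))  # shown once min_stars is lowered
```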

Project Quality Score

All projects in a best-of list are ranked and sorted by a project-quality score (also called projectrank). The score is calculated from various metrics automatically collected from GitHub and different package managers. It is simply a sum of points that a project collects for various aspects and metrics, so it is only meaningful when compared with the scores of other projects.

This calculation is based on experience only. There is no scientific proof that it really reflects the quality of a project.

We currently use the following aspects to calculate the score:

  • Has homepage link & description: + 1
  • Has an existing GitHub repository: + 1
  • Has a license: + 1
  • Has a commonly used license (e.g. MIT): + 1
  • Has multiple releases: + 1
  • Has stable releases based on semantic version: + 1
  • Has a release that is less than 6 months old: + 1
  • Repo was updated in the last 3 months: + 1
  • Is older than 6 months: + 1
  • Metrics from GitHub & package managers:
    • Number of stars: + log(COUNT / 2)
    • Number of contributors: + log(COUNT / 2) - 1
    • Number of commits: + log(COUNT / 2) - 1
    • Number of forks: + log(COUNT / 2)
    • Number of monthly downloads: + log(COUNT / 2) - 1
    • Number of dependent projects: + log(COUNT / 1.5)
    • Number of watchers: + log(COUNT / 2) - 1
    • Number of closed issues: + log(COUNT / 2) - 1
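
As a rough sketch, the point list above could be summed up as shown below. Several details are assumptions that the list does not specify: the natural logarithm is used for log, metrics with a count of zero are skipped, negative contributions (for example log(COUNT / 2) - 1 for very small counts) are clamped to zero, and all dictionary keys are hypothetical names chosen for this example. The generator's real implementation may differ.

```python
import math

# Aspects worth a flat +1 each; the flag names are hypothetical.
BOOLEAN_ASPECTS = [
    "has_homepage_and_description",
    "has_github_repo",
    "has_license",
    "has_common_license",
    "has_multiple_releases",
    "has_semver_stable_release",
    "has_recent_release",    # release less than 6 months old
    "recently_updated",      # repo updated in the last 3 months
    "older_than_6_months",
]

# metric -> (divisor, offset), mirroring "+ log(COUNT / divisor) - offset".
METRIC_ASPECTS = {
    "star_count": (2, 0),
    "contributor_count": (2, 1),
    "commit_count": (2, 1),
    "fork_count": (2, 0),
    "monthly_downloads": (2, 1),
    "dependent_project_count": (1.5, 0),
    "watcher_count": (2, 1),
    "closed_issue_count": (2, 1),
}


def projectrank(project: dict) -> float:
    """Sum the points a project collects for the aspects listed above."""
    score = 0.0
    for flag in BOOLEAN_ASPECTS:
        if project.get(flag):
            score += 1
    for metric, (divisor, offset) in METRIC_ASPECTS.items():
        count = project.get(metric) or 0
        if count > 0:
            # Assumption: natural log, and never subtract points.
            score += max(0.0, math.log(count / divisor) - offset)
    return score


print(projectrank({"has_license": True, "star_count": 1000, "fork_count": 200}))
```

Because the metric contributions are logarithmic, a project needs an order of magnitude more stars or downloads to gain a couple of points, which keeps very popular projects from dominating the ranking on a single metric.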

Trending Projects

The best-of generator automatically identifies trending projects by comparing the project-quality scores of the current generation with those in the latest history file. If the history is activated (projects_history_folder is not set to null), every update automatically creates a <YYYY-MM-dd>_changes.md file in the configured history folder and a latest-changes.md file in the folder of the generated markdown page. These files contain a list of projects that are trending up (higher quality score since the last update) and down (lower quality score since the last update), as well as a list of all projects added since the last update, as shown in the following example:

Trending project example

The GitHub Action workflow uses these markdown files to automatically create a release for every update. This preserves a useful changelog across many updates and lets readers receive email updates whenever the list changes (by watching for release events).
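
The comparison described in this section can be sketched as follows: given the scores from the latest history file and the scores from the current run, collect the projects whose score went up, those whose score went down, and those that are new. The function name find_trending and the plain name-to-score dicts are assumptions for illustration; the actual history files are csv files with more metadata, and the generator may rank trends differently.

```python
def find_trending(previous: dict, current: dict, limit: int = 5):
    """Compare two name -> projectrank mappings.

    Returns (trending_up, trending_down, added); limit corresponds to the
    max_trending_projects option and applies to both up and down.
    """
    deltas = {
        name: current[name] - previous[name]
        for name in current
        if name in previous
    }
    # Largest gains first.
    up = sorted(
        (n for n, d in deltas.items() if d > 0),
        key=lambda n: deltas[n],
        reverse=True,
    )[:limit]
    # Largest losses first.
    down = sorted(
        (n for n, d in deltas.items() if d < 0),
        key=lambda n: deltas[n],
    )[:limit]
    added = sorted(n for n in current if n not in previous)
    return up, down, added


previous = {"alpha": 10, "beta": 20, "gamma": 5}
current = {"alpha": 12, "beta": 18, "gamma": 5, "delta": 7}
print(find_trending(previous, current))
```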

Generation via CLI

To use the CLI, you need to have the best-of generator installed via pip: pip install best-of

best-of generate [OPTIONS] PATH

Generates a best-of markdown page from a yaml file.

Arguments:

  • PATH: Path to the yaml file containing the best-of metadata (e.g. ./projects.yaml).

Options:
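
The only option that appears in this document is -g, which Getting Started uses to pass a GitHub API token. If you want to trigger the generation from a Python script, a minimal sketch could wrap that same command with subprocess. The helper names below are hypothetical, and the sketch assumes the best-of executable is installed and on your PATH.

```python
import subprocess


def build_generate_command(yaml_path: str, github_token: str) -> list:
    """Build the argument list for: best-of generate -g <TOKEN> <PATH>."""
    return ["best-of", "generate", "-g", github_token, yaml_path]


def run_generate(yaml_path: str, github_token: str) -> None:
    """Run the generator and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_generate_command(yaml_path, github_token), check=True)
```

Passing the arguments as a list avoids shell quoting issues; read the token from an environment variable or a secret store rather than hard-coding it.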

Generation via GitHub Action

🧙‍♂️ If you want to create your own best-of list, we strongly recommend following this guide. With the guide, it only takes about 3 minutes to get started. It already includes this GitHub Action and some other useful template files. Further manual steps for setting up the GitHub Action are not required.

The best-of-update-action makes it very easy to set up automated scheduled updates for your best-of markdown page. Please refer to the best-of-update-action documentation for more detailed information about the GitHub Action and the workflow.

Generation via Python API

Usage of the Python API is not well documented yet and currently not recommended.

The best-of generator can also be used and integrated via its Python API. The full Python API documentation can be found here.

Updating Best-of Generator

Known Issues

The generated README file is not displayed completely (click to expand...)

GitHub only renders the first 512 KB of the main README.md file and cuts off the rendered version once that much raw markdown content has been processed. The rendering is only cut off when viewing the readme on the main repo page. If you open the README.md file directly, it renders in its entirety. To mitigate this issue, we optimized the markdown generation to require the minimum number of characters. However, if you have a very large list of projects (more than 800), you might still reach the 512 KB limit (check the file size of the generated README.md file). In this case, we suggest extracting some of the categories or projects into smaller best-of lists.
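
A quick way to check the file size is a few lines of Python, for example as a step after generation. This is only a sketch: the function name is hypothetical, and it assumes the limit means 512 * 1024 bytes.

```python
import os

# Assumption: the 512 KB limit is interpreted as 512 * 1024 bytes.
README_RENDER_LIMIT_BYTES = 512 * 1024


def exceeds_render_limit(path: str, limit: int = README_RENDER_LIMIT_BYTES) -> bool:
    """Return True if the file is larger than the assumed rendering limit."""
    return os.path.getsize(path) > limit
```

For example, exceeds_render_limit("./README.md") could be used to print a warning or fail a scheduled update when the generated page grows too large.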

Contribution

Development

Requirements: Docker and Act must be installed on your machine to execute the containerized build process.

To simplify the process of building this project from scratch, we provide build-scripts - based on universal-build - that run all necessary steps (build, check, test, and release) within a containerized environment. To build and test your changes, execute the following command in the project root folder:

act -b -j build

Refer to our contribution guides for more detailed information on our build scripts and development process.


Licensed MIT. Created and maintained with ❤️  by developers from Berlin.