Google Summer of Code

Terts Diepraam edited this page Feb 6, 2023 · 1 revision

What is Google Summer of Code?

As Google explains it:

Google Summer of Code is a global, online program focused on bringing new contributors into open source software development. GSoC Contributors work with an open source organization on a 12+ week programming project under the guidance of mentors.

If you want to know more about how it works, check out the links below.

Useful links:

How to get started

Here are some steps to follow if you want to apply for a GSoC project with uutils.

  1. Check the requirements. You have to meet Google's requirements to apply. Specifically for uutils, it's best if you at least know some Rust and have some familiarity with using the coreutils.
  2. Reach out to us! We are happy to discuss potential projects and help you find a meaningful project for uutils. Tell us what interests you about the project and what experience you have and we can find a suitable project together. You can talk to the uutils maintainers on the Discord server. In particular, you can contact:
    • Sylvestre Ledru (@sylvestre on GitHub and Discord)
    • Terts Diepraam (@tertsdiepraam on GitHub and @terts on Discord)
  3. Get comfortable with uutils. To find a good project you need to understand the codebase. We recommend that you take a look at the code, the issue tracker and maybe try to tackle some good-first-issues. Also take a look at our contributor guidelines.
  4. Find a project and a mentor. We have a list of potential projects you can adapt or use as inspiration. Make sure to discuss your ideas with the maintainers! Some of the project ideas below have suggested mentors you could contact.
  5. Write the application. You can do this with your mentor. The application has to go through Google, so make sure to follow all the advice in Google's Contributor Guide.

Tips

  • Make sure the project is concrete and well-defined.
  • Communication is super important!
  • Try to tackle some simple issues to get familiar with uutils.

Project Ideas

This page contains project ideas for the Google Summer of Code for uutils. Feel free to suggest project ideas of your own.

Guidelines for the project list

Summarizing that page, each project should include:

  • Title
  • Description
  • Expected outputs
  • Skills required/preferred
  • Possible mentors
  • Size (either ~175 or ~350 hours)
  • Difficulty (easy, medium or hard)

Implement stty

The stty utility is currently only partially implemented and should be expanded.

See issues: #3859, #3860, #3861, #3862, #3863.

  • Difficulty: Medium
  • Size: 175 or 350 hours, depending on the scope
  • Mentors: Terts Diepraam
  • Required skills:
    • Rust
    • Basic knowledge about the terminal

Localization

Support localization of formatting, quoting and sorting in various utilities, like date, ls and sort. For this project, we need to figure out how to deal with locale data. The first option is to use the all-Rust icu4x library, which uses a different data format than what distributions usually provide. In that case, a solution could be to write a custom localedef-like command. The second option is to use a wrapper around the C icu library, which comes with the downside of being a C dependency.

This is described in detail in issue #3997.

It was also discussed in #1919 and #3584.

  • Difficulty: Hard
  • Size: TBD
  • Mentors: TBD
  • Required skills:
    • Rust

Better GNU test reports

Improve the integration with the GNU tests. They usually test many cases in a single sh file, so the goal is more detailed feedback on how many of the test cases inside each file are passing.

  • Difficulty: TBD
  • Size: TBD
  • Mentors: TBD
  • Required skills:
    • Rust
    • Bash
    • (preferably) CI/CD

A multicall binary and core library for findutils

findutils currently consists of a few unconnected binaries. It would be nice to have a multicall binary (like coreutils) and a library of shared functions (like uucore).

This also might require thinking about sharing code between coreutils and findutils.
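As a rough illustration of the multicall idea, the sketch below dispatches on the name the binary was invoked as and falls back to the first argument. The names `find_main`, `xargs_main`, `lookup` and `resolve` are hypothetical and do not reflect the actual findutils or coreutils code.

```rust
use std::env;
use std::path::Path;
use std::process;

/// Signature shared by all utility entry points (an assumption for this sketch).
type UtilMain = fn(&[String]) -> i32;

// Hypothetical entry points; real utilities would live in their own crates.
fn find_main(args: &[String]) -> i32 {
    println!("find called with {:?}", args);
    0
}

fn xargs_main(args: &[String]) -> i32 {
    println!("xargs called with {:?}", args);
    0
}

/// Map a utility name to its entry point.
fn lookup(name: &str) -> Option<UtilMain> {
    let util: UtilMain = match name {
        "find" => find_main,
        "xargs" => xargs_main,
        _ => return None,
    };
    Some(util)
}

/// Decide which utility to run and which arguments to pass to it.
fn resolve(argv: &[String]) -> Option<(String, Vec<String>)> {
    // First try the name the binary was invoked as (for example via a symlink).
    let invoked = Path::new(argv.first()?).file_stem()?.to_str()?.to_string();
    if lookup(&invoked).is_some() {
        return Some((invoked, argv[1..].to_vec()));
    }
    // Otherwise treat the first argument as the utility name: `findutils find .`
    let name = argv.get(1)?.clone();
    lookup(&name)?;
    Some((name, argv[2..].to_vec()))
}

fn main() {
    let argv: Vec<String> = env::args().collect();
    match resolve(&argv) {
        Some((name, args)) => {
            let util = lookup(&name).expect("resolve only returns known utilities");
            process::exit(util(&args));
        }
        None => {
            eprintln!("usage: findutils <find|xargs> [args...]");
            process::exit(1);
        }
    }
}
```

Keeping `resolve` separate from `main` makes the dispatch logic testable without spawning a process, which is one possible way to share it between several multicall binaries.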

  • Difficulty: Medium
  • Size: 175 hours
  • Mentors: TBD
  • Required skills:
    • Rust

Refactoring factor

The uutils factor is currently significantly slower than GNU factor and only supports numbers up to 2^64-1. See issue 1559 and issue 1456 for more information.
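To make the 64-bit limit concrete, here is a minimal sketch of factoring over `u128`. It uses plain trial division, which is an assumption chosen for brevity and not the algorithm used by uutils or GNU factor; it only shows that widening the integer type lets inputs above 2^64-1 be represented.

```rust
/// Factor `n` by trial division, returning the prime factors in ascending order.
/// Trial division is only practical when the second-largest prime factor is small.
fn factor(mut n: u128) -> Vec<u128> {
    let mut factors = Vec::new();
    let mut d: u128 = 2;
    // `checked_mul` guards against overflow of d * d near the top of the u128 range.
    while d.checked_mul(d).map_or(false, |sq| sq <= n) {
        while n % d == 0 {
            factors.push(d);
            n /= d;
        }
        // After 2, only odd candidates need to be tried.
        d += if d == 2 { 1 } else { 2 };
    }
    if n > 1 {
        // Whatever remains has no divisor up to its square root, so it is prime.
        factors.push(n);
    }
    factors
}

fn main() {
    // 2^64 + 1 does not fit in a u64.
    let n: u128 = (1u128 << 64) + 1;
    println!("{}: {:?}", n, factor(n));
}
```

A real implementation would need faster methods for large inputs, which is where the optimization and mathematics skills listed below come in.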

  • Difficulty: Hard
  • Size: 175 hours
  • Mentors: TBD
  • Required skills:
    • Rust
    • Optimization techniques
    • (preferably) mathematics

Symbolic/Fuzz Testing and Formal Verification of Tool Grammars

See Using Lightweight Formal Methods to Validate a Key Value Storage Node In Amazon S3.

Most KLEE scaffolding was done for KLEE 2021.

Start with wc and formalize its command-line grammar. Get it working under AFL++ and KLEE. Add several proofs of resource use and correctness, especially proofs about operating system calls and memory/cache usage. Generalize to other tools. Try to unify the seeds for the fuzzer and KLEE so they can help each other find new paths. Use QEMU to test several operating systems and architectures. Automate detection of performance regressions and try to hunt for accidentally quadratic behavior.

Specific to wc: formalize the inner loop over a UTF-8 buffer into a finite state automaton with counters that can generalize into SIMD-width operations like simdjson. Further generalize it into a monoid so that K processors can combine results.
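The monoid idea can be sketched as follows. Each chunk is summarized independently, and two summaries are merged with an associative `combine` whose identity is the empty summary. The `Counts` type and its methods are hypothetical, and the sketch treats only ASCII whitespace as word separators rather than handling UTF-8 or locale rules the way a real wc must.

```rust
/// Summary of one chunk of input. The default value (an empty chunk) is the identity.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
struct Counts {
    bytes: usize,
    lines: usize,
    words: usize,
    /// True if the first byte of the chunk is part of a word.
    starts_in_word: bool,
    /// True if the last byte of the chunk is part of a word.
    ends_in_word: bool,
}

impl Counts {
    /// Run a two-state automaton (in a word / not in a word) over one chunk.
    fn from_chunk(chunk: &[u8]) -> Self {
        let mut c = Counts { bytes: chunk.len(), ..Counts::default() };
        let mut in_word = false;
        for &b in chunk {
            if b == b'\n' {
                c.lines += 1;
            }
            let is_space = b.is_ascii_whitespace();
            if !is_space && !in_word {
                c.words += 1; // transition from whitespace into a word
            }
            in_word = !is_space;
        }
        c.starts_in_word = chunk.first().map_or(false, |b| !b.is_ascii_whitespace());
        c.ends_in_word = in_word;
        c
    }

    /// Associative merge of the summaries of two adjacent chunks.
    fn combine(self, other: Counts) -> Counts {
        if self.bytes == 0 {
            return other;
        }
        if other.bytes == 0 {
            return self;
        }
        // A word split across the boundary was counted once in each chunk.
        let joined = (self.ends_in_word && other.starts_in_word) as usize;
        Counts {
            bytes: self.bytes + other.bytes,
            lines: self.lines + other.lines,
            words: self.words + other.words - joined,
            starts_in_word: self.starts_in_word,
            ends_in_word: other.ends_in_word,
        }
    }
}

fn main() {
    let text = b"hello world\nfoo bar baz\n";
    // Each chunk could be summarized on a different processor before folding.
    let total = text
        .chunks(7)
        .map(Counts::from_chunk)
        .fold(Counts::default(), Counts::combine);
    println!("{} {} {}", total.lines, total.words, total.bytes);
}
```

Because `combine` is associative, the per-chunk summaries can be merged in any grouping, for example as a tree reduction across K workers.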

  • Difficulty: Mixed
  • Size: Mixed
  • Mentors: TBD - informally @chadbrewbaker
  • Required skills:
    • Rust
    • KLEE
    • Fuzzers like AFL++
    • Grammar testing frameworks like LARK
    • /usr/bin/time -v (and similar tools for Windows/macOS).
    • Alloy, TLA+, P
    • System call tracing with strace, uftrace etc.
    • SMT solvers like Z3 and CVC5 for superoptimization and proofs of automata equivalence.
    • SOUPER and CompilerExplorer
    • Basic statistics on quantiles (histograms) for outlier detection. The math is as simple as generalizing from one to k medians, but the formal notation is complex.
    • MPI-IO, just enough to read a file into k parts and combine "wc" outputs to understand multicore scaling.

Official Redox support

We want to support the Redox operating system, but are not actively testing against it. Since the last round of fixes in #2550, many changes have probably been introduced that break Redox support. This project would involve setting up Redox in the CI and fixing any issues that arise and porting features over.

  • Difficulty: Medium
  • Size: 175 hours
  • Mentors: TBD
  • Required skills:
    • Rust

Port GNU's parse_datetime

GNU coreutils has a particularly complex function called parse_datetime, which parses absolute and relative date and time according to the rules specified in the documentation. We currently only support a small subset of the formats that GNU's parse_datetime supports. This function is used for the -d option of touch and as input to date.

At the end of the project, there should be a module (or crate) with a fully compatible datetime parser with an extensive test suite.
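To give a feel for the relative formats involved, here is a minimal sketch that turns a few relative expressions into an offset in seconds. The function `parse_relative` is hypothetical and covers only a tiny subset of what GNU's parse_datetime accepts. It also assumes a day is always 86400 seconds, which ignores time zones and daylight saving time, and it omits months and years because those need calendar arithmetic.

```rust
/// Parse a tiny subset of relative time expressions into an offset in seconds.
/// Returns `None` for anything outside that subset.
fn parse_relative(input: &str) -> Option<i64> {
    let lower = input.trim().to_ascii_lowercase();
    match lower.as_str() {
        "now" | "today" => return Some(0),
        "yesterday" => return Some(-86_400),
        "tomorrow" => return Some(86_400),
        _ => {}
    }

    // Expected shape: <signed integer> <unit> [ago]
    let mut tokens = lower.split_whitespace();
    let amount: i64 = tokens.next()?.parse().ok()?;
    let unit = tokens.next()?;
    // Accept both singular and plural unit names.
    let seconds_per_unit: i64 = match unit.trim_end_matches('s') {
        "second" | "sec" => 1,
        "minute" | "min" => 60,
        "hour" => 3_600,
        "day" => 86_400, // simplification: ignores DST transitions
        "week" => 604_800,
        _ => return None,
    };
    let sign: i64 = match tokens.next() {
        None => 1,
        Some("ago") => -1,
        Some(_) => return None,
    };
    if tokens.next().is_some() {
        return None; // trailing input is not supported in this sketch
    }
    amount.checked_mul(seconds_per_unit)?.checked_mul(sign)
}

fn main() {
    for s in ["3 days ago", "+2 hours", "yesterday", "next month"] {
        println!("{:?} -> {:?}", s, parse_relative(s));
    }
}
```

Inputs the sketch rejects, such as "next month", are exactly the kind of cases a fully compatible parser and its test suite would have to cover.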

See PR 4193 and parse_date in touch.

  • Difficulty: Hard
  • Size: ~350 hours
  • Mentors: TBD
  • Required skills:
    • Rust
    • Parsing