BIG - Project template (your repo name here)

How to run the project

Change rootProject.name in settings.gradle to match the project name
The project must build with Gradle (i.e., ./gradlew produces a successful build)
Once the build completes, build/libs/ contains the generated .jar files. To execute Spark code:
- If no external library is needed, spark-submit the jar (i.e., project-name.jar)
- If external libraries are needed, spark-submit the fat jar (i.e., project-name-all.jar)

Project structure

datasets/   -- where datasets are stored (heavy datasets cannot be committed)
outputs/    -- where generated datasets are stored (should not be committed)
results/    -- where experiment/thesis results are stored (must be committed)
src/        -- source code

Working on this project

Import this project as a Gradle project (tested with IntelliJ IDEA).

Guidelines

Code

Add as many useful comments as possible
Delete all useless code / resources
Test early, test often, test everything you can
Write a proper README.md (i.e., override this one) that explains:
- the project structure
- the algorithmic parameters
- how to run the project
Check the output of ./gradlew to look for warnings (especially in code style)

Dataset conventions

All datasets must be named as follows: ProjectName-par1_val1-...-parN_valN.csv
The only exception is for Hive tables: ProjectName__par1_val1__...__parN_valN.csv
- All Spark applications must read/write from/to .csv files as well as Hive tables
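The two naming patterns above can be generated from a parameter mapping. The following is a minimal sketch, not part of the template: the function name dataset_name and its arguments are hypothetical, and it assumes parameter names and values contain no separator characters.

```python
def dataset_name(project, params, hive=False):
    """Build a dataset name following the naming conventions above.

    project: project name (hypothetical example: "ProjectName")
    params:  mapping of parameter name -> value, in the desired order
    hive:    use the double-underscore separator of the Hive exception
    """
    # "-" separates parts for regular datasets, "__" for Hive tables
    sep = "__" if hive else "-"
    # Each parameter is rendered as name_value
    parts = [project] + [f"{name}_{value}" for name, value in params.items()]
    return sep.join(parts) + ".csv"


print(dataset_name("ProjectName", {"par1": "val1", "par2": "val2"}))
# ProjectName-par1_val1-par2_val2.csv
print(dataset_name("ProjectName", {"par1": "val1", "par2": "val2"}, hive=True))
# ProjectName__par1_val1__par2_val2.csv
```

Building names in one place keeps the parameter order consistent between the code that writes a dataset and the code that reads it back.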
Schema for trajectory databases: (userid, trajectoryid, latitude, longitude, timestamp), where timestamp is a Unix timestamp (i.e., seconds since 1970-01-01 00:00:00 UTC)
- In src/main/python/sample.py you can find an example of how to transform a non-compliant dataset schema
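As an illustration of such a transformation, here is a minimal sketch using only the Python standard library. It is not the content of sample.py: the input column names (user, trip, lat, lon, time) are hypothetical, and it assumes the input times are ISO 8601 strings that are in UTC when they carry no offset.

```python
import csv
import io
from datetime import datetime, timezone

# Target schema required by the convention above
TARGET_FIELDS = ["userid", "trajectoryid", "latitude", "longitude", "timestamp"]


def to_compliant_row(row):
    """Map one row with hypothetical columns (user, trip, lat, lon, time)."""
    moment = datetime.fromisoformat(row["time"])
    if moment.tzinfo is None:
        # Assumption: times without an offset are expressed in UTC
        moment = moment.replace(tzinfo=timezone.utc)
    return {
        "userid": row["user"],
        "trajectoryid": row["trip"],
        "latitude": float(row["lat"]),
        "longitude": float(row["lon"]),
        "timestamp": int(moment.timestamp()),  # seconds since the Unix epoch
    }


def convert(source, target):
    """Read a non-compliant CSV from source and write a compliant one to target."""
    reader = csv.DictReader(source)
    writer = csv.DictWriter(target, fieldnames=TARGET_FIELDS)
    writer.writeheader()
    for row in reader:
        writer.writerow(to_compliant_row(row))


# In-memory example; real code would open files under datasets/ instead
raw = io.StringIO("user,trip,lat,lon,time\nu1,t1,44.49,11.34,2020-01-01T00:00:00\n")
out = io.StringIO()
convert(raw, out)
print(out.getvalue())
```

The second line of the printed output is u1,t1,44.49,11.34,1577836800, since 2020-01-01 00:00:00 UTC is 1577836800 seconds after the epoch.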

Dependency management

All Java/Scala dependencies must be managed through Gradle (build.gradle). See here.

Software projects rarely work in isolation. In most cases, a project relies on reusable functionality in the form of libraries or is broken up into individual components to compose a modularized system. Dependency management is a technique for declaring, resolving and using dependencies required by the project in an automated fashion. Gradle has built-in support for dependency management and lives up to the task of fulfilling typical scenarios encountered in modern software projects.

All Python dependencies must be managed through virtual environments. See here.

The venv module provides support for creating lightweight “virtual environments” with their own site directories, optionally isolated from system site directories. Each virtual environment has its own Python binary (which matches the version of the binary that was used to create this environment) and can have its own independent set of installed Python packages in its site directories.

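cd src/main/python
python -m venv venv
# Activate the environment first (see the commands below); otherwise pip installs into the system Python
pip install -r requirements.txt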

To activate venv on Windows (with a Bash shell, e.g., Git Bash)

source venv/Scripts/activate

To activate venv on Linux

source venv/bin/activate
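To confirm that the environment is actually active before installing packages, one option is to compare the interpreter prefixes from Python itself. This is a small sketch, not part of the template; the helper name in_virtualenv is hypothetical, while sys.prefix and sys.base_prefix are standard attributes.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if in_virtualenv():
    print(f"Using virtual environment at {sys.prefix}")
else:
    print("No virtual environment is active; activate venv before running pip install.")
```

Running this script with the python found on the PATH after activation should report the venv directory created under src/main/python.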


big-unibo/project-template
