PySpark is not being included in requirements.txt file in a new kedro project #3848

Open
thorugo-code opened this issue May 3, 2024 · 1 comment
Labels
Community Issue/PR opened by the open-source community

Comments

@thorugo-code

Description

After creating a new Kedro project with all the packages selected, I went into the project folder and installed the requirements, but PySpark was not installed because it is not listed in requirements.txt.

Context

The lack of PySpark is preventing the application from running.

Steps to Reproduce

  1. python -m venv venv
  2. ./venv/Scripts/activate.ps1
  3. pip install kedro
  4. kedro new
  5. select all packages and answer yes to pipeline example
  6. cd app
  7. pip install -r requirements.txt
  8. kedro run

Expected Result

kedro run starts the application without errors.

Actual Result


╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in _run_module_as_main:198                                                                       │
│ in _run_code:88                                                                                  │
│                                                                                                  │
│ in <module>:7                                                                                    │
│                                                                                                  │
│ F:\Testes\kedro-test\venv\Lib\site-packages\kedro\framework\cli\cli.py:233 in main               │
│                                                                                                  │
│   230 │   cli_collection = KedroCLI(                                                             │
│   231 │   │   project_path=_find_kedro_project(Path.cwd()) or Path.cwd()                         │
│   232 │   )                                                                                      │
│ ❱ 233 │   cli_collection()                                                                       │
│   234                                                                                            │
│                                                                                                  │
│ F:\Testes\kedro-test\venv\Lib\site-packages\click\core.py:1157 in __call__                       │
│                                                                                                  │
│ F:\Testes\kedro-test\venv\Lib\site-packages\kedro\framework\cli\cli.py:130 in main               │
│                                                                                                  │
│   127 │   │   )                                                                                  │
│   128 │   │                                                                                      │
│   129 │   │   try:                                                                               │
│ ❱ 130 │   │   │   super().main(                                                                  │
│   131 │   │   │   │   args=args,                                                                 │
│   132 │   │   │   │   prog_name=prog_name,                                                       │
│   133 │   │   │   │   complete_var=complete_var,                                                 │
│                                                                                                  │
│ F:\Testes\kedro-test\venv\Lib\site-packages\click\core.py:1078 in main                           │
│                                                                                                  │
│ F:\Testes\kedro-test\venv\Lib\site-packages\click\core.py:1688 in invoke                         │
│                                                                                                  │
│ F:\Testes\kedro-test\venv\Lib\site-packages\click\core.py:1434 in invoke                         │
│                                                                                                  │
│ F:\Testes\kedro-test\venv\Lib\site-packages\click\core.py:783 in invoke                          │
│                                                                                                  │
│ F:\Testes\kedro-test\venv\Lib\site-packages\kedro\framework\cli\project.py:222 in run            │
│                                                                                                  │
│   219 │   tuple_tags = tuple(tags)                                                               │
│   220 │   tuple_node_names = tuple(node_names)                                                   │
│   221 │                                                                                          │
│ ❱ 222 │   with KedroSession.create(                                                              │
│   223 │   │   env=env, conf_source=conf_source, extra_params=params                              │
│   224 │   ) as session:                                                                          │
│   225 │   │   session.run(                                                                       │
│                                                                                                  │
│ F:\Testes\kedro-test\venv\Lib\site-packages\kedro\framework\session\session.py:151 in create     │
│                                                                                                  │
│   148 │   │   Returns:                                                                           │
│   149 │   │   │   A new ``KedroSession`` instance.                                               │
│   150 │   │   """                                                                                │
│ ❱ 151 │   │   validate_settings()                                                                │
│   152 │   │                                                                                      │
│   153 │   │   session = cls(                                                                     │
│   154 │   │   │   project_path=project_path,                                                     │
│                                                                                                  │
│ F:\Testes\kedro-test\venv\Lib\site-packages\kedro\framework\project\__init__.py:293 in           │
│ validate_settings                                                                                │
│                                                                                                  │
│   290 │   │   )                                                                                  │
│   291 │   # Check if file exists, if it does, validate it.                                       │
│   292 │   if importlib.util.find_spec(f"{PACKAGE_NAME}.settings") is not None:                   │
│ ❱ 293 │   │   importlib.import_module(f"{PACKAGE_NAME}.settings")                                │
│   294 │   else:                                                                                  │
│   295 │   │   logger = logging.getLogger(__name__)                                               │
│   296 │   │   logger.warning("No 'settings.py' found, defaults will be used.")                   │
│                                                                                                  │
│ C:\Users\vitor\AppData\Local\Programs\Python\Python311\Lib\importlib\__init__.py:126 in          │
│ import_module                                                                                    │
│                                                                                                  │
│   123 │   │   │   if character != '.':                                                           │
│   124 │   │   │   │   break                                                                      │
│   125 │   │   │   level += 1                                                                     │
│ ❱ 126 │   return _bootstrap._gcd_import(name[level:], package, level)                            │
│   127                                                                                            │
│   128                                                                                            │
│   129 _RELOADING = {}                                                                            │
│ in _gcd_import:1204                                                                              │
│ in _find_and_load:1176                                                                           │
│ in _find_and_load_unlocked:1147                                                                  │
│ in _load_unlocked:690                                                                            │
│ in exec_module:940                                                                               │
│ in _call_with_frames_removed:241                                                                 │
│                                                                                                  │
│ F:\Testes\kedro-test\api\src\api\settings.py:6 in <module>                                       │
│                                                                                                  │
│    3 https://docs.kedro.org/en/stable/kedro_project_setup/settings.html."""                      │
│    4                                                                                             │
│    5 # Instantiated project hooks.                                                               │
│ ❱  6 from api.hooks import SparkHooks  # noqa: E402                                              │
│    7                                                                                             │
│    8 # Hooks are executed in a Last-In-First-Out (LIFO) order.                                   │
│    9 HOOKS = (SparkHooks(),)                                                                     │
│                                                                                                  │
│ F:\Testes\kedro-test\api\src\api\hooks.py:2 in <module>                                          │
│                                                                                                  │
│    1 from kedro.framework.hooks import hook_impl                                                 │
│ ❱  2 from pyspark import SparkConf                                                               │
│    3 from pyspark.sql import SparkSession                                                        │
│    4                                                                                             │
│    5                                                                                             │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ModuleNotFoundError: No module named 'pyspark'
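
The last frame of the traceback shows the import of pyspark failing in hooks.py. A minimal sketch for confirming, from inside the same virtual environment, which of the relevant top-level modules can be found without importing them. The helper name is_importable is hypothetical; importlib.util.find_spec is the same standard-library call that appears in the traceback above.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located in the current environment."""
    # For a top-level name, find_spec returns None when the module is not installed.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Module names taken from the traceback and the thread; adjust as needed.
    for name in ("kedro", "kedro_datasets", "pyspark"):
        status = "found" if is_importable(name) else "MISSING"
        print(f"{name}: {status}")
```

If pyspark is reported as missing while kedro is found, the environment matches the error above and the next thing to inspect is the generated requirements.txt.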

Your Environment

  • Kedro version used (pip show kedro or kedro -V): kedro, version 0.19.5
  • Python version used (python -V): Python 3.11.7
  • Operating system and version: Windows 11 Home Single Language 23H2 22631.3527
@merelcht
Member

Hi @thorugo-code, thanks for opening this issue. I'm sorry you're facing problems getting started with Kedro. It is actually not expected that pyspark is added to the requirements.txt. Instead, we'd expect:

kedro-datasets[spark-sparkdataset]>=3.0; python_version >= "3.9"
kedro-datasets[spark.SparkDataset]>=1.0; python_version < "3.9"

to be added. Our SparkDataset has a dependency on pyspark, so pyspark is installed indirectly through that extra. I've replicated the steps, and on my side pyspark is installed successfully without any alterations. Could you share your resulting requirements.txt file?
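
One way to check the generated file for such a line is sketched below. This is an illustration only, not part of Kedro: the function name spark_requirement_lines is hypothetical, and it assumes the script is run from the project root where requirements.txt lives. It looks for kedro-datasets entries whose extras mention Spark.

```python
import re
from pathlib import Path


def spark_requirement_lines(requirements_text: str) -> list[str]:
    """Return the kedro-datasets requirement lines whose extras mention Spark."""
    matches = []
    for raw_line in requirements_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        # Match "kedro-datasets[<extras>]" and capture the extras list.
        found = re.match(r"kedro[-_]datasets\[([^\]]*)\]", line, flags=re.IGNORECASE)
        if found and "spark" in found.group(1).lower():
            matches.append(line)
    return matches


if __name__ == "__main__":
    path = Path("requirements.txt")  # assumption: run from the project root
    if path.exists():
        lines = spark_requirement_lines(path.read_text(encoding="utf-8"))
        print("\n".join(lines) if lines else "No Spark extra of kedro-datasets found")
    else:
        print("requirements.txt not found in the current directory")
```

If no line is printed, the file lacks the Spark extra, which would explain why pip install -r requirements.txt did not pull in pyspark.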

@merelcht merelcht added the Community Issue/PR opened by the open-source community label May 20, 2024