A test framework that defines several test doubles to facilitate Python Spark application development.
Defines the following test doubles:
- `FakeSparkSession`
  - Stubs the `sql(sql_query)` method to only log each `sql_query`, without sending it to a database for execution;
  - The `table(table_name)` and `createDataFrame(data[, schema, samplingRatio, verifySchema])` methods delegate execution to the real `SparkSession`, but return a `FakeDataFrame` instead of a `DataFrame`;
  - `table(table_name)` is often overridden in a subclass to return a table from a fake test database.
- `FakeDataFrame`
  - `write` returns a `FakeDFWriter`;
  - Other methods work just like those of a real `DataFrame`, but return `FakeDataFrame`s instead of `DataFrame`s.
- `FakeDFWriter`
  - Stubs a `DataFrameWriter` to only log the `Row`s written, without writing them anywhere.
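
The stubbing pattern described above can be illustrated without Spark at all. The following is a minimal sketch, not this framework's implementation: `SketchSession`, `SketchDataFrame`, `SketchWriter`, and `archive_paid_orders` are hypothetical names invented for the example, rows are plain dictionaries, and `filter` takes a Python predicate rather than a Spark column expression.

```python
class SketchWriter:
    """Hypothetical stand-in for a writer stub: records rows, writes nothing."""

    def __init__(self, rows, log):
        self._rows = rows
        self._log = log

    def saveAsTable(self, name):
        # Log each row together with its target table instead of persisting it.
        self._log.extend((name, row) for row in self._rows)


class SketchDataFrame:
    """Hypothetical stand-in for a data frame double holding plain dict rows."""

    def __init__(self, rows, write_log):
        self._rows = list(rows)
        self._write_log = write_log

    @property
    def write(self):
        # Hand out the logging writer rather than a real one.
        return SketchWriter(self._rows, self._write_log)

    def filter(self, predicate):
        # Simplification: a Python predicate, not a Spark column expression.
        # The result is again the stand-in type, so the double never "leaks".
        kept = [row for row in self._rows if predicate(row)]
        return SketchDataFrame(kept, self._write_log)


class SketchSession:
    """Hypothetical stand-in for a session double backed by in-memory tables."""

    def __init__(self, tables=None):
        self.queries = []   # every SQL string passed to sql()
        self.written = []   # every (table, row) pair passed to a writer
        self._tables = tables or {}

    def sql(self, sql_query):
        # Only record the query; nothing is executed.
        self.queries.append(sql_query)

    def table(self, table_name):
        # Serve the table from the fake in-memory "database".
        return SketchDataFrame(self._tables[table_name], self.written)


def archive_paid_orders(session):
    """Example code under test: it only needs something session-shaped."""
    paid = session.table("orders").filter(lambda row: row["status"] == "paid")
    paid.write.saveAsTable("orders_archive")
    session.sql("DELETE FROM orders WHERE status = 'paid'")


session = SketchSession(
    tables={"orders": [{"id": 1, "status": "paid"}, {"id": 2, "status": "open"}]}
)
archive_paid_orders(session)
print(session.queries)
print(session.written)
```

A test can then assert on `session.queries` and `session.written` to check what the code under test attempted, with no database or file system involved.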
Defines `FakeDeltaTable`, which stubs `merge(source, condition)` to only log the merge operation, changing no data.
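
A log-only merge can be sketched as follows. This is an illustration under stated assumptions, not the actual `FakeDeltaTable`: `SketchDeltaTable`, `SketchMergeBuilder`, and `upsert_customers` are hypothetical names, and the way the builder clauses are recorded is an assumption made for the example. The builder method names mirror the `whenMatchedUpdateAll`, `whenNotMatchedInsertAll`, and `execute` calls of Delta Lake's Python merge builder.

```python
class SketchMergeBuilder:
    """Hypothetical chainable builder that records clauses and does nothing else."""

    def __init__(self, entry):
        self._entry = entry

    def whenMatchedUpdateAll(self):
        self._entry["clauses"].append("whenMatchedUpdateAll")
        return self  # keep the call chain going

    def whenNotMatchedInsertAll(self):
        self._entry["clauses"].append("whenNotMatchedInsertAll")
        return self

    def execute(self):
        # Mark the merge as requested; no data is touched.
        self._entry["executed"] = True


class SketchDeltaTable:
    """Hypothetical stand-in for a Delta table double that only logs merges."""

    def __init__(self):
        self.merges = []

    def merge(self, source, condition):
        entry = {
            "source": source,
            "condition": condition,
            "clauses": [],
            "executed": False,
        }
        self.merges.append(entry)
        return SketchMergeBuilder(entry)


def upsert_customers(target, updates):
    """Example code under test: a typical upsert expressed as a merge."""
    (
        target.merge(updates, "t.id = s.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )


target = SketchDeltaTable()
upsert_customers(target, updates="updates_df")  # placeholder source object
print(target.merges)
```

Because the double records the condition and clauses, a test can verify that the merge was built as intended without a Delta table existing anywhere.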
Also defines the following test doubles for dates and times:
- `FakeDatetime`
  - Stubs the `now()` method to always return a predefined `datetime`.
- `FakeDate`
  - Stubs the `today()` method to always return a predefined `date`.
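
Fixed-clock doubles of this kind are commonly built by subclassing the standard library types. The sketch below shows that general technique using only the `datetime` module. `FrozenDatetime`, `FrozenDate`, `partition_path`, and `days_until` are hypothetical names for the example, and the predefined values are arbitrary; how `FakeDatetime` and `FakeDate` are actually configured is not shown here.

```python
import datetime


class FrozenDatetime(datetime.datetime):
    """Hypothetical double: now() always returns the same moment."""

    @classmethod
    def now(cls, tz=None):
        return cls(2024, 1, 15, 9, 30, 0)


class FrozenDate(datetime.date):
    """Hypothetical double: today() always returns the same day."""

    @classmethod
    def today(cls):
        return cls(2024, 1, 15)


def partition_path(clock=datetime.datetime):
    """Example code under test: builds a date-partitioned path from 'now'."""
    return clock.now().strftime("year=%Y/month=%m/day=%d")


def days_until(deadline, calendar=datetime.date):
    """Example code under test: whole days from 'today' to a deadline."""
    return (deadline - calendar.today()).days


print(partition_path(FrozenDatetime))                       # year=2024/month=01/day=15
print(days_until(datetime.date(2024, 1, 31), FrozenDate))   # 16
```

Passing the clock class as a parameter keeps the code under test deterministic in tests while still defaulting to the real `datetime` types in production.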