Merge pull request scikit-learn#1 from scikit-learn/master
Merging changes from the main repository
arka204 committed Apr 11, 2020
2 parents 4346c82 + 0a93fc9 commit 3b79637
Showing 119 changed files with 2,298 additions and 1,029 deletions.
2 changes: 1 addition & 1 deletion Makefile
@@ -67,4 +67,4 @@ code-analysis:
pylint -E -i y sklearn/ -d E1103,E0611,E1101

flake8-diff:
./build_tools/circle/linting.sh
git diff upstream/master -u -- "*.py" | flake8 --diff
26 changes: 13 additions & 13 deletions README.rst
@@ -31,12 +31,12 @@ SciPy and is distributed under the 3-Clause BSD license.

The project was started in 2007 by David Cournapeau as a Google Summer
of Code project, and since then many volunteers have contributed. See
the `About us <http://scikit-learn.org/dev/about.html#authors>`__ page
the `About us <https://scikit-learn.org/dev/about.html#authors>`__ page
for a list of core contributors.

It is currently maintained by a team of volunteers.

Website: http://scikit-learn.org
Website: https://scikit-learn.org


Installation
@@ -73,21 +73,21 @@ or ``conda``::

conda install scikit-learn

The documentation includes more detailed `installation instructions <http://scikit-learn.org/stable/install.html>`_.
The documentation includes more detailed `installation instructions <https://scikit-learn.org/stable/install.html>`_.


Changelog
---------

See the `changelog <http://scikit-learn.org/dev/whats_new.html>`__
See the `changelog <https://scikit-learn.org/dev/whats_new.html>`__
for a history of notable changes to scikit-learn.

Development
-----------

We welcome new contributors of all experience levels. The scikit-learn
community goals are to be helpful, welcoming, and effective. The
`Development Guide <http://scikit-learn.org/stable/developers/index.html>`_
`Development Guide <https://scikit-learn.org/stable/developers/index.html>`_
has detailed information about contributing code, documentation, tests, and
more. We've included some basic information in this README.

@@ -120,7 +120,7 @@ source directory (you will need to have ``pytest`` >= 3.3.0 installed)::

pytest sklearn

See the web page http://scikit-learn.org/dev/developers/advanced_installation.html#testing
See the web page https://scikit-learn.org/dev/developers/advanced_installation.html#testing
for more information.

Random number generation can be controlled during testing by setting
@@ -131,15 +131,15 @@ Submitting a Pull Request

Before opening a Pull Request, have a look at the
full Contributing page to make sure your code complies
with our guidelines: http://scikit-learn.org/stable/developers/index.html
with our guidelines: https://scikit-learn.org/stable/developers/index.html


Project History
---------------

The project was started in 2007 by David Cournapeau as a Google Summer
of Code project, and since then many volunteers have contributed. See
the `About us <http://scikit-learn.org/dev/about.html#authors>`__ page
the `About us <https://scikit-learn.org/dev/about.html#authors>`__ page
for a list of core contributors.

The project is currently maintained by a team of volunteers.
@@ -153,19 +153,19 @@ Help and Support
Documentation
~~~~~~~~~~~~~

- HTML documentation (stable release): http://scikit-learn.org
- HTML documentation (development version): http://scikit-learn.org/dev/
- FAQ: http://scikit-learn.org/stable/faq.html
- HTML documentation (stable release): https://scikit-learn.org
- HTML documentation (development version): https://scikit-learn.org/dev/
- FAQ: https://scikit-learn.org/stable/faq.html

Communication
~~~~~~~~~~~~~

- Mailing list: https://mail.python.org/mailman/listinfo/scikit-learn
- IRC channel: ``#scikit-learn`` at ``webchat.freenode.net``
- Stack Overflow: https://stackoverflow.com/questions/tagged/scikit-learn
- Website: http://scikit-learn.org
- Website: https://scikit-learn.org

Citation
~~~~~~~~

If you use scikit-learn in a scientific publication, we would appreciate citations: http://scikit-learn.org/stable/about.html#citing-scikit-learn
If you use scikit-learn in a scientific publication, we would appreciate citations: https://scikit-learn.org/stable/about.html#citing-scikit-learn
17 changes: 15 additions & 2 deletions azure-pipelines.yml
@@ -17,18 +17,31 @@ jobs:
displayName: Add conda to PATH
- bash: sudo chown -R $USER $CONDA
displayName: Take ownership of conda installation
- bash: conda create --name flake8_env --yes flake8
- bash: |
conda create --name flake8_env --yes python=3.8
conda activate flake8_env
pip install flake8 mypy==0.770
displayName: Install flake8
- bash: |
if [[ $BUILD_SOURCEVERSIONMESSAGE =~ \[lint\ skip\] ]]; then
# skip linting
echo "Skipping linting"
exit 0
else
source activate flake8_env
conda activate flake8_env
./build_tools/circle/linting.sh
fi
displayName: Run linting
- bash: |
if [[ $BUILD_SOURCEVERSIONMESSAGE =~ \[lint\ skip\] ]]; then
# skip linting
echo "Skipping linting"
exit 0
else
conda activate flake8_env
mypy sklearn/ --ignore-missing-imports
fi
displayName: Run mypy
- bash: |
if [[ $BUILD_SOURCEVERSIONMESSAGE =~ \[scipy-dev\] ]] || \
[[ $BUILD_REASON == "Schedule" ]]; then
77 changes: 36 additions & 41 deletions benchmarks/bench_hist_gradient_boosting_higgsboson.py
@@ -25,12 +25,14 @@
parser.add_argument('--learning-rate', type=float, default=1.)
parser.add_argument('--subsample', type=int, default=None)
parser.add_argument('--max-bins', type=int, default=255)
parser.add_argument('--no-predict', action="store_true", default=False)
parser.add_argument('--cache-loc', type=str, default='/tmp')
args = parser.parse_args()

HERE = os.path.dirname(__file__)
URL = ("https://archive.ics.uci.edu/ml/machine-learning-databases/00280/"
"HIGGS.csv.gz")
m = Memory(location='/tmp', mmap_mode='r')
m = Memory(location=args.cache_loc, mmap_mode='r')

n_leaf_nodes = args.n_leaf_nodes
n_trees = args.n_trees
@@ -56,6 +58,27 @@ def load_data():
return df


def fit(est, data_train, target_train, libname):
print(f"Fitting a {libname} model...")
tic = time()
est.fit(data_train, target_train)
toc = time()
print(f"fitted in {toc - tic:.3f}s")


def predict(est, data_test, target_test):
if args.no_predict:
return
tic = time()
predicted_test = est.predict(data_test)
predicted_proba_test = est.predict_proba(data_test)
toc = time()
roc_auc = roc_auc_score(target_test, predicted_proba_test[:, 1])
acc = accuracy_score(target_test, predicted_test)
print(f"predicted in {toc - tic:.3f}s, "
f"ROC AUC: {roc_auc:.4f}, ACC: {acc :.4f}")


df = load_data()
target = df.values[:, 0]
data = np.ascontiguousarray(df.values[:, 1:])
@@ -68,56 +91,28 @@ def load_data():
n_samples, n_features = data_train.shape
print(f"Training set with {n_samples} records with {n_features} features.")

print("Fitting a sklearn model...")
tic = time()
est = HistGradientBoostingClassifier(loss='binary_crossentropy',
learning_rate=lr,
max_iter=n_trees,
max_bins=max_bins,
max_leaf_nodes=n_leaf_nodes,
n_iter_no_change=None,
early_stopping=False,
random_state=0,
verbose=1)
est.fit(data_train, target_train)
toc = time()
predicted_test = est.predict(data_test)
predicted_proba_test = est.predict_proba(data_test)
roc_auc = roc_auc_score(target_test, predicted_proba_test[:, 1])
acc = accuracy_score(target_test, predicted_test)
print(f"done in {toc - tic:.3f}s, ROC AUC: {roc_auc:.4f}, ACC: {acc :.4f}")
fit(est, data_train, target_train, 'sklearn')
predict(est, data_test, target_test)

if args.lightgbm:
print("Fitting a LightGBM model...")
tic = time()
lightgbm_est = get_equivalent_estimator(est, lib='lightgbm')
lightgbm_est.fit(data_train, target_train)
toc = time()
predicted_test = lightgbm_est.predict(data_test)
predicted_proba_test = lightgbm_est.predict_proba(data_test)
roc_auc = roc_auc_score(target_test, predicted_proba_test[:, 1])
acc = accuracy_score(target_test, predicted_test)
print(f"done in {toc - tic:.3f}s, ROC AUC: {roc_auc:.4f}, ACC: {acc :.4f}")
est = get_equivalent_estimator(est, lib='lightgbm')
fit(est, data_train, target_train, 'lightgbm')
predict(est, data_test, target_test)

if args.xgboost:
print("Fitting an XGBoost model...")
tic = time()
xgboost_est = get_equivalent_estimator(est, lib='xgboost')
xgboost_est.fit(data_train, target_train)
toc = time()
predicted_test = xgboost_est.predict(data_test)
predicted_proba_test = xgboost_est.predict_proba(data_test)
roc_auc = roc_auc_score(target_test, predicted_proba_test[:, 1])
acc = accuracy_score(target_test, predicted_test)
print(f"done in {toc - tic:.3f}s, ROC AUC: {roc_auc:.4f}, ACC: {acc :.4f}")
est = get_equivalent_estimator(est, lib='xgboost')
fit(est, data_train, target_train, 'xgboost')
predict(est, data_test, target_test)

if args.catboost:
print("Fitting a Catboost model...")
tic = time()
catboost_est = get_equivalent_estimator(est, lib='catboost')
catboost_est.fit(data_train, target_train)
toc = time()
predicted_test = catboost_est.predict(data_test)
predicted_proba_test = catboost_est.predict_proba(data_test)
roc_auc = roc_auc_score(target_test, predicted_proba_test[:, 1])
acc = accuracy_score(target_test, predicted_test)
print(f"done in {toc - tic:.3f}s, ROC AUC: {roc_auc:.4f}, ACC: {acc :.4f}")
est = get_equivalent_estimator(est, lib='catboost')
fit(est, data_train, target_train, 'catboost')
predict(est, data_test, target_test)
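
The two command-line options added to this benchmark can be sketched in isolation. This is a minimal illustration, not the benchmark itself: `build_parser` is a hypothetical helper name, the other benchmark options are omitted, and `/data/cache` is a made-up path. It parses an explicit argument list rather than `sys.argv` so it can be run anywhere.

```python
import argparse


def build_parser():
    # Only the two options introduced by this commit; the benchmark's
    # remaining options are left out of this sketch.
    parser = argparse.ArgumentParser()
    parser.add_argument('--no-predict', action="store_true", default=False)
    parser.add_argument('--cache-loc', type=str, default='/tmp')
    return parser


# argparse turns dashes in option names into underscores on the namespace.
args = build_parser().parse_args(['--no-predict', '--cache-loc', '/data/cache'])
print(args.no_predict, args.cache_loc)

# Without any flags, the defaults from the diff apply.
defaults = build_parser().parse_args([])
print(defaults.no_predict, defaults.cache_loc)
```

With `--no-predict` set, the `predict` helper in the diff returns immediately, so only fitting time is measured.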
4 changes: 4 additions & 0 deletions build_tools/azure/install.sh
@@ -97,6 +97,10 @@ elif [[ "$DISTRIB" == "conda-pip-latest" ]]; then
make_conda "python=$PYTHON_VERSION"
python -m pip install -U pip
python -m pip install pytest==$PYTEST_VERSION pytest-cov pytest-xdist

# TODO: Remove pin when https://github.com/python-pillow/Pillow/issues/4518 gets fixed
python -m pip install "pillow>=4.3.0,!=7.1.0,!=7.1.1"

python -m pip install pandas matplotlib pyamg scikit-image
# do not install dependencies for lightgbm since it requires scikit-learn
python -m pip install lightgbm --no-deps
5 changes: 5 additions & 0 deletions conftest.py
@@ -87,6 +87,11 @@ def pytest_collection_modifyitems(config, items):
def pytest_configure(config):
import sys
sys._is_pytest_session = True
# declare our custom markers to avoid PytestUnknownMarkWarning
config.addinivalue_line(
"markers",
"network: mark a test for execution if network available."
)


def pytest_unconfigure(config):
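
What the new `pytest_configure` lines do can be seen without running pytest. The sketch below assumes a hypothetical `RecordingConfig` class that stands in for pytest's real config object and only records calls to `addinivalue_line`; it is an illustration, not part of the commit.

```python
class RecordingConfig:
    """Hypothetical stand-in for pytest's config object (illustration only)."""

    def __init__(self):
        self.ini_lines = []

    def addinivalue_line(self, name, line):
        # The real method appends a line to an ini-file option; here the
        # call is simply recorded so it can be inspected.
        self.ini_lines.append((name, line))


def pytest_configure(config):
    # Same registration as in the diff: declare the custom "network" marker
    # so that pytest does not emit PytestUnknownMarkWarning for it.
    config.addinivalue_line(
        "markers",
        "network: mark a test for execution if network available."
    )


config = RecordingConfig()
pytest_configure(config)
print(config.ini_lines)
```

Once the marker is registered, a test can be decorated with `@pytest.mark.network` without triggering the unknown-mark warning.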
4 changes: 2 additions & 2 deletions doc/about.rst
@@ -13,7 +13,7 @@ this project as part of his thesis.
In 2010 Fabian Pedregosa, Gael Varoquaux, Alexandre Gramfort and Vincent
Michel of INRIA took leadership of the project and made the first public
release, February the 1st 2010. Since then, several releases have appeared
following a ~3 month cycle, and a thriving international community has
following a ~ 3-month cycle, and a thriving international community has
been leading the development.

Governance
@@ -520,7 +520,7 @@ budget of the project [#f1]_.
.. rubric:: Notes

.. [#f1] Regarding the organization budget in particular, we might use some of
.. [#f1] Regarding the organization budget, in particular, we might use some of
the donated funds to pay for other project expenses such as DNS,
hosting or continuous integration services.
42 changes: 30 additions & 12 deletions doc/developers/contributing.rst
@@ -181,12 +181,12 @@ Contributing code
If in doubt about duplicated work, or if you want to work on a non-trivial
feature, it's recommended to first open an issue in
the `issue tracker <https://github.com/scikit-learn/scikit-learn/issues>`_
to get some feedback from core developers.
One easy way to find an issue to work on is by applying the "help wanted"
label in your search. This lists all the issues that have been unclaimed
so far. In order to claim an issue for yourself, please comment exactly
``take`` on it for the CI to automatically assign the issue to you.
to get some feedback from core developers.

One easy way to find an issue to work on is by applying the "help wanted"
label in your search. This lists all the issues that have been unclaimed
so far. In order to claim an issue for yourself, please comment exactly
``take`` on it for the CI to automatically assign the issue to you.

How to contribute
-----------------
@@ -215,7 +215,7 @@ how to set up your git repository:

4. Install the development dependencies::

$ pip install cython pytest pytest-cov flake8
$ pip install cython pytest pytest-cov flake8 mypy

5. Install scikit-learn in editable mode::

@@ -224,6 +224,8 @@ how to set up your git repository:
for more details about advanced installation, see the
:ref:`install_bleeding_edge` section.

.. _upstream:

6. Add the ``upstream`` remote. This saves a reference to the main
scikit-learn repository, which you can use to keep your repository
synchronized with the latest changes::
@@ -356,13 +358,17 @@ complies with the following rules before marking a PR as ``[MRG]``. The
non-regression tests should fail for the code base in the master branch
and pass for the PR code.

5. **Make sure that your PR does not add PEP8 violations**. On a Unix-like
system, you can run `make flake8-diff`. `flake8 path_to_file`, would work
for any system, but please avoid reformatting parts of the file that your
pull request doesn't change, as it distracts from code review.
5. **Make sure that your PR does not add PEP8 violations**. To check the
code that you changed, you can run the following command (see
:ref:`above <upstream>` to set up the upstream remote)::

git diff upstream/master -u -- "*.py" | flake8 --diff

or ``make flake8-diff``, which should work on Unix-like systems.

6. Follow the :ref:`coding-guidelines`.


7. When applicable, use the validation tools and scripts in the
``sklearn.utils`` submodule. A list of utility routines available
for developers can be found in the :ref:`developers-utils` page.
@@ -408,6 +414,18 @@ You can check for common programming errors with the following tools:

see also :ref:`testing_coverage`

* A moderate use of type annotations is encouraged but is not mandatory. See
`mypy quickstart <https://mypy.readthedocs.io/en/latest/getting_started.html>`_
for an introduction, as well as `pandas contributing documentation
<https://pandas.pydata.org/pandas-docs/stable/development/contributing.html#type-hints>`_
for style guidelines. Whether you add type annotations or not::

mypy --ignore-missing-imports sklearn

must not produce new errors in your pull request. Using a ``# type: ignore``
annotation can be a workaround for a few cases that are not supported by
mypy, in particular,

- when importing C or Cython modules
- on properties with decorators

Bonus points for contributions that include a performance analysis with
a benchmark script and profiling output (please report on the mailing
list or on the GitHub issue).
@@ -662,7 +680,7 @@ In general have the following in mind:
4. 1D or 2D data can be a subset of
``{array-like, ndarray, sparse matrix, dataframe}``. Note that ``array-like``
can also be a ``list``, while ``ndarray`` is explicitly only a ``numpy.ndarray``.
5. When specifying the data type of a list, use ``of`` as a delimiter:
5. When specifying the data type of a list, use ``of`` as a delimiter:
``list of int``.
6. When specifying the dtype of an ndarray, use e.g. ``dtype=np.int32``
after defining the shape:
8 changes: 8 additions & 0 deletions doc/developers/maintainer.rst
@@ -289,6 +289,14 @@ submodule/subpackage of the public subpackage, e.g.
``sklearn/impute/_iterative.py``. This is needed so that pickles still work
in the future when the features aren't experimental anymore

To avoid type checker (e.g. mypy) errors, a direct import of experimental
estimators should be done in the parent module, protected by the
``if typing.TYPE_CHECKING`` check. See `sklearn/ensemble/__init__.py
<https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/ensemble/__init__.py>`_,
or `sklearn/impute/__init__.py
<https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/impute/__init__.py>`_
for an example.
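
The guarded-import pattern described above might look roughly like this. It is a minimal sketch, not the contents of either linked file: `_hypothetical_experimental`, `ExperimentalEstimator` and `StableEstimator` are made-up names used only for illustration.

```python
import typing

if typing.TYPE_CHECKING:
    # Seen by static type checkers only; never executed at runtime, so the
    # experimental estimator is not exposed by a plain import of the package.
    from _hypothetical_experimental import ExperimentalEstimator  # noqa


class StableEstimator:
    """Placeholder for the public, non-experimental API of the module."""


# typing.TYPE_CHECKING is False at runtime, so the guarded import was skipped.
imported_at_runtime = "ExperimentalEstimator" in globals()
print(typing.TYPE_CHECKING, imported_at_runtime)
```

Because the guarded block never runs, the made-up module does not need to exist for this sketch to execute.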

Please also write basic tests following those in
`test_enable_hist_gradient_boosting.py
<https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/experimental/tests/test_enable_hist_gradient_boosting.py>`_.