[oneDAL] build warnings fix #154

Closed
38 changes: 19 additions & 19 deletions source/elements/oneDAL/source/data_management/index.rst
@@ -56,7 +56,7 @@ Dataset
--------

The main data-related concept that |dal_short_name| works with is a
:term:`dataset`. It is an in-memory or out-of-memory tabular view of data, where
:term:`Dataset`. It is an in-memory or out-of-memory tabular view of data, where
table rows represent the :term:`observations <Observation>` and columns
represent the :term:`features <Feature>`.

@@ -82,7 +82,7 @@ example:
Data source
-----------

Data source is a concept of an out-of-memory storage for a :term:`dataset`. It is
Data source is a concept of an out-of-memory storage for a :term:`Dataset`. It is
used at the data acquisition and data preparation stages for the following:

- To extract datasets from external sources such as databases, files, remote
@@ -92,11 +92,11 @@ used at the data acquisition and data preparation stages for the following:
the local memory, especially when processing with accelerators. A data source
provides the ability to load data by batches and extracts it directly into the
device's local memory. Therefore, a data source enables complex data analytics
scenarios, such as :term:`online computations <Online mode>`.
scenarios, such as :term:`online computations <Online Mode>`.

- To filter and normalize :term:`feature` values that are being extracted.
- To filter and normalize :term:`Feature` values that are being extracted.

- To recover missing :term:`feature` values.
- To recover missing :term:`Feature` values.

- To detect :term:`outliers <Outlier>` and recover the abnormal data.

@@ -112,7 +112,7 @@ For details, see :ref:`data-sources` section.
Table
-----

Table is a concept of a :term:`dataset` with in-memory numerical data. It is
Table is a concept of a :term:`Dataset` with in-memory numerical data. It is
used at the data preparation and data processing stages for the following:

- To store heterogeneous in-memory data in various
@@ -130,19 +130,19 @@ used at the data preparation and data processing stages for the following:

- To support streaming of the data to the algorithm.

- To access the underlying data on a device in a required :term:`data format`,
e.g. by blocks with the defined :term:`data layout`.
- To access the underlying data on a device in a required :term:`Data format`,
e.g. by blocks with the defined :term:`Data layout`.

For thread-safety reasons and better integration with external entities, a table
provides a read-only access to the data within it, thus, table concept
implementations shall be :term:`immutable <Immutability>`.

This concept has different logical organization and physical
:term:`format of the data <data format>`:
:term:`format of the data <Data format>`:

- Logically, a table is a :ref:`dataset` with :math:`n` rows and
:math:`p` columns. Each row represents an :term:`observation` and each column
is a :term:`feature` of a dataset. Physical amount of bytes needed to store
:math:`p` columns. Each row represents an :term:`Observation` and each column
is a :term:`Feature` of a dataset. Physical amount of bytes needed to store
the data differ from the number of elements :math:`n \times p` within
a table.

@@ -169,24 +169,24 @@ For each dataset, its metadata shall contain:

- The number of rows :math:`n` and columns :math:`p` in a dataset.

- The type of each :term:`feature` (e.g. :term:`nominal <Nominal feature>`,
- The type of each :term:`Feature` (e.g. :term:`nominal <Nominal feature>`,
:term:`interval <Interval feature>`).

- The :term:`data type` of each feature (e.g. :code:`float` or :code:`double`).
- The :term:`Data type` of each feature (e.g. :code:`float` or :code:`double`).

.. note::
Metadata can contain both compile-time and run-time information. For example,
basic compile-time metadata is the type of a dataset - whether it is a
particular :ref:`data-source` or a :ref:`table`. Run-time information can
contain the :term:`feature` types and :term:`data types <Data type>` of a
contain the :term:`Feature` types and :term:`data types <Data type>` of a
dataset.

.. _table-builder:

Table builder
-------------

A table :term:`builder` is a concept that is associated with a particular
A table :term:`Builder` is a concept that is associated with a particular
:ref:`table` type and is used at the data preparation and data processing stages
for:

@@ -226,13 +226,13 @@ in-memory numerical :ref:`dataset`. It allows:
- To convert a variety of numeric :term:`data formats <Data format>` into a
smaller set of formats.

- To provide a :term:`flat <flat data>` view on the data blocks of a
- To provide a :term:`flat <Flat data>` view on the data blocks of a
:ref:`dataset` for better a data locality. For example, some accessor
implementation returns :term:`feature` values as a contiguous array, while the
implementation returns :term:`Feature` values as a contiguous array, while the
original dataset stored row-by-row (there are strides between values of a
single feature).

- To acquire data in a desired :term:`data format` for which
- To acquire data in a desired :term:`Data format` for which
a specific set of operations is defined.

- To have read-only, read-write and write-only access to the data. Accessor
@@ -282,7 +282,7 @@ within some table, a builder object can be constructed for this. Data inside a
table builder can be retrieved by read-only, write-only or read-write accessors.

Accessors shown on the diagram allow to get data from tables and table builders
as :term:`flat <flat data>` blocks of rows.
as :term:`flat <Flat data>` blocks of rows.

Details
=======
98 changes: 49 additions & 49 deletions source/elements/oneDAL/source/glossary.rst
@@ -11,13 +11,13 @@ Machine learning terms
:sorted:

Categorical feature
A :term:`feature` with a discrete domain. Can be :term:`nominal <Nominal
A :term:`Feature` with a discrete domain. Can be :term:`nominal <Nominal
feature>` or :term:`ordinal <Ordinal feature>`.

**Synonyms:** discrete feature, qualitative feature

Nominal feature
A :term:`categorical feature` without ordering between values. Only
A :term:`Categorical feature` without ordering between values. Only
equality operation is defined for nominal features.

**Examples:** a person's gender, color of a car
@@ -38,7 +38,7 @@ Machine learning terms
**Example:** find big star clusters in the space images

Continuous feature
A :term:`feature` with values in a domain of real numbers. Can be
A :term:`Feature` with values in a domain of real numbers. Can be
:term:`interval <Interval feature>` or :term:`ratio <Ratio feature>`

**Synonyms:** quantitative feature, numerical feature
@@ -57,51 +57,51 @@ Machine learning terms

Feature vector
A vector that encodes information about real object, an event or a group
of objects or events. Contains at least one :term:`feature`.
of objects or events. Contains at least one :term:`Feature`.

**Example:** A rectangle can be described by two features: its width and
height

Inference
A process of applying a :term:`trained <Training>` :term:`model` to the
:term:`dataset` in order to predict :term:`response`
A process of applying a :term:`trained <Training>` :term:`Model` to the
:term:`Dataset` in order to predict :term:`Response`
values based on input :term:`feature vectors <Feature vector>`.

**Synonym:** prediction

Inference set
A :term:`dataset` used at the :term:`inference` stage.
A :term:`Dataset` used at the :term:`Inference` stage.
Usually without :term:`responses <Response>`.

Interval feature
A :term:`continuous feature` with values that can be compared, added or
A :term:`Continuous feature` with values that can be compared, added or
subtracted, but cannot be multiplied or divided.

**Examples:** a timeframe scale, a temperature in Celcius or Fahrenheit

Label
A :term:`response` with :term:`categorical <Categorical feature>` or
A :term:`Response` with :term:`categorical <Categorical feature>` or
:term:`ordinal <Ordinal feature>` values. This is an output in
:term:`classification` and :term:`clustering` problems.
:term:`Classification` and :term:`Clustering` problems.

**Example:** the spam-detection problem has a binary label indicating
whether the email is spam or not

Model
An entity that stores information necessary to run :term:`inference`
on a new :term:`dataset`. Typically a result of a :term:`training`
An entity that stores information necessary to run :term:`Inference`
on a new :term:`Dataset`. Typically a result of a :term:`Training`
process.

**Example:** in linear regression algorithm, the model contains weight
values for each input feature and a single bias value

Observation
A :term:`feature vector` and zero or more :term:`responses<Response>`.
A :term:`Feature vector` and zero or more :term:`responses<Response>`.

**Synonyms:** instance, sample

Ordinal feature
A :term:`categorical feature` with defined operations of equality and
A :term:`Categorical feature` with defined operations of equality and
ordering between values.

**Example:** student's grade
@@ -111,46 +111,46 @@ Machine learning terms
observations.

Ratio feature
A :term:`continuous feature` with defined operations of equality,
A :term:`Continuous feature` with defined operations of equality,
comparison, addition, subtraction, multiplication, and division.
Zero value element means the absence of any value.

**Example:** the height of a tower

Regression
A :term:`supervised machine learning problem <Supervised learning>` of
assigning :term:`continuous <Continuous feature>`
:term:`responses<Response>` for :term:`feature vectors <Feature vector>`.
assigning :term:`continuous <Continuous feature>` :term:`responses
<Response>` for :term:`feature vectors <Feature vector>`.

**Example:** predict temperature based on weather conditions

Response
A property of some real object or event which dependency from
:term:`feature vector` need to be defined in :term:`supervised learning`
problem. While a :term:`feature` is an input in the machine learning
:term:`Feature vector` need to be defined in :term:`Supervised learning`
problem. While a :term:`Feature` is an input in the machine learning
problem, the response is one of the outputs can be made by the
:term:`model` on the :term:`inference` stage.
:term:`Model` on the :term:`Inference` stage.

**Synonym:** dependent variable

Supervised learning
:term:`Training` process that uses a :term:`dataset` with information
:term:`Training` process that uses a :term:`Dataset` with information
about dependencies between :term:`features <Feature>` and
:term:`responses <Response>`. The goal is to get a :term:`model` of
dependencies between input :term:`feature vector` and
:term:`responses <Response>`. The goal is to get a :term:`Model` of
dependencies between input :term:`Feature vector` and
:term:`responses <Response>`.

Training
A process of creating a :term:`model` based on information extracted
from a :term:`training set`. Resulting :term:`model` is selected in
A process of creating a :term:`Model` based on information extracted
from a :term:`Training set`. Resulting :term:`Model` is selected in
accordance with some quality criteria.

Training set
A :term:`dataset` used at the :term:`training` stage to create a
:term:`model`.
A :term:`Dataset` used at the :term:`Training` stage to create a
:term:`Model`.

Unsupervised learning
:term:`Training` process that uses a :term:`training set` with no
:term:`Training` process that uses a :term:`Training set` with no
:term:`responses <Response>`. The goal is to find hidden patters inside
:term:`feature vectors <Feature vector>` and dependencies between them.

@@ -162,7 +162,7 @@ Machine learning terms

Accessor
A |dal_short_name| concept for an object that provides access to the
data of another object in the special :term:`data format`. It abstracts
data of another object in the special :term:`Data format`. It abstracts
data access from interface of an object and provides uniform access to
the data stored in objects of different types.

@@ -177,7 +177,7 @@ Machine learning terms

Contiguous data
Data that are stored as one contiguous memory block. One of the
characteristics of a :term:`data format`.
characteristics of a :term:`Data format`.

Data format
Representation of the internal structure of the data.
@@ -186,8 +186,8 @@ Machine learning terms
compressed-sparse-row format

Data layout
A characteristic of :term:`data format` which describes the
order of elements in a :term:`contiguous data` block.
A characteristic of :term:`Data format` which describes the
order of elements in a :term:`Contiguous data` block.

**Example:** row-major format, where elements are stored row by row

@@ -199,8 +199,8 @@ Machine learning terms
**Examples:** ``int32_t``, ``float``, ``double``

Flat data
A block of :term:`contiguous <contiguous data>` :term:`homogeneous
<homogeneous data>` data.
A block of :term:`contiguous <Contiguous data>` :term:`homogeneous
<Homogeneous data>` data.

Getter
A method that returns the value of the private member variable.
@@ -215,21 +215,21 @@ Machine learning terms
Heterogeneous data
Data which contain values either of different :term:`data types <Data
type>` or different sets of operations defined on them. One of the
characteristics of a :term:`data format`.
characteristics of a :term:`Data format`.

**Example:** A :term:`dataset` with 100
:term:`observations <Observation>` of three :term:`interval features <Interval
feature>`. The first two features are of float32 :term:`data type`, while the
third one is of float64 data type.
**Example:** A :term:`Dataset` with 100 :term:`observations
<Observation>` of three :term:`interval features <Interval feature>`.
The first two features are of float32 :term:`Data type`, while the third
one is of float64 data type.

Homogeneous data
Data with values of single :term:`data type` and the same set of
Data with values of single :term:`Data type` and the same set of
available operations defined on them. One of the characteristics of a
:term:`data format`.
:term:`Data format`.

**Example:** A :term:`dataset` with 100
:term:`observations <Observation>` of three :term:`interval features <Interval
feature>`, each of type float32
**Example:** A :term:`Dataset` with 100 :term:`observations
<Observation>` of three :term:`interval features <Interval feature>`,
each of type float32

Immutability
The object is immutable if it is not possible to change its state after
@@ -241,7 +241,7 @@ Machine learning terms
possible objects of a given type. Metadata do not expose information
that is not a part of a type definition, e.g. implementation details.

**Example:** :term:`table` object can contain three :term:`nominal features
**Example:** :term:`Table` object can contain three :term:`nominal features
<Nominal feature>` with 100 :term:`observations <Observation>` (logical
part of metadata). This object can store data as sparse csr array and
provides direct access to them (physical part)
@@ -270,14 +270,14 @@ Machine learning terms


Table
A |dal_short_name| concept for a :term:`dataset` that contains only
A |dal_short_name| concept for a :term:`Dataset` that contains only
numerical data, :term:`categorical <Categorical feature>` or
:term:`continuous <Continuous feature>`. Serves as a transfer of data
between user's application and computations inside |dal_short_name|.
Hides details of :term:`data format` and generalizes access to the data.
Hides details of :term:`Data format` and generalizes access to the data.

Workload
A problem of applying a |dal_short_name| algorithm to a :term:`dataset`.
A problem of applying a |dal_short_name| algorithm to a :term:`Dataset`.

Common oneAPI terms
===================
@@ -291,7 +291,7 @@ Common oneAPI terms
DPC++
Data Parallel C++ (DPC++) is a high-level language designed for data
parallel programming productivity. DPC++ is based on :term:`SYCL*
<sycl>` from the Khronos* Group to support data parallelism and
<SYCL>` from the Khronos* Group to support data parallelism and
heterogeneous programming.

Host/Device
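The changes to ``glossary.rst`` and ``data_management/index.rst`` above all follow one pattern: the target of each ``:term:`` role is rewritten so that its letter case matches the glossary entry. A minimal sketch of how such a rewrite could be automated is shown below. It is not part of this PR; the function name ``fix_term_case`` and the regular expression are hypothetical, and it assumes the build warnings come only from case mismatches between ``:term:`` targets and glossary entries.

```python
import re

# Matches :term:`target` and :term:`label <target>`.
# The label and target may wrap across lines, as they do in the glossary.
TERM_ROLE = re.compile(r":term:`([^`<]*?)(?:<([^`>]+)>)?`")


def normalize(name):
    """Collapse runs of whitespace so wrapped names compare equal."""
    return " ".join(name.split())


def fix_term_case(text, glossary_terms):
    """Rewrite :term: targets so their case matches the glossary entries.

    Hypothetical helper: references to unknown terms and references that
    already match are returned unchanged.
    """
    canonical = {normalize(t).lower(): t for t in glossary_terms}

    def repl(match):
        label, target = match.group(1), match.group(2)
        name = normalize(target if target is not None else label)
        fixed = canonical.get(name.lower())
        if fixed is None or fixed == name:
            return match.group(0)  # unknown or already correct
        if target is not None:
            # Keep the visible label (including its spacing) as written.
            return f":term:`{label}<{fixed}>`"
        return f":term:`{fixed}`"

    return TERM_ROLE.sub(repl, text)


sample = (
    "A :term:`dataset` with :term:`observations <observation>` "
    "and a :term:`flat <flat data>` view."
)
fixed_sample = fix_term_case(sample, ["Dataset", "Observation", "Flat data"])
print(fixed_sample)
```

Note that a rewritten target is emitted on a single line, so a reference that was wrapped across lines may need to be re-wrapped by hand afterwards.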
6 changes: 3 additions & 3 deletions source/elements/oneDAL/source/introduction.rst
@@ -48,11 +48,11 @@ communication technology and, therefore, can be used within different end-to-end
- Prediction algorithms typically work with the trained data model and with a working data set.

- The **Utilities** component includes auxiliary functionality intended to be used for design of
classes and implementation of methods such as memory allocators or type traits.
classes and implementation of methods such as memory allocators or type traits.

- The **Miscellaneous** component includes functionality intended to be used by |dal_short_name|
algorithms and applications for algorithm customization and optimization on various stages of the
analytical pipeline. Examples of such algorithms include solvers and random number generators.
algorithms and applications for algorithm customization and optimization on various stages of the
analytical pipeline. Examples of such algorithms include solvers and random number generators.

Classes in Data Management, Algorithms, Utilities, and Miscellaneous components cover the most
important usage scenarios and allow seamless implementation of complex data analytics workflows