diff --git a/doc/modules/partial_dependence.rst b/doc/modules/partial_dependence.rst
index 2d3603cc79095..92a44c0640f98 100644
--- a/doc/modules/partial_dependence.rst
+++ b/doc/modules/partial_dependence.rst
@@ -25,34 +25,33 @@ of all other input features (the 'complement' features). Intuitively, we can
 interpret the partial dependence as the expected target response as a
 function of the input features of interest.
 
-Due to the limits of human perception the size of the set of input feature of
-interest must be small (usually, one or two) thus the input features of
+Due to the limits of human perception, the size of the set of input features of
+interest must be small (usually, one or two); thus, the input features of
 interest are usually chosen among the most important features.
 
 The figure below shows two one-way and one two-way partial dependence plots for
-the California housing dataset, with a :class:`HistGradientBoostingRegressor
-<sklearn.ensemble.HistGradientBoostingRegressor>`:
+the bike sharing dataset, with a
+:class:`~sklearn.ensemble.HistGradientBoostingRegressor`:
 
-.. figure:: ../auto_examples/inspection/images/sphx_glr_plot_partial_dependence_003.png
+.. figure:: ../auto_examples/inspection/images/sphx_glr_plot_partial_dependence_005.png
    :target: ../auto_examples/inspection/plot_partial_dependence.html
    :align: center
    :scale: 70
 
-One-way PDPs tell us about the interaction between the target response and an
-input feature of interest feature (e.g. linear, non-linear). The left plot
-in the above figure shows the effect of the average occupancy on the median
-house price; we can clearly see a linear relationship among them when the
-average occupancy is inferior to 3 persons. Similarly, we could analyze the
-effect of the house age on the median house price (middle plot). Thus, these
-interpretations are marginal, considering a feature at a time.
-
-PDPs with two input features of interest show the interactions among the two
-features. For example, the two-variable PDP in the above figure shows the
-dependence of median house price on joint values of house age and average
-occupants per household. We can clearly see an interaction between the two
-features: for an average occupancy greater than two, the house price is nearly
-independent of the house age, whereas for values less than 2 there is a strong
-dependence on age.
+One-way PDPs tell us about the interaction between the target response and an input
+feature of interest (e.g. linear, non-linear). The left plot in the above figure
+shows the effect of the temperature on the number of bike rentals; we can clearly see
+that a higher temperature is associated with a higher number of bike rentals.
+Similarly, we could analyze the effect of the humidity on the number of bike rentals
+(middle plot). Thus, these interpretations are marginal, considering a feature at a
+time.
+
+PDPs with two input features of interest show the interactions among the two features.
+For example, the two-variable PDP in the above figure shows the dependence of the number
+of bike rentals on joint values of temperature and humidity. We can clearly see an
+interaction between the two features: for temperatures above 20 degrees Celsius, the
+number of bike rentals depends mainly on the humidity. For lower temperatures, both the
+temperature and the humidity have an impact on the number of bike rentals.
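+
+As a sketch of the usual Monte Carlo estimate (writing :math:`X_S` for the
+features of interest and :math:`X_C` for their complement), the partial
+dependence of a model :math:`f` shown in these plots is the prediction averaged
+over the :math:`n` training samples:
+
+.. math::
+
+   pd_{X_S}(x_S) \approx \frac{1}{n} \sum_{i=1}^{n} f(x_S, x_C^{(i)})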
The :mod:`sklearn.inspection` module provides a convenience function
:func:`~PartialDependenceDisplay.from_estimator` to create one-way and two-way partial
@@ -74,6 +73,12 @@ and a two-way PDP between the two features::
 
 You can access the newly created figure and Axes objects using ``plt.gcf()``
 and ``plt.gca()``.
 
+To make a partial dependence plot with categorical features, you need to specify
+which features are categorical using the parameter `categorical_features`. This
+parameter takes a list of indices, a list of the categorical feature names, or a
+boolean mask. The graphical representation of partial dependence for categorical
+features is a bar plot or a 2D heatmap.
+
 For multi-class classification, you need to set the class label for which
 the PDPs should be created via the ``target`` argument::
 
@@ -120,12 +125,11 @@ feature for each sample separately with one line per sample.
 Due to the limits of human perception, only one input feature of interest is
 supported for ICE plots.
 
-The figures below show four ICE plots for the California housing dataset,
-with a :class:`HistGradientBoostingRegressor
-<sklearn.ensemble.HistGradientBoostingRegressor>`. The second figure plots
-the corresponding PD line overlaid on ICE lines.
+The figures below show two ICE plots for the bike sharing dataset,
+with a :class:`~sklearn.ensemble.HistGradientBoostingRegressor`.
+The figures plot the corresponding PD line overlaid on ICE lines.
 
-.. figure:: ../auto_examples/inspection/images/sphx_glr_plot_partial_dependence_002.png
+.. figure:: ../auto_examples/inspection/images/sphx_glr_plot_partial_dependence_004.png
    :target: ../auto_examples/inspection/plot_partial_dependence.html
    :align: center
    :scale: 70
@@ -133,10 +137,11 @@ the corresponding PD line overlaid on ICE lines.
 While the PDPs are good at showing the average effect of the target features,
 they can obscure a heterogeneous relationship created by interactions.
 When interactions are present the ICE plot will provide many more insights.
-For example, we could observe a linear relationship between the median income
-and the house price in the PD line. However, the ICE lines show that there
-are some exceptions, where the house price remains constant in some ranges of
-the median income.
+For example, we see that the ICE for the temperature feature gives us some
+additional information: some of the ICE lines are flat while others
+show a decrease of the dependence for temperatures above 35 degrees Celsius.
+We observe a similar pattern for the humidity feature: some of the ICE
+lines show a sharp decrease when the humidity is above 80%.
 
 The :mod:`sklearn.inspection` module's :meth:`PartialDependenceDisplay.from_estimator`
 convenience function can be used to create ICE plots by setting
diff --git a/doc/whats_new/v1.2.rst b/doc/whats_new/v1.2.rst
index 5b9c046758ede..bc7aaffc6e932 100644
--- a/doc/whats_new/v1.2.rst
+++ b/doc/whats_new/v1.2.rst
@@ -405,6 +405,14 @@ Changelog
   containing only missing values when transforming.
   :pr:`16695` by :user:`Vitor Santa Rosa `.
 
+:mod:`sklearn.inspection`
+.........................
+
+- |Enhancement| Extended :func:`inspection.partial_dependence` and
+  :class:`inspection.PartialDependenceDisplay` to handle categorical features.
+  :pr:`18298` by :user:`Madhura Jayaratne ` and
+  :user:`Guillaume Lemaitre `.
+
 :mod:`sklearn.kernel_approximation`
 ...................................
 
@@ -416,7 +424,7 @@ Changelog
 
 - |Enhancement| :class:`kernel_approximation.RBFSampler` now accepts
   `'scale'` option for parameter `gamma`.
- :pr:`24755` by :user:`Gleb Levitski ` + :pr:`24755` by :user:`Gleb Levitski `. :mod:`sklearn.linear_model` ........................... diff --git a/examples/inspection/plot_partial_dependence.py b/examples/inspection/plot_partial_dependence.py index 5b85e6edbc151..ca4e040c7a3a5 100644 --- a/examples/inspection/plot_partial_dependence.py +++ b/examples/inspection/plot_partial_dependence.py @@ -19,10 +19,11 @@ This example shows how to obtain partial dependence and ICE plots from a :class:`~sklearn.neural_network.MLPRegressor` and a :class:`~sklearn.ensemble.HistGradientBoostingRegressor` trained on the -California housing dataset. The example is taken from [1]_. +bike sharing dataset. The example is inspired by [1]_. -.. [1] T. Hastie, R. Tibshirani and J. Friedman, "Elements of Statistical - Learning Ed. 2", Springer, 2009. +.. [1] `Molnar, Christoph. "Interpretable machine learning. + A Guide for Making Black Box Models Explainable", + 2019. `_ .. [2] For classification you can think of it as the regression score before the link function. @@ -31,28 +32,156 @@ "Peeking Inside the Black Box: Visualizing Statistical Learning With Plots of Individual Conditional Expectation". Journal of Computational and Graphical Statistics, 24(1): 44-65 <1309.6392>` - """ # %% -# California Housing data preprocessing -# ------------------------------------- +# Bike sharing dataset preprocessing +# ---------------------------------- # -# Center target to avoid gradient boosting init bias: gradient boosting -# with the 'recursion' method does not account for the initial estimator -# (here the average target, by default). +# We will use the bike sharing dataset. The goal is to predict the number of bike +# rentals using weather and season data as well as the datetime information. +from sklearn.datasets import fetch_openml + +bikes = fetch_openml("Bike_Sharing_Demand", version=2, as_frame=True, parser="pandas") +# Make an explicit copy to avoid "SettingWithCopyWarning" from pandas +X, y = bikes.data.copy(), bikes.target + +# %% +# The feature `"weather"` has a particularity: the category `"heavy_rain"` is a rare +# category. +X["weather"].value_counts() -import pandas as pd -from sklearn.datasets import fetch_california_housing -from sklearn.model_selection import train_test_split +# %% +# Because of this rare category, we collapse it into `"rain"`. +X["weather"].replace(to_replace="heavy_rain", value="rain", inplace=True) + +# %% +# We now have a closer look at the `"year"` feature: +X["year"].value_counts() -cal_housing = fetch_california_housing() -X = pd.DataFrame(cal_housing.data, columns=cal_housing.feature_names) -y = cal_housing.target +# %% +# We see that we have data from two years. We use the first year to train the +# model and the second year to test the model. +mask_training = X["year"] == 0.0 +X = X.drop(columns=["year"]) +X_train, y_train = X[mask_training], y[mask_training] +X_test, y_test = X[~mask_training], y[~mask_training] + +# %% +# We can check the dataset information to see that we have heterogeneous data types. We +# have to preprocess the different columns accordingly. +X_train.info() + +# %% +# From the previous information, we will consider the `category` columns as nominal +# categorical features. In addition, we will consider the date and time information as +# categorical features as well. +# +# We manually define the columns containing numerical and categorical +# features. 
+numerical_features = [ + "temp", + "feel_temp", + "humidity", + "windspeed", +] +categorical_features = X_train.columns.drop(numerical_features) + +# %% +# Before we go into the details regarding the preprocessing of the different machine +# learning pipelines, we will try to get some additional intuition regarding the dataset +# that will be helpful to understand the model's statistical performance and results of +# the partial dependence analysis. +# +# We plot the average number of bike rentals by grouping the data by season and +# by year. +from itertools import product +import numpy as np +import matplotlib.pyplot as plt + +days = ("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat") +hours = tuple(range(24)) +xticklabels = [f"{day}\n{hour}:00" for day, hour in product(days, hours)] +xtick_start, xtick_period = 6, 12 + +fig, axs = plt.subplots(nrows=2, figsize=(8, 6), sharey=True, sharex=True) +average_bike_rentals = bikes.frame.groupby(["year", "season", "weekday", "hour"]).mean( + numeric_only=True +)["count"] +for ax, (idx, df) in zip(axs, average_bike_rentals.groupby("year")): + df.groupby("season").plot(ax=ax, legend=True) + + # decorate the plot + ax.set_xticks( + np.linspace( + start=xtick_start, + stop=len(xticklabels), + num=len(xticklabels) // xtick_period, + ) + ) + ax.set_xticklabels(xticklabels[xtick_start::xtick_period]) + ax.set_xlabel("") + ax.set_ylabel("Average number of bike rentals") + ax.set_title( + f"Bike rental for {'2010 (train set)' if idx == 0.0 else '2011 (test set)'}" + ) + ax.set_ylim(0, 1_000) + ax.set_xlim(0, len(xticklabels)) + ax.legend(loc=2) + +# %% +# The first striking difference between the train and test set is that the number of +# bike rentals is higher in the test set. For this reason, it will not be surprising to +# get a machine learning model that underestimates the number of bike rentals. We +# also observe that the number of bike rentals is lower during the spring season. In +# addition, we see that during working days, there is a specific pattern around 6-7 +# am and 5-6 pm with some peaks of bike rentals. We can keep in mind these different +# insights and use them to understand the partial dependence plot. +# +# Preprocessor for machine-learning models +# ---------------------------------------- +# +# Since we later use two different models, a +# :class:`~sklearn.neural_network.MLPRegressor` and a +# :class:`~sklearn.ensemble.HistGradientBoostingRegressor`, we create two different +# preprocessors, specific for each model. +# +# Preprocessor for the neural network model +# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +# +# We will use a :class:`~sklearn.preprocessing.QuantileTransformer` to scale the +# numerical features and encode the categorical features with a +# :class:`~sklearn.preprocessing.OneHotEncoder`. 
+from sklearn.compose import ColumnTransformer
+from sklearn.preprocessing import QuantileTransformer
+from sklearn.preprocessing import OneHotEncoder
 
-y -= y.mean()
+mlp_preprocessor = ColumnTransformer(
+    transformers=[
+        ("num", QuantileTransformer(n_quantiles=100), numerical_features),
+        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features),
+    ]
+)
+mlp_preprocessor
 
-X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)
+# %%
+# Preprocessor for the gradient boosting model
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+#
+# For the gradient boosting model, we leave the numerical features as-is and only
+# encode the categorical features using a
+# :class:`~sklearn.preprocessing.OrdinalEncoder`.
+from sklearn.preprocessing import OrdinalEncoder
+
+hgbdt_preprocessor = ColumnTransformer(
+    transformers=[
+        ("cat", OrdinalEncoder(), categorical_features),
+        ("num", "passthrough", numerical_features),
+    ],
+    sparse_threshold=1,
+    verbose_feature_names_out=False,
+).set_output(transform="pandas")
+hgbdt_preprocessor
 
 # %%
 # 1-way partial dependence with different models
@@ -60,25 +189,23 @@
 #
 # In this section, we will compute 1-way partial dependence with two different
 # machine-learning models: (i) a multi-layer perceptron and (ii) a
-# gradient-boosting. With these two models, we illustrate how to compute and
-# interpret both partial dependence plot (PDP) and individual conditional
-# expectation (ICE).
+# gradient-boosting model. With these two models, we illustrate how to compute and
+# interpret both the partial dependence plot (PDP) and the individual conditional
+# expectation (ICE) for numerical and categorical features.
 #
 # Multi-layer perceptron
-# ......................
+# """"""""""""""""""""""
 #
 # Let's fit a :class:`~sklearn.neural_network.MLPRegressor` and compute
 # single-variable partial dependence plots.
-
 from time import time
-from sklearn.pipeline import make_pipeline
-from sklearn.preprocessing import QuantileTransformer
 from sklearn.neural_network import MLPRegressor
+from sklearn.pipeline import make_pipeline
 
 print("Training MLPRegressor...")
 tic = time()
-est = make_pipeline(
-    QuantileTransformer(),
+mlp_model = make_pipeline(
+    mlp_preprocessor,
     MLPRegressor(
         hidden_layer_sizes=(30, 15),
         learning_rate_init=0.01,
@@ -86,14 +213,14 @@
         random_state=0,
     ),
 )
-est.fit(X_train, y_train)
+mlp_model.fit(X_train, y_train)
 print(f"done in {time() - tic:.3f}s")
-print(f"Test R2 score: {est.score(X_test, y_test):.2f}")
+print(f"Test R2 score: {mlp_model.score(X_test, y_test):.2f}")
 
 # %%
-# We configured a pipeline to scale the numerical input features and tuned the
-# neural network size and learning rate to get a reasonable compromise between
-# training time and predictive performance on a test set.
+# We configured a pipeline using the preprocessor that we created specifically for the
+# neural network and tuned the neural network size and learning rate to get a reasonable
+# compromise between training time and predictive performance on a test set.
 #
 # Importantly, this tabular dataset has very different dynamic ranges for its
 # features. Neural networks tend to be very sensitive to features with varying
@@ -106,52 +233,65 @@
 # Note that it is important to check that the model is accurate enough on a
 # test set before plotting the partial dependence since there would be little
 # use in explaining the impact of a given feature on the prediction function of
-# a poor model.
+# a model with poor predictive performance.
In this regard, our MLP model works +# reasonably well. # -# We will plot the partial dependence, both individual (ICE) and averaged one -# (PDP). We limit to only 50 ICE curves to not overcrowd the plot. - +# We will plot the averaged partial dependence. +import matplotlib.pyplot as plt from sklearn.inspection import PartialDependenceDisplay common_params = { "subsample": 50, "n_jobs": 2, "grid_resolution": 20, - "centered": True, "random_state": 0, } print("Computing partial dependence plots...") +features_info = { + # features of interest + "features": ["temp", "humidity", "windspeed", "season", "weather", "hour"], + # type of partial dependence plot + "kind": "average", + # information regarding categorical features + "categorical_features": categorical_features, +} tic = time() +_, ax = plt.subplots(ncols=3, nrows=2, figsize=(9, 8), constrained_layout=True) display = PartialDependenceDisplay.from_estimator( - est, + mlp_model, X_train, - features=["MedInc", "AveOccup", "HouseAge", "AveRooms"], - kind="both", + **features_info, + ax=ax, **common_params, ) print(f"done in {time() - tic:.3f}s") -display.figure_.suptitle( - "Partial dependence of house value on non-location features\n" - "for the California housing dataset, with MLPRegressor" +_ = display.figure_.suptitle( + "Partial dependence of the number of bike rentals\n" + "for the bike rental dataset with an MLPRegressor", + fontsize=16, ) -display.figure_.subplots_adjust(hspace=0.3) # %% # Gradient boosting -# ................. +# """"""""""""""""" # # Let's now fit a :class:`~sklearn.ensemble.HistGradientBoostingRegressor` and -# compute the partial dependence on the same features. - +# compute the partial dependence on the same features. We also use the +# specific preprocessor we created for this model. from sklearn.ensemble import HistGradientBoostingRegressor print("Training HistGradientBoostingRegressor...") tic = time() -est = HistGradientBoostingRegressor(random_state=0) -est.fit(X_train, y_train) +hgbdt_model = make_pipeline( + hgbdt_preprocessor, + HistGradientBoostingRegressor( + categorical_features=categorical_features, random_state=0 + ), +) +hgbdt_model.fit(X_train, y_train) print(f"done in {time() - tic:.3f}s") -print(f"Test R2 score: {est.score(X_test, y_test):.2f}") +print(f"Test R2 score: {hgbdt_model.score(X_test, y_test):.2f}") # %% # Here, we used the default hyperparameters for the gradient boosting model @@ -163,177 +303,219 @@ # also significantly cheaper to tune their hyperparameters (the defaults tend # to work well while this is not often the case for neural networks). # -# We will plot the partial dependence, both individual (ICE) and averaged one -# (PDP). We limit to only 50 ICE curves to not overcrowd the plot. - +# We will plot the partial dependence for some of the numerical and categorical +# features. 
print("Computing partial dependence plots...") tic = time() +_, ax = plt.subplots(ncols=3, nrows=2, figsize=(9, 8), constrained_layout=True) display = PartialDependenceDisplay.from_estimator( - est, + hgbdt_model, X_train, - features=["MedInc", "AveOccup", "HouseAge", "AveRooms"], - kind="both", + **features_info, + ax=ax, **common_params, ) print(f"done in {time() - tic:.3f}s") -display.figure_.suptitle( - "Partial dependence of house value on non-location features\n" - "for the California housing dataset, with Gradient Boosting" +_ = display.figure_.suptitle( + "Partial dependence of the number of bike rentals\n" + "for the bike rental dataset with a gradient boosting", + fontsize=16, ) -display.figure_.subplots_adjust(wspace=0.4, hspace=0.3) # %% # Analysis of the plots -# ..................... -# -# We can clearly see on the PDPs (dashed orange line) that the median house price -# shows a linear relationship with the median income (top left) and that the -# house price drops when the average occupants per household increases (top -# middle). The top right plot shows that the house age in a district does not -# have a strong influence on the (median) house price; so does the average -# rooms per household. -# -# The ICE curves (light blue lines) complement the analysis: we can see that -# there are some exceptions (which are better highlighted with the option -# `centered=True`), where the house price remains constant with respect to -# median income and average occupants variations. -# On the other hand, while the house age (top right) does not have a strong -# influence on the median house price on average, there seems to be a number -# of exceptions where the house price increases when -# between the ages 15-25. Similar exceptions can be observed for the average -# number of rooms (bottom left). Therefore, ICE plots show some individual -# effect which are attenuated by taking the averages. -# -# In all plots, the tick marks on the x-axis represent the deciles of the -# feature values in the training data. -# -# We also observe that :class:`~sklearn.neural_network.MLPRegressor` has much -# smoother predictions than -# :class:`~sklearn.ensemble.HistGradientBoostingRegressor`. +# """"""""""""""""""""" +# +# We will first look at the PDPs for the numerical features. For both models, the +# general trend of the PDP of the temperature is that the number of bike rentals is +# increasing with temperature. We can make a similar analysis but with the opposite +# trend for the humidity features. The number of bike rentals is decreasing when the +# humidity increases. Finally, we see the same trend for the wind speed feature. The +# number of bike rentals is decreasing when the wind speed is increasing for both +# models. We also observe that :class:`~sklearn.neural_network.MLPRegressor` has much +# smoother predictions than :class:`~sklearn.ensemble.HistGradientBoostingRegressor`. +# +# Now, we will look at the partial dependence plots for the categorical features. +# +# We observe that the spring season is the lowest bar for the season feature. With the +# weather feature, the rain category is the lowest bar. Regarding the hour feature, +# we see two peaks around the 7 am and 6 pm. These findings are in line with the +# the observations we made earlier on the dataset. # # However, it is worth noting that we are creating potential meaningless # synthetic samples if features are correlated. 
-
-# %%
-# 2D interaction plots
-# --------------------
 #
-# PDPs with two features of interest enable us to visualize interactions among
-# them. However, ICEs cannot be plotted in an easy manner and thus interpreted.
-# Another consideration is linked to the performance to compute the PDPs. With
-# the tree-based algorithm, when only PDPs are requested, they can be computed
-# on an efficient way using the `'recursion'` method.
-import matplotlib.pyplot as plt
-
-print("Computing partial dependence plots...")
+# ICE vs. PDP
+# """""""""""
+# A PDP is the average of the marginal effects of the features: we average the
+# response of all samples of the provided set. Thus, some effects could be hidden. In
+# this regard, it is possible to plot each individual response. This representation is
+# called the individual conditional expectation (ICE) plot. In the plot below, we plot
+# 50 randomly selected ICE lines for the temperature and humidity features.
+print("Computing partial dependence plots and individual conditional expectation...")
 tic = time()
-_, ax = plt.subplots(ncols=3, figsize=(9, 4))
+_, ax = plt.subplots(ncols=2, figsize=(6, 4), sharey=True, constrained_layout=True)
+
+features_info = {
+    "features": ["temp", "humidity"],
+    "kind": "both",
+    "centered": True,
+}
 
-# Note that we could have called the method `from_estimator` three times and
-# provide one feature, one kind of plot, and one axis for each call.
 display = PartialDependenceDisplay.from_estimator(
-    est,
+    hgbdt_model,
     X_train,
-    features=["AveOccup", "HouseAge", ("AveOccup", "HouseAge")],
-    kind=["both", "both", "average"],
+    **features_info,
     ax=ax,
     **common_params,
 )
-
 print(f"done in {time() - tic:.3f}s")
-display.figure_.suptitle(
-    "Partial dependence of house value on non-location features\n"
-    "for the California housing dataset, with Gradient Boosting"
+_ = display.figure_.suptitle("ICE and PDP representations", fontsize=16)
+
+# %%
+# We see that the ICE for the temperature feature gives us some additional information:
+# some of the ICE lines are flat while others show a decrease of the dependence
+# for temperatures above 35 degrees Celsius. We observe a similar pattern for the
+# humidity feature: some of the ICE lines show a sharp decrease when the humidity is
+# above 80%.
+#
+# Not all ICE lines are parallel; this indicates that the model finds
+# interactions between features. We can repeat the experiment by constraining the
+# gradient boosting model to not use any interactions between features using the
+# parameter `interaction_cst`:
+from sklearn.base import clone
+
+interaction_cst = [[i] for i in range(X_train.shape[1])]
+hgbdt_model_without_interactions = (
+    clone(hgbdt_model)
+    .set_params(histgradientboostingregressor__interaction_cst=interaction_cst)
+    .fit(X_train, y_train)
 )
-display.figure_.subplots_adjust(wspace=0.4, hspace=0.3)
+print(f"Test R2 score: {hgbdt_model_without_interactions.score(X_test, y_test):.2f}")
 
 # %%
-# The two-way partial dependence plot shows the dependence of median house
-# price on joint values of house age and average occupants per household. We
-# can clearly see an interaction between the two features: for an average
-# occupancy greater than two, the house price is nearly independent of the
-# house age, whereas for values less than two there is a strong dependence on
-# age.
-#
-# Interaction constraints
-# .......................
-#
-# The histogram gradient boosters have an interesting option to constrain
-# possible interactions among features. In the following, we do not allow any
-# interactions and thus render the model as a version of a tree-based boosted
-# generalized additive model (GAM). This makes the model more interpretable
-# as the effect of each feature can be investigated independently of all others.
-#
-# We train the :class:`~sklearn.ensemble.HistGradientBoostingRegressor` again,
-# now with `interaction_cst`, where we pass for each feature a list containing
-# only its own index, e.g. `[[0], [1], [2], ..]`.
-
-print("Training interaction constraint HistGradientBoostingRegressor...")
+_, ax = plt.subplots(ncols=2, figsize=(6, 4), sharey=True, constrained_layout=True)
+
+features_info["centered"] = False
+display = PartialDependenceDisplay.from_estimator(
+    hgbdt_model_without_interactions,
+    X_train,
+    **features_info,
+    ax=ax,
+    **common_params,
+)
+_ = display.figure_.suptitle("ICE and PDP representations", fontsize=16)
+
+# %%
+# 2D interaction plots
+# --------------------
+#
+# PDPs with two features of interest enable us to visualize interactions among them.
+# However, ICEs cannot be plotted in an easy manner and thus interpreted. We will show
+# the representation available in
+# :meth:`~sklearn.inspection.PartialDependenceDisplay.from_estimator`, which is a 2D
+# heatmap.
 print("Computing partial dependence plots...")
+features_info = {
+    "features": ["temp", "humidity", ("temp", "humidity")],
+    "kind": "average",
+}
+_, ax = plt.subplots(ncols=3, figsize=(10, 4), constrained_layout=True)
 tic = time()
-est_no_interactions = HistGradientBoostingRegressor(interaction_cst="no_interactions")
-est_no_interactions.fit(X_train, y_train)
+display = PartialDependenceDisplay.from_estimator(
+    hgbdt_model,
+    X_train,
+    **features_info,
+    ax=ax,
+    **common_params,
+)
 print(f"done in {time() - tic:.3f}s")
+_ = display.figure_.suptitle(
+    "1-way vs 2-way PDP of numerical features using gradient boosting", fontsize=16
+)
 
 # %%
-# The easiest way to show the effect of forbidden interactions is again the
-# ICE plots.
-
+# The two-way partial dependence plot shows the dependence of the number of bike
+# rentals on joint values of temperature and humidity.
+# We clearly see an interaction between the two features. For a temperature higher than
+# 20 degrees Celsius, the humidity has an impact on the number of bike rentals
+# that seems independent of the temperature.
+#
+# On the other hand, for temperatures lower than 20 degrees Celsius, both the
+# temperature and the humidity continuously impact the number of bike rentals.
+#
+# Furthermore, the slope of the impact ridge at the 20 degrees Celsius
+# threshold is very dependent on the humidity level: the ridge is steep under
+# dry conditions but much smoother under wetter conditions above 70% humidity.
+#
+# We now contrast those results with the same plots computed for the model
+# constrained to learn a prediction function that does not depend on such
+# non-linear feature interactions.
print("Computing partial dependence plots...") +features_info = { + "features": ["temp", "humidity", ("temp", "humidity")], + "kind": "average", +} +_, ax = plt.subplots(ncols=3, figsize=(10, 4), constrained_layout=True) tic = time() display = PartialDependenceDisplay.from_estimator( - est_no_interactions, + hgbdt_model_without_interactions, X_train, - ["MedInc", "AveOccup", "HouseAge", "AveRooms"], - kind="both", - subsample=50, - n_jobs=3, - grid_resolution=20, - random_state=0, - ice_lines_kw={"color": "tab:blue", "alpha": 0.2, "linewidth": 0.5}, - pd_line_kw={"color": "tab:orange", "linestyle": "--"}, + **features_info, + ax=ax, + **common_params, ) - print(f"done in {time() - tic:.3f}s") -display.figure_.suptitle( - "Partial dependence of house value with Gradient Boosting\n" - "and no interactions allowed" +_ = display.figure_.suptitle( + "1-way vs 2-way of numerical PDP using gradient boosting", fontsize=16 ) -display.figure_.subplots_adjust(wspace=0.4, hspace=0.3) # %% -# All 4 plots have parallel ICE lines meaning there is no interaction in the +# The 1D partial dependence plots for the model constrained to not model feature +# interactions show local spikes for each features individually, in particular for +# for the "humidity" feature. Those spikes might be reflecting a degraded behavior +# of the model that attempts to somehow compensate for the forbidden interactions +# by overfitting particular training points. Note that the predictive performance +# of this model as measured on the test set is significantly worse than that of +# the original, unconstrained model. +# +# Also note that the number of local spikes visible on those plots is depends on +# the grid resolution parameter of the PD plot itself. +# +# Those local spikes result in a noisily gridded 2D PD plot. It is quite +# challenging to tell whether or not there are no interaction between those +# features because of the high frequency oscillations in the humidity feature. +# However it can clearly be seen that the simple interaction effect observed when +# the temperature crosses the 20 degrees boundary is no longer visible for this # model. -# Let us also have a look at the corresponding 2D-plot. - +# +# The partial dependence between categorical features will provide a discrete +# representation that can be shown as a heatmap. For instance the interaction between +# the season, the weather, and the target would be as follow: print("Computing partial dependence plots...") +features_info = { + "features": ["season", "weather", ("season", "weather")], + "kind": "average", + "categorical_features": categorical_features, +} +_, ax = plt.subplots(ncols=3, figsize=(14, 6), constrained_layout=True) tic = time() -_, ax = plt.subplots(ncols=3, figsize=(9, 4)) display = PartialDependenceDisplay.from_estimator( - est_no_interactions, + hgbdt_model, X_train, - ["AveOccup", "HouseAge", ("AveOccup", "HouseAge")], - kind="average", - n_jobs=3, - grid_resolution=20, + **features_info, ax=ax, + **common_params, ) + print(f"done in {time() - tic:.3f}s") -display.figure_.suptitle( - "Partial dependence of house value with Gradient Boosting\n" - "and no interactions allowed" +_ = display.figure_.suptitle( + "1-way vs 2-way PDP of categorical features using gradient boosting", fontsize=16 ) -display.figure_.subplots_adjust(wspace=0.4, hspace=0.3) # %% -# Although the 2D-plot shows much less interaction compared with the 2D-plot -# from above, it is much harder to come to the conclusion that there is no -# interaction at all. 
This might be a cause of the discrete predictions of -# trees in combination with numerically precision of partial dependence. -# We also observe that the univariate dependence plots have slightly changed -# as the model tries to compensate for the forbidden interactions. -# -# 3D interaction plots -# -------------------- +# 3D representation +# """"""""""""""""" # # Let's make the same partial dependence plot for the 2 features interaction, # this time in 3 dimensions. @@ -344,11 +526,11 @@ from sklearn.inspection import partial_dependence -fig = plt.figure() +fig = plt.figure(figsize=(5.5, 5)) -features = ("AveOccup", "HouseAge") +features = ("temp", "humidity") pdp = partial_dependence( - est, X_train, features=features, kind="average", grid_resolution=10 + hgbdt_model, X_train, features=features, kind="average", grid_resolution=10 ) XX, YY = np.meshgrid(pdp["values"][0], pdp["values"][1]) Z = pdp.average[0].T @@ -358,13 +540,14 @@ surf = ax.plot_surface(XX, YY, Z, rstride=1, cstride=1, cmap=plt.cm.BuPu, edgecolor="k") ax.set_xlabel(features[0]) ax.set_ylabel(features[1]) -ax.set_zlabel("Partial dependence") +fig.suptitle( + "PD of number of bike rentals on\nthe temperature and humidity GBDT model", + fontsize=16, +) # pretty init view ax.view_init(elev=22, azim=122) -plt.colorbar(surf) -plt.suptitle( - "Partial dependence of house value on median\n" - "age and average occupancy, with Gradient Boosting" -) -plt.subplots_adjust(top=0.9) +clb = plt.colorbar(surf, pad=0.08, shrink=0.6, aspect=10) +clb.ax.set_title("Partial\ndependence") plt.show() + +# %% diff --git a/sklearn/inspection/_partial_dependence.py b/sklearn/inspection/_partial_dependence.py index 9bc70a50ce197..75a88508e3626 100644 --- a/sklearn/inspection/_partial_dependence.py +++ b/sklearn/inspection/_partial_dependence.py @@ -11,6 +11,7 @@ from scipy import sparse from scipy.stats.mstats import mquantiles +from ._pd_utils import _check_feature_names, _get_feature_index from ..base import is_classifier, is_regressor from ..utils.extmath import cartesian from ..utils import check_array @@ -35,31 +36,38 @@ ] -def _grid_from_X(X, percentiles, grid_resolution): +def _grid_from_X(X, percentiles, is_categorical, grid_resolution): """Generate a grid of points based on the percentiles of X. The grid is a cartesian product between the columns of ``values``. The ith column of ``values`` consists in ``grid_resolution`` equally-spaced points between the percentiles of the jth column of X. + If ``grid_resolution`` is bigger than the number of unique values in the - jth column of X, then those unique values will be used instead. + j-th column of X or if the feature is a categorical feature (by inspecting + `is_categorical`) , then those unique values will be used instead. Parameters ---------- - X : ndarray, shape (n_samples, n_target_features) + X : array-like of shape (n_samples, n_target_features) The data. - percentiles : tuple of floats + percentiles : tuple of float The percentiles which are used to construct the extreme values of the grid. Must be in [0, 1]. + is_categorical : list of bool + For each feature, tells whether it is categorical or not. If a feature + is categorical, then the values used will be the unique ones + (i.e. categories) instead of the percentiles. + grid_resolution : int The number of equally spaced points to be placed on the grid for each feature. 
     Returns
     -------
-    grid : ndarray, shape (n_points, n_target_features)
+    grid : ndarray of shape (n_points, n_target_features)
         A value for each feature at each point in the grid. ``n_points`` is
         always ``<= grid_resolution ** X.shape[1]``.
 
@@ -79,10 +87,12 @@ def _grid_from_X(X, percentiles, grid_resolution):
         raise ValueError("'grid_resolution' must be strictly greater than 1.")
 
     values = []
-    for feature in range(X.shape[1]):
+    for feature, is_cat in enumerate(is_categorical):
         uniques = np.unique(_safe_indexing(X, feature, axis=1))
-        if uniques.shape[0] < grid_resolution:
-            # feature has low resolution use unique vals
+        if is_cat or uniques.shape[0] < grid_resolution:
+            # Use the unique values either because:
+            # - the feature has a low resolution (fewer unique values than
+            #   grid_resolution)
+            # - the feature is categorical
             axis = uniques
         else:
             # create axis based on percentiles and grid resolution
@@ -207,6 +217,8 @@ def partial_dependence(
     X,
     features,
     *,
+    categorical_features=None,
+    feature_names=None,
     response_method="auto",
     percentiles=(0.05, 0.95),
     grid_resolution=100,
@@ -255,6 +267,27 @@ def partial_dependence(
         The feature (e.g. `[0]`) or pair of interacting features
         (e.g. `[(0, 1)]`) for which the partial dependency should be computed.
 
+    categorical_features : array-like of shape (n_features,) or shape \
+            (n_categorical_features,), dtype={bool, int, str}, default=None
+        Indicates the categorical features.
+
+        - `None`: no feature will be considered categorical;
+        - boolean array-like: boolean mask of shape `(n_features,)`
+          indicating which features are categorical. Thus, this array has
+          the same length as `X.shape[1]`;
+        - integer or string array-like: integer indices or strings
+          indicating categorical features.
+
+        .. versionadded:: 1.2
+
+    feature_names : array-like of shape (n_features,), dtype=str, default=None
+        Name of each feature; `feature_names[i]` holds the name of the feature
+        with index `i`.
+        By default, the feature names correspond to the numerical indices for a
+        NumPy array and to the column names for a pandas dataframe.
+
+        .. versionadded:: 1.2
+
     response_method : {'auto', 'predict_proba', 'decision_function'}, \
             default='auto'
         Specifies whether to use :term:`predict_proba` or
@@ -304,11 +337,11 @@ def partial_dependence(
 
     kind : {'average', 'individual', 'both'}, default='average'
         Whether to return the partial dependence averaged across all the
-        samples in the dataset or one line per sample or both.
+        samples in the dataset or one value per sample or both.
         See Returns below.
 
        Note that the fast `method='recursion'` option is only available for
-        `kind='average'`. Plotting individual dependencies requires using the
+        `kind='average'`. Computing individual dependencies requires using the
         slower `method='brute'` option.
 
         .. versionadded:: 0.24
@@ -455,8 +488,43 @@ def partial_dependence(
         _get_column_indices(X, features), dtype=np.int32, order="C"
     ).ravel()
 
+    feature_names = _check_feature_names(X, feature_names)
+
+    n_features = X.shape[1]
+    if categorical_features is None:
+        is_categorical = [False] * len(features_indices)
+    else:
+        categorical_features = np.array(categorical_features, copy=False)
+        if categorical_features.dtype.kind == "b":
+            # categorical features provided as a boolean mask
+            if categorical_features.size != n_features:
+                raise ValueError(
+                    "When `categorical_features` is a boolean array-like, "
+                    "the array should be of shape (n_features,). Got "
+                    f"{categorical_features.size} elements while `X` contains "
+                    f"{n_features} features."
+ ) + is_categorical = [categorical_features[idx] for idx in features_indices] + elif categorical_features.dtype.kind in ("i", "O", "U"): + # categorical features provided as a list of indices or feature names + categorical_features_idx = [ + _get_feature_index(cat, feature_names=feature_names) + for cat in categorical_features + ] + is_categorical = [ + idx in categorical_features_idx for idx in features_indices + ] + else: + raise ValueError( + "Expected `categorical_features` to be an array-like of boolean," + f" integer, or string. Got {categorical_features.dtype} instead." + ) + grid, values = _grid_from_X( - _safe_indexing(X, features_indices, axis=1), percentiles, grid_resolution + _safe_indexing(X, features_indices, axis=1), + percentiles, + is_categorical, + grid_resolution, ) if method == "brute": diff --git a/sklearn/inspection/_pd_utils.py b/sklearn/inspection/_pd_utils.py new file mode 100644 index 0000000000000..76f4d626fd53c --- /dev/null +++ b/sklearn/inspection/_pd_utils.py @@ -0,0 +1,64 @@ +def _check_feature_names(X, feature_names=None): + """Check feature names. + + Parameters + ---------- + X : array-like of shape (n_samples, n_features) + Input data. + + feature_names : None or array-like of shape (n_names,), dtype=str + Feature names to check or `None`. + + Returns + ------- + feature_names : list of str + Feature names validated. If `feature_names` is `None`, then a list of + feature names is provided, i.e. the column names of a pandas dataframe + or a generic list of feature names (e.g. `["x0", "x1", ...]`) for a + NumPy array. + """ + if feature_names is None: + if hasattr(X, "columns") and hasattr(X.columns, "tolist"): + # get the column names for a pandas dataframe + feature_names = X.columns.tolist() + else: + # define a list of numbered indices for a numpy array + feature_names = [f"x{i}" for i in range(X.shape[1])] + elif hasattr(feature_names, "tolist"): + # convert numpy array or pandas index to a list + feature_names = feature_names.tolist() + if len(set(feature_names)) != len(feature_names): + raise ValueError("feature_names should not contain duplicates.") + + return feature_names + + +def _get_feature_index(fx, feature_names=None): + """Get feature index. + + Parameters + ---------- + fx : int or str + Feature index or name. + + feature_names : list of str, default=None + All feature names from which to search the indices. + + Returns + ------- + idx : int + Feature index. + """ + if isinstance(fx, str): + if feature_names is None: + raise ValueError( + f"Cannot plot partial dependence for feature {fx!r} since " + "the list of feature names was not provided, neither as " + "column names of a pandas data-frame nor via the feature_names " + "parameter." + ) + try: + return feature_names.index(fx) + except ValueError as e: + raise ValueError(f"Feature {fx!r} not in feature_names") from e + return fx diff --git a/sklearn/inspection/_plot/partial_dependence.py b/sklearn/inspection/_plot/partial_dependence.py index 9d09cda0989bd..e3392eeb911b1 100644 --- a/sklearn/inspection/_plot/partial_dependence.py +++ b/sklearn/inspection/_plot/partial_dependence.py @@ -9,6 +9,7 @@ from joblib import Parallel from .. 
import partial_dependence
+from .._pd_utils import _check_feature_names, _get_feature_index
 from ...base import is_regressor
 from ...utils import Bunch
 from ...utils import check_array
@@ -16,6 +17,7 @@
 from ...utils import check_random_state
 from ...utils import _safe_indexing
 from ...utils.fixes import delayed
+from ...utils._encode import _unique
 
 
 class PartialDependenceDisplay:
@@ -124,6 +126,13 @@ class PartialDependenceDisplay:
 
         .. versionadded:: 0.24
 
+    is_categorical : list of (bool,) or list of (bool, bool), default=None
+        Whether each target feature in `features` is categorical or not.
+        The list should be the same size as `features`. If `None`, all features
+        are assumed to be continuous.
+
+        .. versionadded:: 1.2
+
     Attributes
     ----------
     bounding_ax_ : matplotlib Axes or None
@@ -169,6 +178,26 @@ class PartialDependenceDisplay:
         item in `ax`. Elements that are None correspond to a nonexisting axes
         or an axes that does not include a contour plot.
 
+    bars_ : ndarray of matplotlib Artists
+        If `ax` is an axes or None, `bars_[i, j]` is the partial dependence bar
+        plot on the i-th row and j-th column (for a categorical feature).
+        If `ax` is a list of axes, `bars_[i]` is the partial dependence bar
+        plot corresponding to the i-th item in `ax`. Elements that are None
+        correspond to a nonexisting axes or an axes that does not include a
+        bar plot.
+
+        .. versionadded:: 1.2
+
+    heatmaps_ : ndarray of matplotlib Artists
+        If `ax` is an axes or None, `heatmaps_[i, j]` is the partial dependence
+        heatmap on the i-th row and j-th column (for a pair of categorical
+        features). If `ax` is a list of axes, `heatmaps_[i]` is the partial
+        dependence heatmap corresponding to the i-th item in `ax`. Elements
+        that are None correspond to a nonexisting axes or an axes that does not
+        include a heatmap.
+
+        .. versionadded:: 1.2
+
     figure_ : matplotlib Figure
         Figure containing partial dependence plots.
 
@@ -212,6 +241,7 @@ def __init__(
         kind="average",
         subsample=1000,
         random_state=None,
+        is_categorical=None,
     ):
         self.pd_results = pd_results
         self.features = features
@@ -222,6 +252,7 @@ def __init__(
         self.kind = kind
         self.subsample = subsample
         self.random_state = random_state
+        self.is_categorical = is_categorical
 
     @classmethod
     def from_estimator(
@@ -230,6 +261,7 @@ def from_estimator(
         X,
         features,
         *,
+        categorical_features=None,
         feature_names=None,
         target=None,
         response_method="auto",
@@ -317,7 +349,20 @@ def from_estimator(
             If `features[i]` is an integer or a string, a one-way PDP is created;
             if `features[i]` is a tuple, a two-way PDP is created (only supported
             with `kind='average'`). Each tuple must be of size 2.
-            if any entry is a string, then it must be in ``feature_names``.
+            If any entry is a string, then it must be in ``feature_names``.
+
+        categorical_features : array-like of shape (n_features,) or shape \
+                (n_categorical_features,), dtype={bool, int, str}, default=None
+            Indicates the categorical features.
+
+            - `None`: no feature will be considered categorical;
+            - boolean array-like: boolean mask of shape `(n_features,)`
+              indicating which features are categorical. Thus, this array has
+              the same length as `X.shape[1]`;
+            - integer or string array-like: integer indices or strings
+              indicating categorical features.
+
+            ..
versionadded:: 1.2 feature_names : array-like of shape (n_features,), dtype=str, default=None Name of each feature; `feature_names[i]` holds the name of the feature @@ -500,20 +545,7 @@ def from_estimator( X = check_array(X, force_all_finite="allow-nan", dtype=object) n_features = X.shape[1] - # convert feature_names to list - if feature_names is None: - if hasattr(X, "loc"): - # get the column names for a pandas dataframe - feature_names = X.columns.tolist() - else: - # define a list of numbered indices for a numpy array - feature_names = [str(i) for i in range(n_features)] - elif hasattr(feature_names, "tolist"): - # convert numpy array or pandas index to a list - feature_names = feature_names.tolist() - if len(set(feature_names)) != len(feature_names): - raise ValueError("feature_names should not contain duplicates.") - + feature_names = _check_feature_names(X, feature_names) # expand kind to always be a list of str kind_ = [kind] * len(features) if isinstance(kind, str) else kind if len(kind_) != len(features): @@ -523,21 +555,15 @@ def from_estimator( f"element(s) and `features` contains {len(features)} element(s)." ) - def convert_feature(fx): - if isinstance(fx, str): - try: - fx = feature_names.index(fx) - except ValueError as e: - raise ValueError("Feature %s not in feature_names" % fx) from e - return int(fx) - # convert features into a seq of int tuples tmp_features, ice_for_two_way_pd = [], [] for kind_plot, fxs in zip(kind_, features): if isinstance(fxs, (numbers.Integral, str)): fxs = (fxs,) try: - fxs = tuple(convert_feature(fx) for fx in fxs) + fxs = tuple( + _get_feature_index(fx, feature_names=feature_names) for fx in fxs + ) except TypeError as e: raise ValueError( "Each entry in features must be either an int, " @@ -556,7 +582,7 @@ def convert_feature(fx): tmp_features.append(fxs) if any(ice_for_two_way_pd): - # raise an error an be specific regarding the parameter values + # raise an error and be specific regarding the parameter values # when 1- and 2-way PD were requested kind_ = [ "average" if forcing_average else kind_plot @@ -571,6 +597,81 @@ def convert_feature(fx): ) features = tmp_features + if categorical_features is None: + is_categorical = [ + (False,) if len(fxs) == 1 else (False, False) for fxs in features + ] + else: + # we need to create a boolean indicator of which features are + # categorical from the categorical_features list. + categorical_features = np.array(categorical_features, copy=False) + if categorical_features.dtype.kind == "b": + # categorical features provided as a list of boolean + if categorical_features.size != n_features: + raise ValueError( + "When `categorical_features` is a boolean array-like, " + "the array should be of shape (n_features,). Got " + f"{categorical_features.size} elements while `X` contains " + f"{n_features} features." + ) + is_categorical = [ + tuple(categorical_features[fx] for fx in fxs) for fxs in features + ] + elif categorical_features.dtype.kind in ("i", "O", "U"): + # categorical features provided as a list of indices or feature names + categorical_features_idx = [ + _get_feature_index(cat, feature_names=feature_names) + for cat in categorical_features + ] + is_categorical = [ + tuple([idx in categorical_features_idx for idx in fxs]) + for fxs in features + ] + else: + raise ValueError( + "Expected `categorical_features` to be an array-like of boolean," + f" integer, or string. Got {categorical_features.dtype} instead." 
+ ) + + for cats in is_categorical: + if np.size(cats) == 2 and (cats[0] != cats[1]): + raise ValueError( + "Two-way partial dependence plots are not supported for pairs" + " of continuous and categorical features." + ) + + # collect the indices of the categorical features targeted by the partial + # dependence computation + categorical_features_targeted = set( + [ + fx + for fxs, cats in zip(features, is_categorical) + for fx in fxs + if any(cats) + ] + ) + if categorical_features_targeted: + min_n_cats = min( + [ + len(_unique(_safe_indexing(X, idx, axis=1))) + for idx in categorical_features_targeted + ] + ) + if grid_resolution < min_n_cats: + raise ValueError( + "The resolution of the computed grid is less than the " + "minimum number of categories in the targeted categorical " + "features. Expect the `grid_resolution` to be greater than " + f"{min_n_cats}. Got {grid_resolution} instead." + ) + + for is_cat, kind_plot in zip(is_categorical, kind_): + if any(is_cat) and kind_plot != "average": + raise ValueError( + "It is not possible to display individual effects for" + " categorical features." + ) + # Early exit if the axes does not have the correct number of axes if ax is not None and not isinstance(ax, plt.Axes): axes = np.asarray(ax, dtype=object) @@ -606,6 +707,8 @@ def convert_feature(fx): estimator, X, fxs, + feature_names=feature_names, + categorical_features=categorical_features, response_method=response_method, method=method, grid_resolution=grid_resolution, @@ -636,10 +739,11 @@ def convert_feature(fx): target_idx = target deciles = {} - for fx in chain.from_iterable(features): - if fx not in deciles: - X_col = _safe_indexing(X, fx, axis=1) - deciles[fx] = mquantiles(X_col, prob=np.arange(0.1, 1.0, 0.1)) + for fxs, cats in zip(features, is_categorical): + for fx, cat in zip(fxs, cats): + if not cat and fx not in deciles: + X_col = _safe_indexing(X, fx, axis=1) + deciles[fx] = mquantiles(X_col, prob=np.arange(0.1, 1.0, 0.1)) display = PartialDependenceDisplay( pd_results=pd_results, @@ -650,6 +754,7 @@ def convert_feature(fx): kind=kind, subsample=subsample, random_state=random_state, + is_categorical=is_categorical, ) return display.plot( ax=ax, @@ -727,6 +832,8 @@ def _plot_average_dependence( ax, pd_line_idx, line_kw, + categorical, + bar_kw, ): """Plot the average partial dependence. @@ -744,15 +851,22 @@ def _plot_average_dependence( matching 2D position in the grid layout. line_kw : dict Dict with keywords passed when plotting the PD plot. - centered : bool - Whether or not to center the average PD to start at the origin. + categorical : bool + Whether feature is categorical. + bar_kw: dict + Dict with keywords passed when plotting the PD bars (categorical). """ - line_idx = np.unravel_index(pd_line_idx, self.lines_.shape) - self.lines_[line_idx] = ax.plot( - feature_values, - avg_preds, - **line_kw, - )[0] + if categorical: + bar_idx = np.unravel_index(pd_line_idx, self.bars_.shape) + self.bars_[bar_idx] = ax.bar(feature_values, avg_preds, **bar_kw)[0] + ax.tick_params(axis="x", rotation=90) + else: + line_idx = np.unravel_index(pd_line_idx, self.lines_.shape) + self.lines_[line_idx] = ax.plot( + feature_values, + avg_preds, + **line_kw, + )[0] def _plot_one_way_partial_dependence( self, @@ -768,6 +882,8 @@ def _plot_one_way_partial_dependence( n_lines, ice_lines_kw, pd_line_kw, + categorical, + bar_kw, pdp_lim, ): """Plot 1-way partial dependence: ICE and PDP. 
@@ -802,6 +918,10 @@ def _plot_one_way_partial_dependence( Dict with keywords passed when plotting the ICE lines. pd_line_kw : dict Dict with keywords passed when plotting the PD plot. + categorical : bool + Whether feature is categorical. + bar_kw: dict + Dict with keywords passed when plotting the PD bars (categorical). pdp_lim : dict Global min and max average predictions, such that all plots will have the same scale and y limits. `pdp_lim[1]` is the global min @@ -832,20 +952,25 @@ def _plot_one_way_partial_dependence( ax, pd_line_idx, pd_line_kw, + categorical, + bar_kw, ) trans = transforms.blended_transform_factory(ax.transData, ax.transAxes) # create the decile line for the vertical axis vlines_idx = np.unravel_index(pd_plot_idx, self.deciles_vlines_.shape) - self.deciles_vlines_[vlines_idx] = ax.vlines( - self.deciles[feature_idx[0]], - 0, - 0.05, - transform=trans, - color="k", - ) + if self.deciles.get(feature_idx[0], None) is not None: + self.deciles_vlines_[vlines_idx] = ax.vlines( + self.deciles[feature_idx[0]], + 0, + 0.05, + transform=trans, + color="k", + ) # reset ylim which was overwritten by vlines - ax.set_ylim(pdp_lim[1]) + min_val = min(val[0] for val in pdp_lim.values()) + max_val = max(val[1] for val in pdp_lim.values()) + ax.set_ylim([min_val, max_val]) # Set xlabel if it is not already set if not ax.get_xlabel(): @@ -857,7 +982,7 @@ def _plot_one_way_partial_dependence( else: ax.set_yticklabels([]) - if pd_line_kw.get("label", None) and kind != "individual": + if pd_line_kw.get("label", None) and kind != "individual" and not categorical: ax.legend() def _plot_two_way_partial_dependence( @@ -869,6 +994,8 @@ def _plot_two_way_partial_dependence( pd_plot_idx, Z_level, contour_kw, + categorical, + heatmap_kw, ): """Plot 2-way partial dependence. @@ -892,52 +1019,99 @@ def _plot_two_way_partial_dependence( The Z-level used to encode the average predictions. contour_kw : dict Dict with keywords passed when plotting the contours. + categorical : bool + Whether features are categorical. + heatmap_kw: dict + Dict with keywords passed when plotting the PD heatmap + (categorical). 
""" - from matplotlib import transforms # noqa + if categorical: + import matplotlib.pyplot as plt + + default_im_kw = dict(interpolation="nearest", cmap="viridis") + im_kw = {**default_im_kw, **heatmap_kw} + + data = avg_preds[self.target_idx] + im = ax.imshow(data, **im_kw) + text = None + cmap_min, cmap_max = im.cmap(0), im.cmap(1.0) + + text = np.empty_like(data, dtype=object) + # print text with appropriate color depending on background + thresh = (data.max() + data.min()) / 2.0 + + for flat_index in range(data.size): + row, col = np.unravel_index(flat_index, data.shape) + color = cmap_max if data[row, col] < thresh else cmap_min + + values_format = ".2f" + text_data = format(data[row, col], values_format) + + text_kwargs = dict(ha="center", va="center", color=color) + text[row, col] = ax.text(col, row, text_data, **text_kwargs) + + fig = ax.figure + fig.colorbar(im, ax=ax) + ax.set( + xticks=np.arange(len(feature_values[1])), + yticks=np.arange(len(feature_values[0])), + xticklabels=feature_values[1], + yticklabels=feature_values[0], + xlabel=self.feature_names[feature_idx[1]], + ylabel=self.feature_names[feature_idx[0]], + ) - XX, YY = np.meshgrid(feature_values[0], feature_values[1]) - Z = avg_preds[self.target_idx].T - CS = ax.contour(XX, YY, Z, levels=Z_level, linewidths=0.5, colors="k") - contour_idx = np.unravel_index(pd_plot_idx, self.contours_.shape) - self.contours_[contour_idx] = ax.contourf( - XX, - YY, - Z, - levels=Z_level, - vmax=Z_level[-1], - vmin=Z_level[0], - **contour_kw, - ) - ax.clabel(CS, fmt="%2.2f", colors="k", fontsize=10, inline=True) + plt.setp(ax.get_xticklabels(), rotation="vertical") - trans = transforms.blended_transform_factory(ax.transData, ax.transAxes) - # create the decile line for the vertical axis - xlim, ylim = ax.get_xlim(), ax.get_ylim() - vlines_idx = np.unravel_index(pd_plot_idx, self.deciles_vlines_.shape) - self.deciles_vlines_[vlines_idx] = ax.vlines( - self.deciles[feature_idx[0]], - 0, - 0.05, - transform=trans, - color="k", - ) - # create the decile line for the horizontal axis - hlines_idx = np.unravel_index(pd_plot_idx, self.deciles_hlines_.shape) - self.deciles_hlines_[hlines_idx] = ax.hlines( - self.deciles[feature_idx[1]], - 0, - 0.05, - transform=trans, - color="k", - ) - # reset xlim and ylim since they are overwritten by hlines and vlines - ax.set_xlim(xlim) - ax.set_ylim(ylim) + heatmap_idx = np.unravel_index(pd_plot_idx, self.heatmaps_.shape) + self.heatmaps_[heatmap_idx] = im + else: + from matplotlib import transforms # noqa + + XX, YY = np.meshgrid(feature_values[0], feature_values[1]) + Z = avg_preds[self.target_idx].T + CS = ax.contour(XX, YY, Z, levels=Z_level, linewidths=0.5, colors="k") + contour_idx = np.unravel_index(pd_plot_idx, self.contours_.shape) + self.contours_[contour_idx] = ax.contourf( + XX, + YY, + Z, + levels=Z_level, + vmax=Z_level[-1], + vmin=Z_level[0], + **contour_kw, + ) + ax.clabel(CS, fmt="%2.2f", colors="k", fontsize=10, inline=True) + + trans = transforms.blended_transform_factory(ax.transData, ax.transAxes) + # create the decile line for the vertical axis + xlim, ylim = ax.get_xlim(), ax.get_ylim() + vlines_idx = np.unravel_index(pd_plot_idx, self.deciles_vlines_.shape) + self.deciles_vlines_[vlines_idx] = ax.vlines( + self.deciles[feature_idx[0]], + 0, + 0.05, + transform=trans, + color="k", + ) + # create the decile line for the horizontal axis + hlines_idx = np.unravel_index(pd_plot_idx, self.deciles_hlines_.shape) + self.deciles_hlines_[hlines_idx] = ax.hlines( + 
self.deciles[feature_idx[1]], + 0, + 0.05, + transform=trans, + color="k", + ) + # reset xlim and ylim since they are overwritten by hlines and + # vlines + ax.set_xlim(xlim) + ax.set_ylim(ylim) - # set xlabel if it is not already set - if not ax.get_xlabel(): - ax.set_xlabel(self.feature_names[feature_idx[0]]) - ax.set_ylabel(self.feature_names[feature_idx[1]]) + # set xlabel if it is not already set + if not ax.get_xlabel(): + ax.set_xlabel(self.feature_names[feature_idx[0]]) + ax.set_ylabel(self.feature_names[feature_idx[1]]) def plot( self, @@ -948,6 +1122,8 @@ def plot( ice_lines_kw=None, pd_line_kw=None, contour_kw=None, + bar_kw=None, + heatmap_kw=None, pdp_lim=None, centered=False, ): @@ -993,6 +1169,18 @@ def plot( Dict with keywords passed to the `matplotlib.pyplot.contourf` call for two-way partial dependence plots. + bar_kw : dict, default=None + Dict with keywords passed to the `matplotlib.pyplot.bar` + call for one-way categorical partial dependence plots. + + .. versionadded:: 1.2 + + heatmap_kw : dict, default=None + Dict with keywords passed to the `matplotlib.pyplot.imshow` + call for two-way categorical partial dependence plots. + + .. versionadded:: 1.2 + pdp_lim : dict, default=None Global min and max average predictions, such that all plots will have the same scale and y limits. `pdp_lim[1]` is the global min and max for single @@ -1024,6 +1212,13 @@ def plot( else: kind = self.kind + if self.is_categorical is None: + is_categorical = [ + (False,) if len(fx) == 1 else (False, False) for fx in self.features + ] + else: + is_categorical = self.is_categorical + if len(kind) != len(self.features): raise ValueError( "When `kind` is provided as a list of strings, it should " @@ -1084,6 +1279,13 @@ def plot( preds = pdp.average if kind_plot == "average" else pdp.individual min_pd = preds[self.target_idx].min() max_pd = preds[self.target_idx].max() + + # expand the limits to account so that the plotted lines do not touch + # the edges of the plot + span = max_pd - min_pd + min_pd -= 0.05 * span + max_pd += 0.05 * span + n_fx = len(values) old_min_pd, old_max_pd = pdp_lim.get(n_fx, (min_pd, max_pd)) min_pd = min(min_pd, old_min_pd) @@ -1096,6 +1298,10 @@ def plot( ice_lines_kw = {} if pd_line_kw is None: pd_line_kw = {} + if bar_kw is None: + bar_kw = {} + if heatmap_kw is None: + heatmap_kw = {} if ax is None: _, ax = plt.subplots() @@ -1145,6 +1351,8 @@ def plot( else: self.lines_ = np.empty((n_rows, n_cols, n_lines), dtype=object) self.contours_ = np.empty((n_rows, n_cols), dtype=object) + self.bars_ = np.empty((n_rows, n_cols), dtype=object) + self.heatmaps_ = np.empty((n_rows, n_cols), dtype=object) axes_ravel = self.axes_.ravel() @@ -1174,6 +1382,8 @@ def plot( else: self.lines_ = np.empty(ax.shape + (n_lines,), dtype=object) self.contours_ = np.empty_like(ax, dtype=object) + self.bars_ = np.empty_like(ax, dtype=object) + self.heatmaps_ = np.empty_like(ax, dtype=object) # create contour levels for two-way plots if 2 in pdp_lim: @@ -1182,8 +1392,14 @@ def plot( self.deciles_vlines_ = np.empty_like(self.axes_, dtype=object) self.deciles_hlines_ = np.empty_like(self.axes_, dtype=object) - for pd_plot_idx, (axi, feature_idx, pd_result, kind_plot) in enumerate( - zip(self.axes_.ravel(), self.features, pd_results_, kind) + for pd_plot_idx, (axi, feature_idx, cat, pd_result, kind_plot) in enumerate( + zip( + self.axes_.ravel(), + self.features, + is_categorical, + pd_results_, + kind, + ) ): avg_preds = None preds = None @@ -1236,6 +1452,12 @@ def plot( **pd_line_kw, } + 
default_bar_kws = {"color": "C0"} + bar_kw = {**default_bar_kws, **bar_kw} + + default_heatmap_kw = {} + heatmap_kw = {**default_heatmap_kw, **heatmap_kw} + self._plot_one_way_partial_dependence( kind_plot, preds, @@ -1249,6 +1471,8 @@ def plot( n_lines, ice_lines_kw, pd_line_kw, + cat[0], + bar_kw, pdp_lim, ) else: @@ -1260,6 +1484,8 @@ def plot( pd_plot_idx, Z_level, contour_kw, + cat[0] and cat[1], + heatmap_kw, ) return self diff --git a/sklearn/inspection/_plot/tests/test_plot_partial_dependence.py b/sklearn/inspection/_plot/tests/test_plot_partial_dependence.py index 4e97ee60a2013..f48c579d04528 100644 --- a/sklearn/inspection/_plot/tests/test_plot_partial_dependence.py +++ b/sklearn/inspection/_plot/tests/test_plot_partial_dependence.py @@ -12,6 +12,9 @@ from sklearn.ensemble import GradientBoostingClassifier from sklearn.linear_model import LinearRegression from sklearn.utils._testing import _convert_container +from sklearn.compose import make_column_transformer +from sklearn.preprocessing import OneHotEncoder +from sklearn.pipeline import make_pipeline from sklearn.inspection import PartialDependenceDisplay @@ -500,7 +503,7 @@ def test_plot_partial_dependence_multioutput(pyplot, target): for i, pos in enumerate(positions): ax = disp.axes_[pos] assert ax.get_ylabel() == expected_label[i] - assert ax.get_xlabel() == "{}".format(i) + assert ax.get_xlabel() == f"x{i}" @pytest.mark.filterwarnings("ignore:A Bunch will be returned") @@ -544,12 +547,12 @@ def test_plot_partial_dependence_dataframe(pyplot, clf_diabetes, diabetes): ( dummy_classification_data, {"features": ["foobar"], "feature_names": None}, - "Feature foobar not in feature_names", + "Feature 'foobar' not in feature_names", ), ( dummy_classification_data, {"features": ["foobar"], "feature_names": ["abcd", "def"]}, - "Feature foobar not in feature_names", + "Feature 'foobar' not in feature_names", ), ( dummy_classification_data, @@ -591,6 +594,21 @@ def test_plot_partial_dependence_dataframe(pyplot, clf_diabetes, diabetes): {"features": [1], "subsample": 1.2}, r"When a floating-point, subsample=1.2 should be in the \(0, 1\) range", ), + ( + dummy_classification_data, + {"features": [1, 2], "categorical_features": [1.0, 2.0]}, + "Expected `categorical_features` to be an array-like of boolean,", + ), + ( + dummy_classification_data, + {"features": [(1, 2)], "categorical_features": [2]}, + "Two-way partial dependence plots are not supported for pairs", + ), + ( + dummy_classification_data, + {"features": [1], "categorical_features": [1], "kind": "individual"}, + "It is not possible to display individual effects", + ), ( dummy_classification_data, {"features": [1], "kind": "foo"}, @@ -647,6 +665,101 @@ def test_plot_partial_dependence_does_not_override_ylabel( assert axes[1].get_ylabel() == "Partial dependence" +@pytest.mark.parametrize( + "categorical_features, array_type", + [ + (["col_A", "col_C"], "dataframe"), + ([0, 2], "array"), + ([True, False, True], "array"), + ], +) +def test_plot_partial_dependence_with_categorical( + pyplot, categorical_features, array_type +): + X = [["A", 1, "A"], ["B", 0, "C"], ["C", 2, "B"]] + column_name = ["col_A", "col_B", "col_C"] + X = _convert_container(X, array_type, columns_name=column_name) + y = np.array([1.2, 0.5, 0.45]).T + + preprocessor = make_column_transformer((OneHotEncoder(), categorical_features)) + model = make_pipeline(preprocessor, LinearRegression()) + model.fit(X, y) + + # single feature + disp = PartialDependenceDisplay.from_estimator( + model, + X, + features=["col_C"], 
+        feature_names=column_name,
+        categorical_features=categorical_features,
+    )
+
+    assert disp.figure_ is pyplot.gcf()
+    assert disp.bars_.shape == (1, 1)
+    assert disp.bars_[0][0] is not None
+    assert disp.lines_.shape == (1, 1)
+    assert disp.lines_[0][0] is None
+    assert disp.contours_.shape == (1, 1)
+    assert disp.contours_[0][0] is None
+    assert disp.deciles_vlines_.shape == (1, 1)
+    assert disp.deciles_vlines_[0][0] is None
+    assert disp.deciles_hlines_.shape == (1, 1)
+    assert disp.deciles_hlines_[0][0] is None
+    assert disp.axes_[0, 0].get_legend() is None
+
+    # interaction between two features
+    disp = PartialDependenceDisplay.from_estimator(
+        model,
+        X,
+        features=[("col_A", "col_C")],
+        feature_names=column_name,
+        categorical_features=categorical_features,
+    )
+
+    assert disp.figure_ is pyplot.gcf()
+    assert disp.bars_.shape == (1, 1)
+    assert disp.bars_[0][0] is None
+    assert disp.lines_.shape == (1, 1)
+    assert disp.lines_[0][0] is None
+    assert disp.contours_.shape == (1, 1)
+    assert disp.contours_[0][0] is None
+    assert disp.deciles_vlines_.shape == (1, 1)
+    assert disp.deciles_vlines_[0][0] is None
+    assert disp.deciles_hlines_.shape == (1, 1)
+    assert disp.deciles_hlines_[0][0] is None
+    assert disp.axes_[0, 0].get_legend() is None
+
+
+def test_plot_partial_dependence_legend(pyplot):
+    pd = pytest.importorskip("pandas")
+    X = pd.DataFrame(
+        {
+            "col_A": ["A", "B", "C"],
+            "col_B": [1, 0, 2],
+            "col_C": ["C", "B", "A"],
+        }
+    )
+    y = np.array([1.2, 0.5, 0.45]).T
+
+    categorical_features = ["col_A", "col_C"]
+    preprocessor = make_column_transformer((OneHotEncoder(), categorical_features))
+    model = make_pipeline(preprocessor, LinearRegression())
+    model.fit(X, y)
+
+    disp = PartialDependenceDisplay.from_estimator(
+        model,
+        X,
+        features=["col_B", "col_C"],
+        categorical_features=categorical_features,
+        kind=["both", "average"],
+    )
+
+    legend_text = disp.axes_[0, 0].get_legend().get_texts()
+    assert len(legend_text) == 1
+    assert legend_text[0].get_text() == "average"
+    assert disp.axes_[0, 1].get_legend() is None
+
+
 @pytest.mark.parametrize(
     "kind, expected_shape",
     [("average", (1, 2)), ("individual", (1, 2, 20)), ("both", (1, 2, 21))],
@@ -717,6 +830,41 @@ def test_partial_dependence_overwrite_labels(
     assert legend_text[0].get_text() == label
 
 
+@pytest.mark.parametrize(
+    "categorical_features, array_type",
+    [
+        (["col_A", "col_C"], "dataframe"),
+        ([0, 2], "array"),
+        ([True, False, True], "array"),
+    ],
+)
+def test_grid_resolution_with_categorical(pyplot, categorical_features, array_type):
+    """Check that we raise a ValueError when `grid_resolution` is smaller than
+    the number of categories in the categorical features targeted.
+ """ + X = [["A", 1, "A"], ["B", 0, "C"], ["C", 2, "B"]] + column_name = ["col_A", "col_B", "col_C"] + X = _convert_container(X, array_type, columns_name=column_name) + y = np.array([1.2, 0.5, 0.45]).T + + preprocessor = make_column_transformer((OneHotEncoder(), categorical_features)) + model = make_pipeline(preprocessor, LinearRegression()) + model.fit(X, y) + + err_msg = ( + "resolution of the computed grid is less than the minimum number of categories" + ) + with pytest.raises(ValueError, match=err_msg): + PartialDependenceDisplay.from_estimator( + model, + X, + features=["col_C"], + feature_names=column_name, + categorical_features=categorical_features, + grid_resolution=2, + ) + + # TODO(1.3): remove def test_partial_dependence_display_deprecation(pyplot, clf_diabetes, diabetes): """Check that we raise the proper warning in the display.""" @@ -761,7 +909,7 @@ def test_partial_dependence_plot_limits_one_way( feature_names=diabetes.feature_names, ) - range_pd = np.array([-1, 1]) + range_pd = np.array([-1, 1], dtype=np.float64) for pd in disp.pd_results: if "average" in pd: pd["average"][...] = range_pd[1] @@ -773,6 +921,9 @@ def test_partial_dependence_plot_limits_one_way( disp.plot(centered=centered) # check that we anchor to zero x-axis when centering y_lim = range_pd - range_pd[0] if centered else range_pd + padding = 0.05 * (y_lim[1] - y_lim[0]) + y_lim[0] -= padding + y_lim[1] += padding for ax in disp.axes_.ravel(): assert_allclose(ax.get_ylim(), y_lim) @@ -791,16 +942,20 @@ def test_partial_dependence_plot_limits_two_way( feature_names=diabetes.feature_names, ) - range_pd = np.array([-1, 1]) + range_pd = np.array([-1, 1], dtype=np.float64) for pd in disp.pd_results: pd["average"][...] = range_pd[1] pd["average"][0, 0] = range_pd[0] disp.plot(centered=centered) - coutour = disp.contours_[0, 0] + contours = disp.contours_[0, 0] levels = range_pd - range_pd[0] if centered else range_pd + + padding = 0.05 * (levels[1] - levels[0]) + levels[0] -= padding + levels[1] += padding expect_levels = np.linspace(*levels, num=8) - assert_allclose(coutour.levels, expect_levels) + assert_allclose(contours.levels, expect_levels) def test_partial_dependence_kind_list( diff --git a/sklearn/inspection/tests/test_partial_dependence.py b/sklearn/inspection/tests/test_partial_dependence.py index 5fe5ddb5d6db1..19a7f878c369f 100644 --- a/sklearn/inspection/tests/test_partial_dependence.py +++ b/sklearn/inspection/tests/test_partial_dependence.py @@ -136,8 +136,9 @@ def test_grid_from_X(): # the unique values instead of the percentiles) percentiles = (0.05, 0.95) grid_resolution = 100 + is_categorical = [False, False] X = np.asarray([[1, 2], [3, 4]]) - grid, axes = _grid_from_X(X, percentiles, grid_resolution) + grid, axes = _grid_from_X(X, percentiles, is_categorical, grid_resolution) assert_array_equal(grid, [[1, 2], [1, 4], [3, 2], [3, 4]]) assert_array_equal(axes, X.T) @@ -148,7 +149,9 @@ def test_grid_from_X(): # n_unique_values > grid_resolution X = rng.normal(size=(20, 2)) - grid, axes = _grid_from_X(X, percentiles, grid_resolution=grid_resolution) + grid, axes = _grid_from_X( + X, percentiles, is_categorical, grid_resolution=grid_resolution + ) assert grid.shape == (grid_resolution * grid_resolution, X.shape[1]) assert np.asarray(axes).shape == (2, grid_resolution) @@ -156,13 +159,66 @@ def test_grid_from_X(): n_unique_values = 12 X[n_unique_values - 1 :, 0] = 12345 rng.shuffle(X) # just to make sure the order is irrelevant - grid, axes = _grid_from_X(X, percentiles, 
grid_resolution=grid_resolution)
+    grid, axes = _grid_from_X(
+        X, percentiles, is_categorical, grid_resolution=grid_resolution
+    )
 
     assert grid.shape == (n_unique_values * grid_resolution, X.shape[1])
 
     # axes is a list of arrays of different shapes
     assert axes[0].shape == (n_unique_values,)
     assert axes[1].shape == (grid_resolution,)
 
 
+@pytest.mark.parametrize(
+    "grid_resolution",
+    [
+        2,  # since n_categories > 2, quantile resampling should not be used
+        100,
+    ],
+)
+def test_grid_from_X_with_categorical(grid_resolution):
+    """Check that `_grid_from_X` always samples from the categories and
+    does not depend on the percentiles.
+    """
+    pd = pytest.importorskip("pandas")
+    percentiles = (0.05, 0.95)
+    is_categorical = [True]
+    X = pd.DataFrame({"cat_feature": ["A", "B", "C", "A", "B", "D", "E"]})
+    grid, axes = _grid_from_X(
+        X, percentiles, is_categorical, grid_resolution=grid_resolution
+    )
+    assert grid.shape == (5, X.shape[1])
+    assert axes[0].shape == (5,)
+
+
+@pytest.mark.parametrize("grid_resolution", [3, 100])
+def test_grid_from_X_heterogeneous_type(grid_resolution):
+    """Check that `_grid_from_X` samples from the categories of a categorical
+    column while using the grid resolution or unique values for a numerical one.
+    """
+    pd = pytest.importorskip("pandas")
+    percentiles = (0.05, 0.95)
+    is_categorical = [True, False]
+    X = pd.DataFrame(
+        {
+            "cat": ["A", "B", "C", "A", "B", "D", "E", "A", "B", "D"],
+            "num": [1, 1, 1, 2, 5, 6, 6, 6, 6, 8],
+        }
+    )
+    nunique = X.nunique()
+
+    grid, axes = _grid_from_X(
+        X, percentiles, is_categorical, grid_resolution=grid_resolution
+    )
+    if grid_resolution == 3:
+        assert grid.shape == (15, 2)
+        assert axes[0].shape[0] == nunique["num"]
+        assert axes[1].shape[0] == grid_resolution
+    else:
+        assert grid.shape == (25, 2)
+        assert axes[0].shape[0] == nunique["cat"]
+        assert axes[1].shape[0] == nunique["cat"]
+
+
 @pytest.mark.parametrize(
     "grid_resolution, percentiles, err_msg",
     [
@@ -177,8 +233,9 @@ def test_grid_from_X():
 )
 def test_grid_from_X_error(grid_resolution, percentiles, err_msg):
     X = np.asarray([[1, 2], [3, 4]])
+    is_categorical = [False]
     with pytest.raises(ValueError, match=err_msg):
-        _grid_from_X(X, grid_resolution=grid_resolution, percentiles=percentiles)
+        _grid_from_X(X, percentiles, is_categorical, grid_resolution)
 
 
 @pytest.mark.parametrize("target_feature", range(5))
diff --git a/sklearn/inspection/tests/test_pd_utils.py b/sklearn/inspection/tests/test_pd_utils.py
new file mode 100644
index 0000000000000..5f461ad498f5b
--- /dev/null
+++ b/sklearn/inspection/tests/test_pd_utils.py
@@ -0,0 +1,48 @@
+import numpy as np
+import pytest
+
+from sklearn.utils._testing import _convert_container
+
+from sklearn.inspection._pd_utils import _check_feature_names, _get_feature_index
+
+
+@pytest.mark.parametrize(
+    "feature_names, array_type, expected_feature_names",
+    [
+        (None, "array", ["x0", "x1", "x2"]),
+        (None, "dataframe", ["a", "b", "c"]),
+        (np.array(["a", "b", "c"]), "array", ["a", "b", "c"]),
+    ],
+)
+def test_check_feature_names(feature_names, array_type, expected_feature_names):
+    X = np.random.randn(10, 3)
+    column_names = ["a", "b", "c"]
+    X = _convert_container(X, constructor_name=array_type, columns_name=column_names)
+    feature_names_validated = _check_feature_names(X, feature_names)
+    assert feature_names_validated == expected_feature_names
+
+
+def test_check_feature_names_error():
+    X = np.random.randn(10, 3)
+    feature_names = ["a", "b", "c", "a"]
+    msg = "feature_names should not contain duplicates."
+    with pytest.raises(ValueError, match=msg):
+        _check_feature_names(X, feature_names)
+
+
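# A minimal, hypothetical re-implementation of `_get_feature_index`, sketched
# only from the behavior the surrounding tests assert (it is not the patch's
# actual code): an integer feature is already a positional index, while a
# string feature must be resolved against `feature_names`.
def get_feature_index_sketch(fx, feature_names=None):
    if isinstance(fx, str):
        if feature_names is None:
            raise ValueError(
                f"Cannot plot partial dependence for feature {fx!r} since "
                "the feature names were not provided."
            )
        try:
            # map the feature name to its positional index
            return feature_names.index(fx)
        except ValueError as e:
            raise ValueError(f"Feature {fx!r} not in feature_names") from e
    # integer input: already a positional index
    return fx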
+@pytest.mark.parametrize("fx, idx", [(0, 0), (1, 1), ("a", 0), ("b", 1), ("c", 2)])
+def test_get_feature_index(fx, idx):
+    feature_names = ["a", "b", "c"]
+    assert _get_feature_index(fx, feature_names) == idx
+
+
+@pytest.mark.parametrize(
+    "fx, feature_names, err_msg",
+    [
+        ("a", None, "Cannot plot partial dependence for feature 'a'"),
+        ("d", ["a", "b", "c"], "Feature 'd' not in feature_names"),
+    ],
+)
+def test_get_feature_index_error(fx, feature_names, err_msg):
+    with pytest.raises(ValueError, match=err_msg):
+        _get_feature_index(fx, feature_names)
diff --git a/sklearn/utils/__init__.py b/sklearn/utils/__init__.py
index 97db16e73868f..923c08d44c6f4 100644
--- a/sklearn/utils/__init__.py
+++ b/sklearn/utils/__init__.py
@@ -383,7 +383,14 @@ def _safe_assign(X, values, *, row_indexer=None, column_indexer=None):
     )
     if hasattr(X, "iloc"):  # pandas dataframe
-        X.iloc[row_indexer, column_indexer] = values
+        with warnings.catch_warnings():
+            # pandas >= 1.5 raises a warning when using iloc to set values in a column
+            # that does not have the same type as the column being set. It happens
+            # for instance when setting a categorical column with a string. The future
+            # pandas behavior will match the current one, so the warning should disappear.
+            # TODO(1.3): check if the warning is still raised or remove the filter.
+            warnings.simplefilter("ignore", FutureWarning)
+            X.iloc[row_indexer, column_indexer] = values
     else:  # numpy array or sparse matrix
         X[row_indexer, column_indexer] = values
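
Taken together, these changes make categorical partial dependence usable end to end. The
short sketch below (the toy data, column names, and pipeline are illustrative assumptions,
not code from this diff) exercises the new `categorical_features` argument together with
the new `bar_kw` option of `PartialDependenceDisplay.plot`; `heatmap_kw` plays the same
role for a pair of categorical features::

    import numpy as np
    import pandas as pd
    from sklearn.compose import make_column_transformer
    from sklearn.inspection import PartialDependenceDisplay
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import OneHotEncoder

    rng = np.random.RandomState(0)
    X = pd.DataFrame(
        {
            "cat": rng.choice(["A", "B", "C"], size=100),
            "num": rng.uniform(size=100),
        }
    )
    y = rng.uniform(size=100)

    # one-hot encode the categorical column and pass the numerical one through
    preprocessor = make_column_transformer(
        (OneHotEncoder(), ["cat"]), remainder="passthrough"
    )
    model = make_pipeline(preprocessor, LinearRegression()).fit(X, y)

    # a one-way categorical partial dependence is drawn as a bar plot; note
    # that `grid_resolution` must be at least the number of categories (3 here)
    disp = PartialDependenceDisplay.from_estimator(
        model, X, features=["cat"], categorical_features=["cat"]
    )
    # the new `bar_kw` dict is forwarded to `matplotlib.pyplot.bar`
    disp.plot(bar_kw={"color": "tab:orange"})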