[ENH]: Automatically trim the time on the x axis #28158

Open
RomainPastureau opened this issue May 1, 2024 · 20 comments

Comments

@RomainPastureau

RomainPastureau commented May 1, 2024

Problem

I would like to have the x-axis showing the timestamps of the samples of audio files. Here is a minimal example:

import numpy as np
from matplotlib import pyplot as plt
from matplotlib import dates as mdates
from scipy.io import wavfile

# Open the WAV files
audio_1 = wavfile.read("audio1.wav")
freq_audio_1 = audio_1[0]
samples_1 = audio_1[1][:, 0]  # Turn to mono

audio_2 = wavfile.read("audio2.wav")
freq_audio_2 = audio_2[0]
samples_2 = audio_2[1][:, 0]  # Turn to mono

# Create the timestamps
t_audio_1 = np.arange(0, len(samples_1)) / freq_audio_1
t_audio_2 = np.arange(0, len(samples_2)) / freq_audio_2

# We turn them into datetime
t_audio_1 = np.array(t_audio_1*1000, dtype="datetime64[ms]")
t_audio_2 = np.array(t_audio_2*1000, dtype="datetime64[ms]")

# Create the figure
fig, ax = plt.subplots(1, 2, constrained_layout=True)

# If the audio files are more than 1 hour, we format as HH:MM:SS, else just MM:SS
if len(samples_1) / freq_audio_1 >= 3600 and len(samples_2) / freq_audio_2 >= 3600:
    formatter = mdates.AutoDateFormatter(mdates.AutoDateLocator(), defaultfmt='%H:%M:%S')
else:
    formatter = mdates.AutoDateFormatter(mdates.AutoDateLocator(), defaultfmt='%M:%S')

ax[0].xaxis.set_major_formatter(formatter)
ax[1].xaxis.set_major_formatter(formatter)

ax[0].plot(t_audio_1, samples_1)
ax[1].plot(t_audio_2, samples_2)
plt.show()

Here is the output:
Figure_11

As you can see, the microsecond precision causes the tick labels on the x axis to be drawn on top of each other.

Proposed solution

Ideally, the plot would automatically decide how many significant digits after the decimal point are necessary, depending on the level of zoom (similar to the way Audacity displays timestamps).

Thank you :)

@story645
Member

story645 commented May 1, 2024

As a first pass, this could probably be implemented using FuncFormatter or zoom events & might make a good example of dynamic label updating on zoom?

https://matplotlib.org/3.8.4/gallery/ticks/custom_ticker1.html#sphx-glr-gallery-ticks-custom-ticker1-py
+
https://matplotlib.org/stable/gallery/event_handling/zoom_window.html

I'm basically wondering about the trade-offs: would a good example of how to do this be more useful than a library function that may need a bunch of parameters to give folks what they want?
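
A minimal sketch of the zoom-event idea, assuming the x values are plain float seconds rather than datetimes. The names `state`, `on_xlim_changed` and `mm_ss`, and the span thresholds, are made up for illustration; only `FuncFormatter` and the `xlim_changed` callback come from Matplotlib.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

fig, ax = plt.subplots()
ax.plot([0, 90], [0, 1])  # x values are seconds, as plain floats

state = {"decimals": 0}  # shared between the callback and the formatter


def on_xlim_changed(axes):
    """Pick a number of decimals from the visible span (arbitrary thresholds)."""
    lo, hi = axes.get_xlim()
    span = abs(hi - lo)
    state["decimals"] = 3 if span < 1 else 1 if span < 10 else 0


def mm_ss(x, pos=None):
    """Format seconds as MM:SS with the current number of decimals."""
    d = state["decimals"]
    sign = "-" if x < 0 else ""
    # Round first so that 59.96 s at one decimal becomes 01:00.0, not 00:60.0
    minutes, seconds = divmod(round(abs(x), d), 60)
    width = 2 if d == 0 else d + 3
    return f"{sign}{int(minutes):02d}:{seconds:0{width}.{d}f}"


ax.callbacks.connect("xlim_changed", on_xlim_changed)
ax.xaxis.set_major_formatter(FuncFormatter(mm_ss))
on_xlim_changed(ax)  # initialise for the starting view
```

The callback fires on every `set_xlim`, including interactive zooming and panning, so the labels gain or lose decimals as the view changes.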

@WeatherGod
Member

WeatherGod commented May 1, 2024 via email

@timhoffm
Member

timhoffm commented May 1, 2024

@WeatherGod That’s basically what ConciseDateFormatter is doing https://matplotlib.org/stable/gallery/ticks/date_concise_formatter.html.

@RomainPastureau
Author

The problem with ConciseDateFormatter is that it will indicate a date on top of the time. For timestamps, I don't really want Jan 1, 1970 to be there.

To elaborate on the issue, there are actually two issues in one: making sure that the tick labels don't appear on top of each other, and reducing the tick labels to their most significant digits.

@story645
Member

story645 commented May 2, 2024

To elaborate on the issue, there are actually two issues in one: making sure that the tick labels don't appear on top of each other, and reducing the tick labels to their most significant digits.

  1. Is rotating the tick labels an option? Otherwise it'd be a locator that chooses ticks based on label width, which means it would have to be formatter/label dependent, which I'm not sure is technically possible
  2. something like a set_powerlimits for dates?

@timhoffm
Member

timhoffm commented May 2, 2024

Otherwise it'd be a locator that chooses ticks based on label width, which means it would have to be formatter/label dependent, which I'm not sure is technically possible

AFAIK this is not possible. And it would be tricky: Formatters need to know all label positions and from that determine the number of significant digits. If Locators, on the other hand, want to decide placement based on the label size, this would result in a mutual dependence loop.
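
To illustrate the first half of that (a formatter looking at all tick positions before choosing the number of digits), here is a rough sketch. `TrimmedSecondsFormatter` is a hypothetical class, not something Matplotlib ships, and it assumes the x values are plain float seconds. It overrides `Formatter.format_ticks`, which receives every tick value at once.

```python
from matplotlib.ticker import Formatter


class TrimmedSecondsFormatter(Formatter):
    """Hypothetical formatter: MM:SS plus only the decimals the ticks need."""

    def __init__(self, max_decimals=6):
        self.max_decimals = max_decimals
        self._decimals = 0

    def format_ticks(self, values):
        # Express every tick as an integer count of 10**-max_decimals seconds
        units = [int(round(abs(float(v)) * 10 ** self.max_decimals))
                 for v in values]
        decimals = self.max_decimals
        # Drop one decimal at a time while no tick would lose information
        while decimals > 0 and all(u % 10 == 0 for u in units):
            units = [u // 10 for u in units]
            decimals -= 1
        self._decimals = decimals
        return [self(v, i) for i, v in enumerate(values)]

    def __call__(self, x, pos=None):
        scale = 10 ** self._decimals
        total = int(round(abs(float(x)) * scale))
        whole, frac = divmod(total, scale)
        minutes, seconds = divmod(whole, 60)
        label = f"{minutes:02d}:{seconds:02d}"
        if self._decimals:
            label += f".{frac:0{self._decimals}d}"
        return ("-" if x < 0 and total else "") + label


formatter = TrimmedSecondsFormatter()
fine = formatter.format_ticks([0.0, 0.005, 0.010])
coarse = formatter.format_ticks([60, 120, 180])
```

It would be attached with `ax.xaxis.set_major_formatter(TrimmedSecondsFormatter())`. The locator is still unaware of the resulting label width, so this does nothing about overlap.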

@story645
Member

story645 commented May 3, 2024

Is this time formatter from librosa kinda what you're after: https://librosa.org/doc/main/generated/librosa.display.TimeFormatter.html

@rcomer
Member

rcomer commented May 3, 2024

I am confused by the example. Standard python datetimes have microseconds so I tried this:

import datetime
import matplotlib.pyplot as plt

x = datetime.datetime(2021, 5, 4)
dates = [x.replace(minute=n, microsecond=n) for n in range(50)]

ax = plt.figure().add_subplot()
ax.plot(dates, range(50))

image

If I use the date formatter as in the OP (less than 1 hour case) I only get years:

import matplotlib.dates as mdates

formatter = mdates.AutoDateFormatter(mdates.AutoDateLocator(), defaultfmt='%M:%S')
ax.xaxis.set_major_formatter(formatter)

image

If instead I define the formatter using the locator instance on the axis, it goes back to what you get by default:

formatter = mdates.AutoDateFormatter(ax.xaxis.get_major_locator(), defaultfmt='%M:%S')
ax.xaxis.set_major_formatter(formatter)

image

What am I missing? I admit I have not really followed what the defaultfmt keyword does.

I am using mpl 3.8.2.

@RomainPastureau
Author

@story645 Rotating the ticks is not always an option, unfortunately. I would need this for two different projects, and in one of these I am plotting 8 subplots (2 horizontally, 4 vertically) in the same graph, so I need the x axis to be as compact as possible. I will look into your librosa function, though, thanks for that!

@rcomer To be fair I am also confused between formatters and locators, and the code I have provided is mostly the result of tinkering around with diverse solutions I found online. In any case, I did obtain the different results you are showing in your figures at various points, but none of them give interesting results... So I am a bit lost.

In any case, thanks to all of you for your interest in this question. After looking around and asking the question on other forums, I gather that it is a feature that other people would be interested in!

@jklymak
Member

jklymak commented May 3, 2024

Would you consider making a self contained example that makes the problem clear? We can't reproduce your issue if we can't run your code.

@story645
Member

story645 commented May 3, 2024

To be fair I am also confused between formatters and locator

Can you elaborate on this a bit so that we can try and make the documentation clearer?

Broadly:

  • locators control tick position
  • formatters control tick label

These generally function independently, but sequentially - so locator generates positions then formatter labels those positions. That's why Tim and I don't think a locator that adjusts tick positions based on the formatter generated label would be feasible, though I'm now curious about the Librosa implementation.
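
A small sketch of that sequence outside of any figure, assuming a `MaxNLocator` and a `FuncFormatter` as stand-ins (the date locators and formatters follow the same two-step pattern). The range 0 to 10 and the label format are arbitrary choices for illustration.

```python
from matplotlib.ticker import FuncFormatter, MaxNLocator

# Step 1: the locator picks positions for a given view range
locator = MaxNLocator(nbins=5)
positions = locator.tick_values(0.0, 10.0)

# Step 2: the formatter turns each position into a label;
# it has no way to send information back to the locator
formatter = FuncFormatter(lambda x, pos: f"{x:.1f} s")
labels = [formatter(x, i) for i, x in enumerate(positions)]

for x, label in zip(positions, labels):
    print(x, "->", label)
```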

@story645
Member

story645 commented May 3, 2024

Also @ksunden dynamic relabeling/scale adjustment based on subsample resolution of unitized data (which thinking more, Librosa knows it has dates) might be a good example for the data-prototype (if you don't already have one 😅).

@rcomer
Member

rcomer commented May 3, 2024

I think I have understood now. It is not about the precision of the data, but about whether the interval between the ticks is less than a second:

import datetime
import matplotlib.pyplot as plt

x = datetime.datetime(2021, 5, 4)
dates = [x.replace(second=n) for n in range(3)]

ax = plt.figure().add_subplot()
ax.plot(dates, range(3))

image

The behaviour is defined by the rcParam

#date.autoformatter.microsecond: %M:%S.%f
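
A short sketch of how that rcParam feeds into the formatter, assuming a temporary override through `rc_context`. The replacement format `%M:%S` is only an illustration: it drops the fractional part entirely, so ticks less than a second apart would get repeated labels.

```python
import matplotlib as mpl
import matplotlib.dates as mdates

# Key used by AutoDateFormatter for tick spacings down to one microsecond
key = 1 / mdates.MUSECONDS_PER_DAY

with mpl.rc_context({"date.autoformatter.microsecond": "%M:%S"}):
    # The formatter copies the rcParams into its `scaled` dict when created
    formatter = mdates.AutoDateFormatter(mdates.AutoDateLocator())
    inside = formatter.scaled[key]

print(inside)
```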

@rcomer
Member

rcomer commented May 3, 2024

And actually the AutoDateFormatter docstring tells us we can modify that with a function:

import datetime

import matplotlib.pyplot as plt
import matplotlib.dates as mdates

x = datetime.datetime(2021, 5, 4)
dates = [x.replace(second=n) for n in range(3)]

ax = plt.figure().add_subplot()
ax.plot(dates, range(3))

def my_format_function(x, pos=None):
    x = mdates.num2date(x)
    fmt = '%M:%S.%f'
    label = x.strftime(fmt)
    label = label.rstrip("0")
    label = label.rstrip(".")
    return label

formatter = mdates.AutoDateFormatter(ax.xaxis.get_major_locator())
formatter.scaled[1 / mdates.MUSECONDS_PER_DAY] = my_format_function
ax.xaxis.set_major_formatter(formatter)

image

@RomainPastureau
Author

Hi!

Thank you for all of your responses. I realize now that my question wasn't clear at the beginning, so I will provide a full and simplified description of my problem, as suggested by @jklymak

Clear description of the issues

  • When plotting a time series, I would like to get the timestamps on the x-axis.
  • The timestamps should always show at least the minutes and seconds (MM:SS).
  • If the time series is longer than an hour, the timestamps should have the format HH:MM:SS.
  • Decimals of seconds should appear if their digits are significant at the current zoom level on the figure. For example, if the ticks on the x-axis are 5 ms apart, three decimals should appear, not more.
  • In a similar fashion to what appears for "regular" floats on the x-axis, if I zoom on the generated figure, I would like to see the tick labels only up to the last significant digit (by default, 6 decimals always appear for time series).
  • The tick labels should not appear on top of each other, for readability.

First example: no formatting, numpy datetime64[us]

This is the minimal reproducible example for my use case. I am importing a WAV file, creating timestamps from its frequency, and plotting it, passing the timestamps on the x-axis.

Code

import os.path as op
import scipy.io as sio
import numpy as np
import matplotlib.pyplot as plt

# Get the WAV example file from Scipy
data_dir = op.join(op.dirname(sio.__file__), 'tests', 'data')
wav_example_file = op.join(data_dir, 'test-44100Hz-2ch-32bit-float-be.wav')

# Load the audio file
audio = sio.wavfile.read(wav_example_file)
freq_audio = audio[0]
samples = audio[1][:, 0]  # Take only the left channel

# Create the timestamps
t_audio = np.arange(0, len(samples)) / freq_audio

# Turn them into datetime
t_audio = np.array(t_audio*1000000, dtype="datetime64[us]")

# Create the figure
plt.plot(t_audio, samples)
plt.show()

Output

Figure_3

Remarks

As you can see, when plotted, the timestamps are on top of each other - plus, here, we do not really care about the precision after the 3rd decimal place as the ticks are spaced by 2 ms each. Ideally, the significant digits would increase when I zoom in dynamically using the mouse; however, the precision remains at 6 digits after the decimal point, no matter what.

Note that the output is exactly the same if instead of:

t_audio = np.array(t_audio*1000000, dtype="datetime64[us]")

I use Python datetime objects via:

t_audio = [datetime.datetime(1970, 1, 1, int(t // 3600) % 24, int((t // 60) % 60), int((t % 60) // 1), int((t % 1) * 1000000)) for t in t_audio]

Using Python datetime objects makes the computation much slower for longer audio files, though.

Second example: no formatting, numpy timedelta64[us]

This time I am using a numpy timedelta64 object for the x-axis:

Code

import os.path as op
import scipy.io as sio
import numpy as np
import matplotlib.pyplot as plt

# Get the WAV example file from scipy
data_dir = op.join(op.dirname(sio.__file__), 'tests', 'data')
wav_example_file = op.join(data_dir, 'test-44100Hz-2ch-32bit-float-be.wav')

# Load the audio file
audio = sio.wavfile.read(wav_example_file)
freq_audio = audio[0]
samples = audio[1][:, 0]  # Take only the left channel

# Create the timestamps
t_audio = np.arange(0, len(samples)) / freq_audio

# Turn them into timedelta
t_audio = np.array(t_audio*1000000, dtype="timedelta64[us]")

# Create the figure
plt.plot(t_audio, samples)
plt.show()

Output

Figure_4

Remarks

This time, the time format is ignored and the x-axis only shows the microseconds. Once again, the output is the same if I use Python timedelta objects.

Third example: using AutoDateFormatter

In order to get the format I want, I am now trying to use Matplotlib formatters. Here is the result using AutoDateFormatter:

Code

import os.path as op
import scipy.io as sio
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import dates as mdates

# Get the WAV example file from scipy
data_dir = op.join(op.dirname(sio.__file__), 'tests', 'data')
wav_example_file = op.join(data_dir, 'test-44100Hz-2ch-32bit-float-be.wav')

# Load the audio file
audio = sio.wavfile.read(wav_example_file)
freq_audio = audio[0]
samples = audio[1][:, 0]  # Take only the left channel

# Create the timestamps
t_audio = np.arange(0, len(samples)) / freq_audio

# Turn them into datetime
t_audio = np.array(t_audio*1000000, dtype="datetime64[us]")

# Create the figure
fig = plt.figure()
plt.plot(t_audio, samples)

# Use a formatter
formatter = mdates.AutoDateFormatter(mdates.AutoDateLocator())
fig.axes[0].xaxis.set_major_formatter(formatter)

# Plot the figure
plt.show()

Output

Figure_5

Remarks

Obviously, here, that's not the output we want, so let's try something else. Adding a default format (formatter = mdates.AutoDateFormatter(mdates.AutoDateLocator(), defaultfmt="%H:%M:%S")) doesn't change anything.

Fourth example: using a function (@rcomer solution)

import os.path as op
import scipy.io as sio
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import dates as mdates

def my_format_function(x, pos=None):
    x = mdates.num2date(x)
    fmt = '%M:%S.%f'
    label = x.strftime(fmt)
    label = label.rstrip("0")
    label = label.rstrip(".")
    return label

# Get the WAV example file from scipy
data_dir = op.join(op.dirname(sio.__file__), 'tests', 'data')
wav_example_file = op.join(data_dir, 'test-44100Hz-2ch-32bit-float-be.wav')

# Load the audio file
audio = sio.wavfile.read(wav_example_file)
freq_audio = audio[0]
samples = audio[1][:, 0]  # Take only the left channel

# Create the timestamps
t_audio = np.arange(0, len(samples)) / freq_audio

# Turn them into datetime
t_audio = np.array(t_audio*1000000, dtype="datetime64[us]")

# Create the figure
ax = plt.figure().add_subplot()
plt.plot(t_audio, samples)

# Use a formatter
formatter = mdates.AutoDateFormatter(ax.xaxis.get_major_locator())
formatter.scaled[1 / mdates.MUSECONDS_PER_DAY] = my_format_function
ax.xaxis.set_major_formatter(formatter)

# Plot the figure
plt.show()

Output

Figure_6

Remarks

Now it works! I can adapt it to have the hours depending on the length of the timestamps. The only caveat, it seems, is that time before 00:00 is 59:59 (while ideally, I would prefer a -00:01) - which makes sense as we are working with datetime and not timedelta.

Conclusion

So, I guess the problem for my specific use case is solved (thanks again @rcomer!) - that being said, I do think that other users may have an interest in this specifically. Matplotlib's documentation focuses a lot on personalized tick formats for dates, taking into account month lengths, business days, etc., which is incredibly useful when working with dates. But, when working with timestamps, the solution wasn't straightforward (or maybe I missed something). If I am not the only one having that issue, maybe it would be an interesting feature to implement? It could be a new type of formatter that would take a default time format and respond dynamically to it.

Thank you @story645 for your description of Formatters and Locators, I understand it better now. Speaking of, when selecting %H:%M:%S.%f formatting instead of %M:%S.%f, sometimes the labels are shown on top of their neighbors. I believe a function that detects the maximum length of a label given the format and calculates the amount of ticks accordingly may solve my issue here.

@timhoffm
Member

timhoffm commented May 6, 2024

Glad to see @rcomer’s suggestion solves your problem. Would an example be helpful enough for other users?

I believe a function that detects the maximum length of a label given the format and calculates the amount of ticks accordingly may solve my issue here.

As stated above, this is not trivial, unfortunately. The locators first decide on the positions (and number) of ticks. Then, the formatters decide how to represent them: one can only reasonably decide on the formatting if one knows all ticks to be plotted.
You additionally want the reverse: decide on the positions given the format. This mutual interaction is difficult to realize with the current architecture of separate locators and formatters. At best, you could have a loop that checks the overlap (overlap checking in itself is somewhat involved because it depends on drawing characteristics like figure size and font size) and forces the locator to use fewer positions if an overlap is detected. In general, there will not even be a solution - you can always increase font size or reduce figure size enough to force an overlap of just two tick labels.
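
A rough sketch of such a loop, under several assumptions: plain float x values, a `MaxNLocator` whose `nbins` is lowered step by step, and a deliberately wide label format to provoke overlap. The helpers `labels_overlap` and `thin_ticks_until_clear` are hypothetical names, and the result is only valid for the figure size and font size at the time of the check.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter, MaxNLocator


def labels_overlap(ax):
    """Return True if any two neighbouring x tick labels overlap horizontally."""
    ax.figure.canvas.draw()  # label extents are only known after a draw
    boxes = [lab.get_window_extent() for lab in ax.get_xticklabels()
             if lab.get_text()]
    boxes.sort(key=lambda b: b.x0)
    return any(a.x1 > b.x0 for a, b in zip(boxes, boxes[1:]))


def thin_ticks_until_clear(ax, start_bins=10):
    """Lower the tick count until labels stop overlapping; None if they never do."""
    for nbins in range(start_bins, 0, -1):
        ax.xaxis.set_major_locator(MaxNLocator(nbins=nbins))
        if not labels_overlap(ax):
            return nbins
    return None


fig, ax = plt.subplots(figsize=(4, 3))
ax.plot([0, 1], [0, 1])
# Wide labels on a small figure, to make overlap likely at the start
ax.xaxis.set_major_formatter(FuncFormatter(lambda x, pos: f"00:00:{x:09.6f}"))
nbins = thin_ticks_until_clear(ax)
```

Each iteration triggers a full draw, so this is slow, and it would have to be re-run after every resize or zoom.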

@jklymak
Member

jklymak commented May 7, 2024

@RomainPastureau we do not have access to your wavs, and we do not know what freq_audio is. Can you make these more reproducible?

@jklymak
Member

jklymak commented May 7, 2024

Irreproducibility aside, the following seems to do what you want:

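import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# t_audio and samples: the timestamps and sample values built in the examples above
fig, ax = plt.subplots()
ax.plot(t_audio, samples)
locator = mdates.AutoDateLocator()
formatter = mdates.ConciseDateFormatter(locator, show_offset=False)
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(formatter)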

@RomainPastureau
Author

@jklymak I put reproducible examples in this comment, using a WAV file from SciPy! Sorry about my first confusing example - it was the end of the day when I wrote down this first message and I hadn't thought this through 😅
I will try your example code, thank you - though I am afraid that the ConciseDateFormatter will also indicate a date, which I don't really want.

@RomainPastureau
Author

Replying to say that I tweaked @rcomer's solution - it seems to work, and I implemented it in a package I developed. This is the result on one of my example outputs:
Delay between an audio file and an excerpt from it

Here is a snippet of the code I used:

def get_label(value, include_hour=True, include_us=True):
    """Returns a label value depending on the selected parameters."""

    neg = False
    # If negative, put positive
    if value < 0:
        neg = True
        value = abs(value)

    # If zero, set zero
    elif value == 0:
        if include_hour:
            return "00:00:00"
        else:
            return "00:00"

    # Turn to timedelta
    td_value = mdates.num2timedelta(value)

    seconds = td_value.total_seconds()
    hh = str(int(seconds // 3600)).zfill(2)
    mm = str(int((seconds // 60) % 60)).zfill(2)
    ss = str(int(seconds % 60)).zfill(2)

    # Zero-pad to six digits before trimming, so that 0.05 s gives "05" and not "5"
    us = str(td_value.microseconds).zfill(6).rstrip("0")

    label = ""
    if neg:
        label += "-"
    if include_hour:
        label += hh + ":"
    label += mm + ":" + ss
    if include_us and us != "":
        label += "." + us

    return label

def get_label_hh_mm_ss_no_ms(value, pos=None):
    """Returns a label value as HH:MM:SS, without any ms value."""
    return get_label(value, True, False)

def get_label_hh_mm_ss(value, pos=None):
    """Returns a label value as HH:MM:SS.ms, without any trailing zero."""
    return get_label(value, True, True)

def set_label_time_figure(ax):
    """Sets the time formatted labels on the x axes."""
    if x_format_figure == "time":
        formatter = mdates.AutoDateFormatter(ax.xaxis.get_major_locator())
        formatter.scaled[1 / mdates.MUSECONDS_PER_DAY] = get_label_hh_mm_ss
        formatter.scaled[1 / mdates.SEC_PER_DAY] = get_label_hh_mm_ss
        formatter.scaled[1 / mdates.MINUTES_PER_DAY] = get_label_hh_mm_ss_no_ms
        formatter.scaled[1 / mdates.HOURS_PER_DAY] = get_label_hh_mm_ss_no_ms
        formatter.scaled[1] = get_label_hh_mm_ss_no_ms
        formatter.scaled[mdates.DAYS_PER_MONTH] = get_label_hh_mm_ss_no_ms
        formatter.scaled[mdates.DAYS_PER_YEAR] = get_label_hh_mm_ss_no_ms
        ax.xaxis.set_major_formatter(formatter)
        return ax

    return ax

i = 0

It is probably very naive code - but it results in exactly what I needed. I just have to call ax = set_label_time_figure(ax) after each plot/subplot and it works - even if I zoom in or out. I also used timedelta objects instead of datetime objects.

I still think a proper, built-in formatter may be beneficial for other people working on time series - but at least, now, I see that it is possible.

Thank you all for your help!
