
[🐛 | Bug]: Performance issue with PowerPoint #7111

Closed

vxgmichel opened this issue Apr 19, 2024 · 4 comments
Labels
A-Client (Area: Client application), bug (Issue related to a bug), v2 (Related to the 2.x branch)

Comments

vxgmichel (Contributor) commented Apr 19, 2024

Parsec version tested on:

2.16.3

Platforms tested on:

Windows

Bug description:

A user reports a critical performance issue when opening a local ~25 MB file with PowerPoint, with Parsec taking ~45% CPU while the file is being opened.

The first step is to try to reproduce the issue.
The second step is to log the WinFSP operations performed by PowerPoint.

vxgmichel added the bug label Apr 19, 2024
mmmarcos added the v2 label Apr 22, 2024
FirelightFlagboy added the A-Client label Apr 22, 2024
vxgmichel (Contributor, Author) commented:

I can confirm that this issue no longer occurs with Parsec v3.

vxgmichel (Contributor, Author) commented:

After further testing, the read operations take up most of the time. In particular, 75% of it is spent in:

async def get_chunk(self, chunk_id: ChunkID) -> bytes:
    return await self.rs_instance.get_chunk(chunk_id)
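(For reference, a figure like this can be obtained with a simple timing harness. The sketch below is only an assumption about how such a measurement could be reproduced, not the actual profiling method used here:)

import time

# Hypothetical timing harness (illustrative only): wrap get_chunk and
# accumulate totals to see how much of the read path it accounts for.
calls = 0
total = 0.0

async def timed_get_chunk(storage, chunk_id):
    global calls, total
    start = time.perf_counter()
    try:
        return await storage.get_chunk(chunk_id)
    finally:
        calls += 1
        total += time.perf_counter() - start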

vxgmichel (Contributor, Author) commented May 22, 2024

Conclusion of the investigation:

PowerPoint typically reads the file in chunks of 3 KB on average. However, Parsec stores ciphered data chunks of 256 KB in its local database, and Parsec v2 does not implement any in-RAM caching of those chunks. This means that for each 256 KB read, there are about 80 database reads and 80 local symmetric-key decryptions of the same 256 KB chunk. It turns out the decryption is more expensive than the database access (by a factor of 10, probably because the database keeps some data in RAM). Since decrypting 256 KB takes around 1 ms, reading 256 KB in chunks of 3 KB costs about 80 ms more than necessary. For comparison, a typical 256 KB read operation in Parsec v2 takes between 3 and 5 ms, which explains the 20x slowdown observed in the measurements. This slowdown is what causes PowerPoint to lag and freeze, as it is not designed to work under such slow conditions.
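As a sanity check, those numbers can be worked out directly (a back-of-the-envelope sketch using the figures quoted above):

# Back-of-the-envelope check of the overhead described above.
STORED_CHUNK = 256 * 1024  # ciphered chunk size in the local database
READ_SIZE = 3 * 1024       # typical PowerPoint read size
DECRYPT_MS = 1.0           # approximate cost of decrypting one 256 KB chunk

reads_per_chunk = STORED_CHUNK // READ_SIZE     # ~85 reads per stored chunk
wasted_ms = (reads_per_chunk - 1) * DECRYPT_MS  # ~84 ms of redundant decryption
print(reads_per_chunk, wasted_ms)               # vs 3-5 ms for a single 256 KB read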

In Parsec v3, the problem does not appear, since a chunk cache has been implemented:

/// Very simple cache for chunks: we consider a chunk is most of the time much bigger
/// than a typical kernel read (512ko vs 4ko), so it's a big win to just keep the
/// chunks currently being read in memory.
/// To approximate that, we just keep the last 16 chunks read in memory.
#[derive(Debug)]
struct ChunksCache {
    items: [Option<(ChunkID, Bytes)>; 16],
    round_robin: usize,
}
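
The same idea is easy to express in Python; here is a minimal sketch of such a round-robin cache (hypothetical code, not the actual v3 implementation):

from typing import Optional

class ChunksCache:
    """Keep the last 16 chunks read in memory, evicting in round-robin order."""

    SIZE = 16

    def __init__(self) -> None:
        self.items: list[Optional[tuple[bytes, bytes]]] = [None] * self.SIZE
        self.round_robin = 0

    def get(self, chunk_id: bytes) -> Optional[bytes]:
        # A linear scan is fine for 16 entries
        for item in self.items:
            if item is not None and item[0] == chunk_id:
                return item[1]
        return None

    def push(self, chunk_id: bytes, data: bytes) -> None:
        # Overwrite the oldest slot, wrapping around like the Rust version
        self.items[self.round_robin] = (chunk_id, data)
        self.round_robin = (self.round_robin + 1) % self.SIZE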

For later investigation, this class of problems can easily be tested using the following cat program:

import sys
import time

if __name__ == "__main__":
    path = sys.argv[1]
    start = time.time()
    # Unbuffered binary reads, mimicking PowerPoint's small-read pattern
    with open(path, "rb", buffering=0) as f:
        while True:
            data = f.read(3 * 1024)  # read in 3 KB chunks
            if not data:
                break
            sys.stdout.buffer.write(data)
    # Report the total read time on stderr
    print(time.time() - start, file=sys.stderr)
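
Running it on the same file both on the Parsec mountpoint and on a local disk gives a direct measure of the slowdown, e.g. (script name and mountpoint letter are hypothetical):

python cat.py Z:\workspace\presentation.pptx > NUL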

vxgmichel (Contributor, Author) commented:

Closing now as it's fixed in v3 (and easily fixable in v2 if necessary).
