In my relentless pursuit of coaxing more performance out of my Lemmy instance I read that PostgreSQL relies heavily on the OS’s disk cache for read performance. I’ve got 16 GB of RAM and two HDDs in RAID 1. I’ve configured PostgreSQL to use 12 GB of RAM, and I’ve set up zram swap with 8 GB.

But according to htop, PostgreSQL is using only about 4 GB. My swap is hardly touched. And read performance is awful: opening my profile regularly times out. Only once it has loaded does it load quickly, until I leave it alone for half an hour or so.

Now, my theory is that the zram actually takes available RAM away from the disk cache, slowing the whole system down. My googling couldn’t find an answer; it only turned up guides on how to set up zram in the first place.

Does anyone know if my theory is correct?

  • moonpiedumplings@programming.dev
    link
    fedilink
    arrow-up
    2
    ·
    12 hours ago

    Databases are special. They often implement their own optimizations, which are faster than the more general ones the OS provides.

    For example: https://www.postgresql.org/docs/current/wal-intro.html

    Because WAL restores database file contents after a crash, journaled file systems are not necessary for reliable storage of the data files or WAL files. In fact, journaling overhead can reduce performance, especially if journaling causes file system data to be flushed to disk. Fortunately, data flushing during journaling can often be disabled with a file system mount option, e.g., data=writeback on a Linux ext3 file system. Journaled file systems do improve boot speed after a crash.
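
    As a concrete illustration of that mount option (a sketch only — the device, mount point and filesystem type below are examples, not OP’s actual setup):

    ```shell
    # /etc/fstab sketch: disable data journaling on the volume holding
    # the PostgreSQL data directory (ext3/ext4 support data=writeback)
    /dev/sdb1  /var/lib/postgresql  ext4  defaults,data=writeback,noatime  0 2
    ```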

    I didn’t see much in the docs about swap, but I wouldn’t be surprised if Postgres also had memory optimizations, such as its own form of in-memory compression.

    Your best bet is probably to ask someone who is familiar with the internals of postgres.

  • aubeynarf@lemmynsfw.com
    link
    fedilink
    arrow-up
    8
    ·
    edit-2
    2 days ago

    Why would you reserve ram for swap???

    You’re hindering the OS’s ability to manage memory.

    Put swap on disk. Aim for it to rarely be touched - but it needs to be there so the OS can move idle memory data out if it wants to.

    Don’t hard-allocate a memory partition for postgres. Let it allocate and free as it sees fit.

    Then the OS will naturally use all possible RAM for cache, with the freedom to use more or less for the server process as demand requires.

    Monitor queries to ensure you’re not seeing table scans due to missing indexes. Make sure VACUUM is happening either automatically or manually.
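
    A quick way to check both points (a sketch — the database name lemmy is a guess; the statistics views themselves are standard PostgreSQL):

    ```shell
    # tables ranked by sequential scans; a high seq_scan with a low
    # idx_scan often points at a missing index
    psql -d lemmy -c "SELECT relname, seq_scan, idx_scan
                      FROM pg_stat_user_tables
                      ORDER BY seq_scan DESC LIMIT 10;"

    # when (auto)vacuum last ran per table; NULL means never
    psql -d lemmy -c "SELECT relname, last_autovacuum, last_vacuum
                      FROM pg_stat_user_tables
                      ORDER BY relname LIMIT 10;"
    ```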

    • Björn Tantau@swg-empire.deOP
      link
      fedilink
      arrow-up
      3
      ·
      2 days ago

      Why would you reserve ram for swap???

      It’s a useful way of squeezing out a few GB more. Worked wonders on my starved Steam Deck and allowed me to play Cities Skylines smoothly and without crashes.

      But on a DB-heavy server that is apparently not a good idea. I’ve switched to a swap file.
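
      For anyone following along, the switch looks roughly like this (the 8 GiB size is just an example):

      ```shell
      # create and enable an 8 GiB swap file
      fallocate -l 8G /swapfile
      chmod 600 /swapfile
      mkswap /swapfile
      swapon /swapfile
      # make it permanent by adding a line to /etc/fstab:
      #   /swapfile none swap defaults 0 0
      ```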

      Monitor queries to ensure you’re not seeing table scans due to missing indexes.

      There are definitely some unoptimised queries and missing indexes. Lemmy 1.0 will supposedly fix a lot of them.

      • non_burglar@lemmy.world
        link
        fedilink
        arrow-up
        1
        ·
        edit-2
        2 days ago

        If you put swap in zram, you are paging from RAM to RAM. May as well just not use swap and save the cycles.

        • BB_C@programming.dev
          link
          fedilink
          arrow-up
          2
          ·
          2 days ago

          The point is compression.

          % swapon
          NAME           TYPE      SIZE USED  PRIO
          /dev/nvme0n1p2 partition   8G   0B     5
          /dev/sda2      partition  32G   0B    -2
          /dev/zram1     partition 3.5G 1.8G 32767
          /dev/zram2     partition 3.5G 1.8G 32767
          /dev/zram3     partition 3.5G 1.8G 32767
          /dev/zram4     partition 3.5G 1.8G 32767
          /dev/zram5     partition 3.5G 1.8G 32767
          /dev/zram6     partition 3.5G 1.8G 32767
          /dev/zram7     partition 3.5G 1.8G 32767
          /dev/zram8     partition 3.5G 1.8G 32767
          
          % zramctl
          NAME       ALGORITHM DISKSIZE   DATA  COMPR  TOTAL STREAMS MOUNTPOINT
          /dev/zram8 zstd          3.5G 293.4M 189.2M 192.5M         [SWAP]
          /dev/zram7 zstd          3.5G 282.1M 187.5M   192M         [SWAP]
          /dev/zram6 zstd          3.5G 284.6M 189.4M 192.9M         [SWAP]
          /dev/zram5 zstd          3.5G 297.8M 197.3M 200.1M         [SWAP]
          /dev/zram4 zstd          3.5G 304.9M 202.9M 206.7M         [SWAP]
          /dev/zram3 zstd          3.5G 300.7M 201.9M 204.6M         [SWAP]
          /dev/zram2 zstd          3.5G 311.3M 207.2M 210.6M         [SWAP]
          /dev/zram1 zstd          3.5G 307.9M 210.5M 213.3M         [SWAP]
          /dev/zram0 zstd          <not used for swap>
          
          • non_burglar@lemmy.world
            link
            fedilink
            arrow-up
            1
            ·
            2 days ago

            zswap is specifically built to this end and far better suited to it.

            zram is great, but it is simply a ramdisk and inappropriate to OP’s task. It cannot dynamically grow/shrink or deal with hot/cold pages.
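
            For reference, zswap can typically be enabled at runtime through its standard module parameters in sysfs (the pool size below is just an example):

            ```shell
            # enable zswap, pick a compressor, cap the compressed pool
            echo 1    > /sys/module/zswap/parameters/enabled
            echo zstd > /sys/module/zswap/parameters/compressor
            echo 20   > /sys/module/zswap/parameters/max_pool_percent  # % of RAM
            ```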

            • BB_C@programming.dev
              link
              fedilink
              arrow-up
              2
              ·
              2 days ago

              zswap is not better than modern zram in any way. And you can set up the latter with writeback anyway.

              But that’s not OP’s problem since “swap gets hardly touched” in OP’s case.

  • non_burglar@lemmy.world
    link
    fedilink
    arrow-up
    6
    ·
    2 days ago

    Zram does not impede the disk cache; it’s a compressed block device, and its memory is unavailable to the kernel for anything else.

    I do wonder what you’re trying to achieve by moving swap to zram. You’re potentially moving pages in and out of swap, with compression, for no real reason; that swapping wouldn’t have occurred at all if zram weren’t in place.

  • taaz@biglemmowski.win
    link
    fedilink
    arrow-up
    4
    ·
    edit-2
    2 days ago

    Linux has, roughly, two kinds of memory pages (entries in RAM): one is the file cache (page cache) and the other is “memory allocated by programs for work” (anonymous pages).

    When you look at the memory consumed by a process you are looking at its RSS; the page/file cache belongs to the kernel and, in btop for example, corresponds to Cached.

    Page cache can never be moved into swap - that would be the same as duplicating the file from one place on a disk to another place on a (possibly different) disk.
    If more memory is needed, page cache is evicted (written back into the respective file, if changed). Only anonymous pages (not backed by anything permanent) can be moved into swap.
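
    This split is visible in /proc/meminfo, where the page cache (Cached) and program memory (AnonPages) are accounted separately:

    ```shell
    # page cache vs anonymous memory vs swap, straight from the kernel
    grep -E '^(MemTotal|Cached|AnonPages|SwapTotal|SwapFree):' /proc/meminfo
    ```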

    So what does “PostgreSQL heavily relies on the OSs disk cache” mean? The more free memory there is, the more files can be kept cached in RAM and the faster postgres can then retrieve these files.

    When you add zram, you dedicate part of actual RAM to a compressed swap device which, as I said above, will never contain page cache.
    In theory this still increases the total available memory, but in practice that is only true if you configure the kernel to aggressively swap anonymous pages into the zram-backed swap.

    Note: I’ve simplified this a bit, so it might not be exact. Also, a process’s RSS contains multiple different things, not just the memory directly allocated by the program’s code.

  • CondorWonder@lemmy.ca
    link
    fedilink
    arrow-up
    3
    ·
    2 days ago

    Based on what I’ve seen with my use of zram, I don’t think it reserves the total space up front; it consumes whatever is shown in the output of zramctl --output-all. If you’re swapping then yes, it takes memory from the system (up to the 8 GB disk size), depending on how compressible the swapped content is (e.g. at a 3x ratio, 8 GB/3 ≈ 2.7 GB). That said, it will take memory from the disk cache if you’re swapping.
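
    The back-of-the-envelope figure checks out (the 8 GiB of swapped data and 3x ratio are the hypothetical numbers from above):

    ```shell
    # resident cost of 8 GiB of swapped data at a 3x compression ratio
    awk 'BEGIN { printf "%.2f GiB\n", 8 / 3 }'
    # → 2.67 GiB
    ```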

    Realistically I think your issue is IO, and there’s not much you can do if your disk cache is being flushed. Switching to zswap might help, as it should spill more into disk when you’re under memory pressure.

  • BB_C@programming.dev
    link
    fedilink
    arrow-up
    2
    ·
    2 days ago

    • Use zram devices equal to the number of threads in your system.
    • Use zstd compression.
    • Mount zram devices as swap with high priority.
    • Mount disk swap partition(s) with low priority.
    • Increase swappiness:
         sysctl vm.swappiness=<larger number than default>
      
    • Use zramctl to see detailed info about your zram disks.
    • Check with iotop to see if something unexpected is using a lot of IO traffic.

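
    Put together, those steps look roughly like this (device count, sizes, priorities and the disk partition are examples; tools like systemd-zram-generator automate the same thing):

    ```shell
    # one zram swap device per CPU thread, zstd-compressed, high priority
    modprobe zram num_devices=4
    for i in 0 1 2 3; do
        zramctl --algorithm zstd --size 2G /dev/zram$i
        mkswap /dev/zram$i
        swapon --priority 100 /dev/zram$i   # high priority: used first
    done
    swapon --priority 10 /dev/sda2          # disk swap as low-priority fallback
    sysctl vm.swappiness=180                # swap to zram more aggressively
    ```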
  • Shadow@lemmy.ca
    link
    fedilink
    arrow-up
    2
    ·
    2 days ago

    Yes, configuring memory to be used for zram would mark it as unavailable for kernel fs caching.

    Does iostat show your disks being pegged when it’s slow? It’s odd that performance would be so bad on those specs; it makes me think you might have disk I/O issues.
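
    One way to check (iostat comes from the sysstat package):

    ```shell
    # extended device stats every 2 seconds; %util near 100 on an HDD
    # means the disk is saturated
    iostat -x 2
    ```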

      • chirping@infosec.pub
        link
        fedilink
        arrow-up
        2
        ·
        1 day ago

        If you are on an HDD, then looking at what else is using the same disk, and reducing that usage, may yield some results. For example, if /var/log is on the same disk and can’t be avoided, then reducing log volume or batching writes may reduce the “context switches” your HDD has to do. There should be options for I/O limits/throttling/priority in systemd.

        If you have only Postgres on the HDD, I’d consider giving it 90% of the max bandwidth; maybe that’d be more effective than going full throttle and hitting the wall. If you have Postgres and some other service fighting for the HDD’s time, these limits could help. Make sure access time tracking is off (or set to relatime).
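
        A hypothetical systemd drop-in along those lines (the service name, device and bandwidth figure are examples; the IO*BandwidthMax settings require cgroup v2):

        ```shell
        # throttle a noisy neighbour service competing for the same HDD
        systemctl edit noisy.service
        # then, in the drop-in file:
        #   [Service]
        #   IOWriteBandwidthMax=/dev/sda 10M
        #   IOSchedulingClass=idle
        ```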