if you could pick a standard format for a purpose what would it be and why?
e.g. flac for lossless audio because…
(yes you can add new categories)
summary:
- photos .jxl
- open domain image data .exr
- videos .av1
- lossless audio .flac
- lossy audio .opus
- subtitles srt/ass
- fonts .otf
- container mkv (doesn't contain .jxl)
- plain text utf-8 (many also say markup but disagree on the implementation)
- documents .odt
- archive files (this one is causing a bloodbath so i picked randomly) .tar.zst
- configuration files toml
- typesetting typst
- interchange format .ora
- models .gltf / .glb
- daw session files .dawproject
- otdr measurement results .xml
This is the kind of thing i think about all the time so i have a few.
.tar.zst
… .zip and gzip/.gz) and does so faster.
By separating archiving (.tar), compressing (.zst), and (if you so choose) encrypting (.gpg), .tar.zst follows the Unix philosophy of “Make each program do one thing well.”
.tar.xz is also very good and seems more popular (probably since it was released 6 years earlier in 2009), but, when tuned to its maximum compression level, .tar.zst can achieve a compression ratio pretty close to LZMA (used by .tar.xz and .7z) and do it faster[1].
JPEG XL/.jxl
… .jpeg, .png, .gif).
AV1
… .mp4) and VP9[3].
OpenDocument / ODF / .odt
.odt is simply a better standard than .docx.
[1] https://archlinux.org/news/now-using-zstandard-instead-of-xz-for-package-compression/
[2] https://tonisagrista.com/blog/2023/jpegxl-vs-avif/
[3] https://engineering.fb.com/2018/04/10/video-engineering/av1-beats-x264-and-libvpx-vp9-in-practical-use-case/
.tar is pretty bad as it lacks an index, making it impossible to quickly seek around in the file. The compression on top adds another layer of complication. It might still work great as a tape archiver, but for sending files around the Internet it is quite horrible. It’s really just getting dragged around for cargo cult reasons, not because it’s good at the job it is doing.
In general I find the archive situation a little annoying, as archives are largely completely unnecessary, that’s what we have directories for. But directories don’t exist as far as HTML is concerned and only single files can be downloaded easily. So everything has to get packed and unpacked again, for absolutely no reason. It’s a job computers should handle transparently in the background, not an explicit user action.
Many file managers try to add support for .zip and allow you to go into them like it is a folder, but that abstraction is always quite leaky and never as smooth as it should be.
.tar.pixz/.tpxz has an index and uses LZMA and permits parallel compression/decompression (increasingly important on modern processors).
https://github.com/vasi/pixz
It’s packaged in Debian, and I assume other Linux distros.
Only downside is that GNU tar doesn’t have a single-letter shortcut to use pixz as a compressor, the way it does “z” for gzip, “j” for bzip2, or “J” for xz (LZMA); gotta use the more-verbose “-Ipixz”.
Also, while I don’t recommend it, IIRC gzip has a limited range that the effects of compression can propagate, and so even if you aren’t intentionally trying to provide random access, there is software that leverages this to hack in random access as well. I don’t recall whether someone has rigged it up with tar and indexing, but I suppose if someone were specifically determined to use gzip, one could go that route.
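For anyone wondering what "no index" means in practice, here is a rough Python sketch, not how any real tool does it. It builds a small uncompressed ustar archive in memory and then finds the members the only way plain tar allows: read a 512-byte header, skip the data, read the next header. The helper names (`build_tar`, `walk_headers`) are made up for illustration, and it assumes plain ustar headers with short ASCII names.

```python
import io
import tarfile

BLOCK = 512  # tar stores headers and data in 512-byte blocks


def build_tar(files):
    """Build an uncompressed ustar archive in memory from {name: bytes}."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def walk_headers(archive):
    """Return [(name, data_offset, size)] by reading headers one after another.

    There is no table of contents to jump to: finding the last member
    means stepping over every member before it.
    """
    members = []
    pos = 0
    while pos + BLOCK <= len(archive):
        header = archive[pos:pos + BLOCK]
        if header == bytes(BLOCK):  # an all-zero block marks the end of the archive
            break
        name = header[0:100].rstrip(b"\0").decode("ascii")
        size = int(header[124:136].rstrip(b"\0 "), 8)  # size is stored as octal text
        members.append((name, pos + BLOCK, size))
        # The next header starts after the data, rounded up to a whole block.
        pos += BLOCK + -(-size // BLOCK) * BLOCK
    return members
```

Once a compressor is wrapped around the whole stream, even this skipping is not possible without decompressing everything up to the member you want, which is the gap an indexed format like pixz is meant to close.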
wait so does it do all of those things?
So there’s a tool called tar that creates an archive (a .tar file). Then there’s a tool called zstd that can be used to compress files, including .tar files, which then becomes a .tar.zst file. And then you can encrypt your .tar.zst file using a tool called gpg, which would leave you with an encrypted, compressed .tar.zst.gpg archive.
Now, most people aren’t doing everything in the terminal, so the process for most people would be pretty much the same as creating a ZIP archive.
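If it helps to see the layering, here is a small Python sketch of the same idea, with two caveats. First, it uses `lzma` (the xz algorithm) as a stand-in for zstd, because that is what the standard library has had for a long time. Second, it leaves out the gpg step entirely. The function names are hypothetical.

```python
import io
import lzma
import tarfile


def make_tar(files):
    """Step 1: bundle {name: bytes} into one uncompressed tar stream."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def compress(tar_bytes):
    """Step 2: compress the tar stream as a single blob (xz here, standing in for zstd)."""
    return lzma.compress(tar_bytes)


def unpack(blob):
    """Undo the layers in reverse order: decompress first, then read the tar."""
    tar_bytes = lzma.decompress(blob)
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tar:
        return {m.name: tar.extractfile(m).read() for m in tar.getmembers()}
```

Each step only knows about its own job: the archiver never compresses, and the compressor has no idea its input is an archive. An encryption layer would wrap the compressed blob in the same way.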
The problem here being that GnuPG does nothing really well.
AV1 is also much younger than H264 (AV1 is a specification, x264 is an implementation), and only recently have software-encoders become somewhat viable; a more apt comparison would have been AV1 to HEVC, though the latter is also somewhat old nowadays but still a competitive codec. Unfortunately currently there aren’t many options to use AV1 in a very meaningful way; you can encode your own media with it, but that’s about it; you can stream to YouTube, but YouTube will recode to another codec.
Could you elaborate? I’ve never had any issues with gpg before and curious what people are having issues with.
AV1 has almost full browser support (iirc) and companies like YouTube, Netflix, and Meta have started moving over to AV1 from VP9 (since AV1 is the successor to VP9). But you’re right, it’s still working on adoption, but this is moreso just my dreamworld than it is a prediction for future standardization.
This article and the blog post linked within it summarize it very well.
Okay, provide me with an open standard that is widely-used that provides similar functionality.
It isn’t there. There are parties who would like to move email users into their own little proprietary walled gardens, but not a replacement for email.
The guy is literally saying that encrypting email is unacceptable because it hasn’t been built from the ground up to support encryption.
I mean, the PGP guys added PGP to an existing system because otherwise nobody would use their nifty new system. Hell, it’s hard enough to get people to use PGP as it is. Saying “well, if everyone in the world just adopted a similar-but-new system that is more-amenable to encryption, that would be helpful”, sure, but people aren’t going to do that.
The message to be taken from here is rather “don’t bother”, if you need secure communication use something else, if you’re just using it so that Google can’t read your mail it might be ok but don’t expect this solution to be secure or anything. It’s security theater for the reasons listed, but the threat model for some people is a powerful adversary who can spend millions on software to find something against you in your communication and controls at least a significant portion of the infrastructure your data travels through. Think about whistleblowers in oppressive regimes, it’s absolutely crucial there that no information at all leaks. There’s just no way to safely rely on mail + PGP for secure communication there, and if you’re fine with your secrets leaking at one point or another, you didn’t really need that felt security in the first place. But then again, you’re just doing what the blog calls LARPing in the first place.
Super interesting stuff! Thank you for sharing.
No surprise, since OOXML is barely even a standard.
is av1 lossy
AV1 can do lossy video as well as lossless video.
I get a better compression ratio with xz than with zstd, both at their highest settings, when building an Ubuntu squashFS.
Zstd is way faster though.
wait im confused whats the difference between .tar.zst and .tar.xz
Different ways of compressing the initial .tar archive.
Having “double” extensions is a terrible convention for operating systems where extensions actually matter and users are used to them, like Windows.
“.tar.xz” should be something like “.tarxz” or “.txz”
But it’s not a tarxz, it’s an xz containing a tar, and you perform operations from right to left until you arrive back at the original files with whatever extensions they use.
If I compress an exe into a zip, would you expect that to be an exezip? No, you expect it to be file.exe.zip, informing you (and your system) that this file should first be unzipped, and then should be executed.
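The "right to left" reading can be sketched in a few lines of Python. The suffix table and the `unwrap_plan` name are invented for illustration, and no real tool is claimed to work this way. It just peels one known wrapper suffix at a time until it hits something it does not recognise.

```python
from pathlib import PurePath

# Hypothetical table: wrapper suffix -> the step that undoes it.
UNWRAP_STEPS = {
    ".gpg": "decrypt",
    ".zst": "decompress (zstd)",
    ".xz": "decompress (xz)",
    ".gz": "decompress (gzip)",
    ".zip": "unzip",
    ".tar": "unpack archive",
}


def unwrap_plan(filename):
    """Return (steps, remaining_name), peeling wrapper suffixes right to left."""
    steps = []
    name = PurePath(filename)
    while name.suffix in UNWRAP_STEPS:
        steps.append(UNWRAP_STEPS[name.suffix])
        name = name.with_suffix("")  # drop only the last suffix
    return steps, name.name
```

With this table, `unwrap_plan("backup.tar.zst.gpg")` gives the steps decrypt, decompress (zstd), unpack archive, and `unwrap_plan("file.exe.zip")` stops after unzip with `file.exe` left over.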
So what? When you zip 5 documents together do you name it .zip or .config.lib.sh.deb.zip?
Double extensions are not conventional on Windows, so no, I do not.
Dots in filenames are commonly used in any operating system like name_version.2.4.5.exe or similar… So I don’t see a problem.
Sounds like a Windows problem
Cool. So it means it’s a problem for over 70% of all active desktop and laptop computers.
I get the frustration, but Windows is the one that strayed from convention/standard.
Also, i should’ve asked this earlier, but doesn’t Windows also only look at the characters following the last dot in the filename when determining the file type? If so, then this should be fine for Windows, since there’s only one canonical file extension at a time, right?
You’re absolutely correct when it comes to how Windows will interpret the file - it will ignore all the preceding “extensions” and will use the last one as the filetype and as the hook for whatever default action or application should handle it. However, getting people used to double extensions is one quick way of increasing the success rate of attacks such as the infamous “.pdf.exe” invoice from an email attachment. It also creates issues with renaming files and, though admittedly not many, some Windows applications’ own file pickers.
Still - from just a theoretical point of view, I can’t see how Windows’ convention is worse, in fact, it makes significantly more sense. If I zip a file, it doesn’t matter what it was in a previous life, it’s now a zip - this is also how Unix deals with many filetypes, I’ve never seen a .h264.mp4 file, even though the .mp4 container can actually represent different types of encoding. Why have one filetype use the Windows convention and another, for no reason, a different one?
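To make the “.pdf.exe” point concrete, here is a tiny Python sketch. Only the last suffix decides how the file is treated, so a check for the trick has to look one suffix further in. The two suffix sets are short, made-up examples rather than a complete list, and `looks_deceptive` is a hypothetical helper, not part of any real mail filter.

```python
import os.path

# Illustrative, incomplete sets chosen for this example.
EXECUTABLE = {".exe", ".scr", ".bat", ".cmd", ".com"}
DOCUMENT_LIKE = {".pdf", ".doc", ".docx", ".jpg", ".png", ".txt"}


def effective_extension(filename):
    """Only the part after the last dot counts as the extension."""
    return os.path.splitext(filename)[1].lower()


def looks_deceptive(filename):
    """True if an executable suffix is hiding behind a document-like one."""
    stem, last = os.path.splitext(filename)
    inner = os.path.splitext(stem)[1]
    return last.lower() in EXECUTABLE and inner.lower() in DOCUMENT_LIKE
```

Under these assumptions `invoice.pdf.exe` is flagged, while `backup.tar.zst` and `name_version.2.4.5.exe` are not.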
Very good point. Though, i would argue that this would be much less of a problem if Windows stopped sometimes hiding file extensions.
I don’t believe what you’re referring to is really a Windows versus Linux/Unix thing.
I disagree, but i do get what you’re saying here. I don’t think that example really works though, because a .mp4 file isn’t derived from a .h264 file. A .mp4 is a container that may include h264-encoded video, but it may also have a channel with Opus-encoded audio or something. It’s apples and oranges.
Also, even though there shouldn’t be any technical issues with this on Windows, you can still use a typical short filename suffix if you wish, though i would argue that using the long filename suffix is more expressive. From “tar (computing)” on Wikipedia:
I get your point. Since a .tar.zst file can be handled natively by tar, using .tzst instead does make sense.
use a real operating system then
There already are conventional abbreviations: see Section 2.1. I doubt they will be better supported by tools though.
That’s much better. Thanks for actually answering the comment, rather than the usual “Windows bad, Linux good, upvotes please”
I would argue what windows does with the extensions is a bad idea. Why do you think engineers should do things in favour of these horrible decisions the most insecure OS is designed with?
Damn didn’t realize that JXL was such a big deal. That whole JPEG recompression actually seems pretty damn cool as well. There was some noise about GNOME starting to make use of JXL in their ecosystem too…