How to anonymize logs before sharing?

ouch@lemmy.world · 4 months ago

How to anonymize logs before sharing?

QuazarOmega@lemy.lol · 4 months ago

A tool would actually be so good to have; it’s such a common task that we don’t even think about it much. You sparked my curiosity, so I searched for one, and it seems there is a project out there: loganon, though unfortunately it’s long dead.

[email protected]@sh.itjust.works · 4 months ago

The problem is there’s likely not a universal solution that’s guaranteed to clean everything in every case.

Cleaning specific logs/configs is much easier when you know what you’re dealing with.
Something like anonymizing a Cisco router config is easy enough because it follows a known format that you can parse and clean.
Building a tool to anonymize logs from one specific piece of software is one thing; a tool that reliably anonymizes all logs from any software is unlikely.
Either way, it should always be double-checked and tailored to what’s being logged.
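
To illustrate the "known pattern" case, here is a minimal Python sketch that replaces every IPv4-looking string with a stable placeholder, so the same address maps to the same label throughout the log. The function name `pseudonymize_ips` and the `IP_n` labels are made up for illustration, and the regex is deliberately loose: it will also match things like four-part version numbers, which is one more reason to double-check the output.

```python
import re

# Loose IPv4 matcher: four dot-separated groups of 1-3 digits.
# It does not validate ranges, so "999.1.1.1" or a version like "1.2.3.4" also match.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def pseudonymize_ips(text):
    """Replace each distinct IPv4-looking string with a stable placeholder.

    Returns the cleaned text and the mapping, so you can review what was replaced.
    """
    mapping = {}

    def _sub(match):
        ip = match.group(0)
        if ip not in mapping:
            mapping[ip] = f"IP_{len(mapping) + 1}"
        return mapping[ip]

    return IPV4_RE.sub(_sub, text), mapping


if __name__ == "__main__":
    cleaned, mapping = pseudonymize_ips("10.0.0.1 -> 192.168.1.5, retry 10.0.0.1")
    print(cleaned)
    print(mapping)
```

Keeping the mapping around (and not sharing it) lets you translate questions from helpers back to the real addresses.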

QuazarOmega@lemy.lol · 4 months ago

I agree; beyond searching for basic patterns, that will most likely be necessary. In fact, looking a bit more at this tool, it has a list of “rules” tailored to each piece of software specifically. I guess this could really only be sustainable if a repository of third-party extensions were kept, so that anyone could contribute and the pool of rules expanded progressively.
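
A rough sketch of what such per-software rules could look like in Python. This is not how loganon stores its rules (I have not checked); the `RULES` layout, the `scrub` function, and the `myapp` entry with its `user=` field are all hypothetical.

```python
import re

# Hypothetical rule sets: software name -> list of (compiled pattern, replacement).
# "generic" rules run for every log; the others are contributed per software.
RULES = {
    "generic": [
        (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<ip>"),
        (re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"), "<email>"),
    ],
    # Made-up application that logs "user=<name>" fields.
    "myapp": [
        (re.compile(r"(\buser=)\S+"), r"\1<user>"),
    ],
}


def scrub(line, software):
    """Apply the generic rules, then any rules registered for `software`."""
    for pattern, replacement in RULES["generic"] + RULES.get(software, []):
        line = pattern.sub(replacement, line)
    return line


if __name__ == "__main__":
    print(scrub("user=alice email=alice@example.com ip=10.0.0.1", "myapp"))
```

Adding support for another program would then just mean contributing one more entry to the rules table, which is roughly the "repository of extensions" idea.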

subtext@lemmy.world · 4 months ago

I wonder if you could do something with heuristics or a micro LLM to flag words that might be expected to be private.

I would be curious if someone could do a proof of concept with a self-hosted model run through Ollama. If you feed it examples of names, IP addresses, API-key-like strings, and so on, it might be able to read through the whole file and then flag anything with a risk level greater than some threshold.
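
Not an LLM, but the heuristic half of the idea can be sketched in a few lines of Python: give each token a risk score and flag anything at or above a threshold. The scores, the 20-character and 3.5-bit cutoffs, and the function names are arbitrary assumptions picked for illustration, not tuned values.

```python
import math
import re
from collections import Counter

IPV4_RE = re.compile(r"^(?:\d{1,3}\.){3}\d{1,3}$")


def shannon_entropy(token):
    """Bits per character; long random-looking strings (keys, hashes) score high."""
    counts = Counter(token)
    n = len(token)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def risk(token, known_names=frozenset()):
    """Return a made-up risk score between 0 and 1 for a single token."""
    if token.lower() in known_names:
        return 1.0  # a name you told the tool about
    if IPV4_RE.match(token):
        return 0.9  # looks like an IPv4 address
    if len(token) >= 20 and shannon_entropy(token) >= 3.5:
        return 0.8  # long and high-entropy: possibly an API key or hash
    return 0.0


def flag(text, threshold=0.5, known_names=frozenset()):
    """Return (token, score) pairs whose score is at or above the threshold."""
    tokens = re.findall(r"[^\s=,;\"']+", text)
    scored = ((t, risk(t, known_names)) for t in tokens)
    return [(t, s) for t, s in scored if s >= threshold]


if __name__ == "__main__":
    line = "login alice from 10.0.0.1 token=9f8a7b6c5d4e3f2a1b0c9d8e7f6a5b4c ok"
    for token, score in flag(line, known_names={"alice"}):
        print(f"{score:.1f}  {token}")
```

This only flags candidates for a human to review; it does not replace anything on its own, and it will miss anything that is not covered by one of the three checks.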

EuroNutellaMan@lemmy.world · edit-2 4 months ago

vim log
:%s/yourusername/anonusername/g
:%s/yourip/xxx.xxx.xxx.xxx/g
:wq
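
If you want the same substitutions as a repeatable script, here is a small Python sketch. `replace_literals` is a hypothetical helper name. One difference from the vim commands: `str.replace` matches text literally, whereas in a vim pattern an unescaped `.` matches any character, so an IP typed as-is could also match look-alike strings.

```python
def replace_literals(text, replacements):
    """Replace each sensitive literal string with its placeholder.

    Longer secrets are replaced first so that a secret containing another
    secret (e.g. "alice" inside "alice-laptop") is handled as a whole.
    """
    for secret, placeholder in sorted(replacements.items(), key=lambda kv: -len(kv[0])):
        text = text.replace(secret, placeholder)
    return text


if __name__ == "__main__":
    log = "alice connected from 10.0.0.1"
    print(replace_literals(log, {"alice": "anonusername", "10.0.0.1": "xxx.xxx.xxx.xxx"}))
```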