@paninid It very much depends on just what caught your interest.
As far as text & old-school websites go, #wget still does an admirable job.
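A rough sketch of the sort of invocation I mean (the URL is just a placeholder):
# mirror a site for offline reading, keeping links usable locally
wget --mirror --convert-links --adjust-extension --page-requisites --no-parent https://example.com/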
For #youtube and a number of other sites, #ytdlp (yt-dlp) is decent, and its quality selectors & filtering are very helpful in making the most of limited storage.
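Something along these lines caps the resolution so one video doesn't eat the disk (VIDEO_ID is a placeholder):
# take the best streams at or below 1080p and merge them into a single mkv
yt-dlp -f "bestvideo[height<=1080]+bestaudio/best[height<=1080]" --merge-output-format mkv "https://www.youtube.com/watch?v=VIDEO_ID"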
Text tends to compress very well, so a btrfs filesystem using zstd will be able to contain quite a lot.
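For instance, turning compression on at mount time (device and mountpoint are placeholders):
# transparent zstd compression (level 3) for everything written to the mount
mount -o compress=zstd:3 /dev/sdX1 /mnt/archive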
Cloud storage is an option, but only if you encrypt client-side; otherwise you risk deletion.
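A minimal sketch of encrypting before upload, here with symmetric gpg (the filename is a placeholder):
# prompts for a passphrase, writes archive.tar.gpg you can push to any provider
gpg --symmetric --cipher-algo AES256 archive.tar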
@paninid #Google, among others, has been caught deleting anything it deems copyright-infringing from private storage, as has Dropbox (https://old.reddit.com/r/DataHoarder/comments/v8danc/justin_roiland_cocreator_of_rick_and_morty/), even when you *own* the copyright.
As with most such stored data, backups are still essential and the 3-2-1 rule comes highly recommended.
That is, at least 3 copies of your data, on at least 2 different media, with at least 1 copy offsite.
Versioning & deduplicating backup solutions like #BorgBackup and #Duplicity make that easier to accomplish over time.
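A minimal #BorgBackup sketch, where the repo path and source directory are placeholders:
# one-time: create an encrypted, deduplicating repository
borg init --encryption=repokey /mnt/backup/hoard.borg
# each run: add a dated archive; unchanged chunks are deduplicated away
borg create --stats --compression zstd /mnt/backup/hoard.borg::hoard-{now} ~/hoard
# thin out old archives so the repo doesn't grow without bound
borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 6 /mnt/backup/hoard.borg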
@paninid That 'media' part originally referred to different *types* of storage media, but the advice aged rather badly as digital storage options became both more limited *and* more reliable.
So two different hard drives, for example, is fine. SSDs work too, as long as you power them up periodically so they don't suffer data loss.
As always, use filesystems that can detect corruption, like zfs & btrfs, and actually *run* the scrubs they need in order to catch problems.
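For example (the mountpoint and pool name are placeholders):
# btrfs: start a scrub, then check the result later
btrfs scrub start /mnt/archive
btrfs scrub status /mnt/archive
# zfs: the same idea for a pool named tank
zpool scrub tank
zpool status tank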