datahoarder
-
How are data storage prices going to change in the next couple years?
I was considering building a 30+ TB NAS to simplify and streamline my current setup, but since it's a relatively low priority for me, I'm wondering: is it worth holding off for a year or two?
I'm unsure whether prices have more or less plateaued and the difference won't be all that substantial. Maybe I should just wait for Black Friday.
For context, two 16TB HDDs currently cost about $320.
---
Here are some related links:
-
Vimm's Lair is getting removal notices from Nintendo etc. We need someone to help make a ROM pack archive. Can you help?
cross-posted from: https://slrpnk.net/post/10273849
> Vimm's Lair is getting removal notices from Nintendo etc. We need someone to help make a ROM pack archive. Can you help?
>
> Vimm's Lair is starting to remove many ROMs at the request of Nintendo etc. Soon many original ROMs, hacks, and translations will be lost forever. Can any of you help make archive torrents of ROMs from Vimm's Lair and cdromance? They have hacks and translations that don't exist elsewhere and will probably be removed soon, with iOS emulation and retro handhelds bringing so much attention to ROMs and these sites.
-
PGSub - A Giant Archive of Subtitles For Everyone
gitlab.com starshiners / PGSub · GitLab. Large Postgres database of collected subtitles with companion apps to access them.
I've been working on this subtitle archive project for some time. It is a Postgres database along with a CLI and API application allowing you to easily extract the subs you want. It is primarily intended for encoders or people with large libraries, but anyone can use it!
PGSub is composed of three dumps:
- opensubtitles.org.Actually.Open.Edition.2022.07.25
- Subscene V2 (prior to shutdown)
- Gnome's Hut of Subs (as of 2024-04)
As such, it is a good resource for films and series up to around 2022.
Some stats (copied from README):
- Out of 9,503,730 files originally obtained from dumps, 9,500,355 (99.96%) were inserted into the database.
- Out of the 9,500,355 inserted, 8,389,369 (88.31%) are matched with a film or series.
- There are 154,737 unique films or series represented, though note the lines get a bit hazy when considering TV movies, specials, and so forth. 133,780 are films, 20,957 are series.
- 93 languages are represented, with a special '00' language indicating a .mks file with multiple languages present.
- 55% of matched items have an FPS value present.
Once imported, the recommended way to access it is via the CLI application. The CLI and API can be compiled on Windows and Linux (and maybe Mac), and there are also pre-built binaries available.
The database dump is distributed via a torrent, which you can find in the repo (if it doesn't work for you, let me know). It is ~243 GiB compressed, and uses a little under 300 GiB of table space once imported.
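For reference, importing is roughly what you'd expect for a large Postgres dump; a minimal sketch, assuming a zstd-compressed plain-SQL dump (the actual file name and procedure are in the repo's README):

```bash
# Hypothetical import sketch; the file name and format are assumptions,
# follow the PGSub README for the real steps.
createdb pgsub
zstd -dc pgsub_dump.sql.zst | psql -d pgsub
```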
For a limited time I will devote some resources to bug-fixing the applications, or perhaps adding some small QoL improvements. But, of course, you can always fork them or make your own if they don't suit you.
- www.theregister.com Blame the SSD price hike on enterprise demand for AI servers
Samsung, SK, Kioxia and others make bank, booking double digit bounces in Q1 revenue
-
What are your average file sizes for movies and series?
I'm looking at my library and I'm wondering if I should process some of it to reduce the size of some files.
There are some movies in 720p that are 1.6~1.9GB each, and then some at the same resolution that are 2.5GB. I even have some in 1080p which are just 2GB. I only have two movies in 4K: one is 3.4GB and the other is 36.2GB (I can't really tell the difference in detail since I don't have a 4K display).
And then there's an anime I have twice at the same resolution: one set of files is around 669~671MB each, the other set 191MB each (here the quality difference is noticeable while playing them, as opposed to the other files, where I compared extracted frames).
What would you do? What's your target size for movies and series? What bitrate do you go for, and in which codec?
Not sure if it's kind of blasphemy in here to talk about compromising quality for size, hehe, but I don't know where else to ask this. I was planning on using these settings in ffmpeg; what do you think? I tried it on an anime at 1080p, going from 670MB to 570MB, and I wasn't able to tell the difference in quality when extracting a frame from the input and the output.
ffmpeg -y -threads 4 -init_hw_device cuda=cu:0 -filter_hw_device cu -hwaccel cuda -i './01.mp4' -c:v h264_nvenc -preset:v p7 -profile:v main -level:v 4.0 -vf "hwupload_cuda,scale_cuda=format=yuv420p" -rc:v vbr -cq:v 26 -rc-lookahead:v 32 -b:v 0 './01_out.mp4'
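If you can afford the CPU time, a software encode is worth comparing against this; a minimal sketch (filenames are placeholders and the CRF value is just a starting point, not a recommendation):

```bash
# Hedged alternative: x265 at a given CRF typically beats NVENC H.264
# at the same file size; much slower, but better compression per byte.
ffmpeg -i './01.mp4' -c:v libx265 -crf 24 -preset slow -c:a copy './01_x265.mkv'
```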
-
Looking for emotional support: I lost all my WhatsApp chats
I was so confident that WhatsApp was backing itself up to Google ever since I got my new Pixel, but it just wasn't. Then yesterday I factory reset my phone to fix something else and I lost it all. Years' worth of chats from so many times in my past just aren't there: all my texts with my mom and my family, group chats with old friends... I can't even look at the app anymore; I'll never use WhatsApp as much as I used to. I just don't feel right with this change. There's no way to get those chats back, and now it doesn't feel like there's any point backing up WhatsApp at all! I really wanna cry, this is so unfair!! And all I had to do was check WhatsApp before I did a factory reset.. the TINIEST THING I could have done to prevent this, and I didn't fucking do it!!!!!!!
How do I get past this?
-
Not Dead Yet: WD Releases New 6TB 2.5-Inch External Hard Drives - First Upgrade in Seven Years
www.anandtech.com After 7-Year Hiatus, Western Digital Unveils 6TB 2.5-Inch Hard Drives. After all this time, the company has added an additional terabyte to its external drives.
-
Looking for some advice on moving 100TBs of data from the cloud to tape
With Google Workspace cracking down on storage (I've been using them for unlimited storage for years now), I was lucky to get a limit of 300TB, but now I have to actually watch what gets stored lol
A good portion is, uh, "Linux ISOs", but the rest is very seldom accessed files (in many cases last access was years ago) that I think would be perfect for tape archival. Things like byte-to-byte drive images and old backups. I figure these would be good candidates for tape and estimate this portion at about 100TB or more.
But I've never done tape before, so I'm looking for some purchasing advice and such. From my research, it seems I should target picking up an LTO-8 drive, since its tapes will still be readable once LTO-9 drives come down in price.
And from there it spiraled into discussions of library tape drives that are cheaper but need modifications, and all sorts of things.
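For what it's worth, basic LTO use on Linux is just tar against the non-rewinding tape device; a minimal sketch (the device path and block size are assumptions, and real setups usually add LTFS or a catalog on top):

```bash
# Hedged sketch of writing and verifying one archive on tape.
mt -f /dev/nst0 rewind
tar -cb 512 -f /dev/nst0 /data/drive-images   # 512 * 512 B = 256 KiB blocks
mt -f /dev/nst0 rewind
tar -tb 512 -f /dev/nst0 | head               # list contents to verify
```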
-
How to download Google Docs version history (easy way, doesn't include how to use the downloaded file)
Run this JavaScript code with the document open in the browser: https://codeberg.org/dullbananas/google-docs-revisions-downloader/src/branch/main/googleDocsRevisionDownloader.js
Usually this is possible by pasting it into the Console tab in developer tools. If running JavaScript is not an option, then use this method: https://lemmy.ca/post/21276143
You might need to manually remove the characters before the first `{` in the downloaded file.
-
How to download Google Docs version history (hard way, doesn't include how to use the downloaded file)
1. Copy the document ID. For example, if the URL is `https://docs.google.com/document/d/16Asz8elLzwppfEhuBWg6-Ckw-Xtfgmh6JixYrKZa8Uw/edit`, then the ID is `16Asz8elLzwppfEhuBWg6-Ckw-Xtfgmh6JixYrKZa8Uw`.
2. Open this URL: `https://docs.google.com/document/u/1/d/poop/revisions/load?id=poop&start=1&end=1` (replace `poop` with the ID from the previous step). You should see a json file.
3. Add `0` to the end of the number after `end=` and refresh. Repeat until you see an error page instead of a json file.
4. Find the highest number that makes a json file appear instead of an error page. This involves repeatedly trying a number between the highest number known to result in a json file and the lowest number known to result in an error page (see the sketch after this list).
5. Download the json file. You might need to remove the characters before the first `{`.
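Steps 3 and 4 are just an exponential-then-binary search, which is easy to script; a rough sketch (untested, and it assumes your Google auth cookies, which curl won't have by default):

```bash
# Hypothetical sketch of steps 3-4. DOC_ID is the example ID from step 1;
# in practice you'd need to pass your Google session cookies to curl.
DOC_ID="16Asz8elLzwppfEhuBWg6-Ckw-Xtfgmh6JixYrKZa8Uw"
base="https://docs.google.com/document/u/1/d/$DOC_ID/revisions/load?id=$DOC_ID&start=1"
lo=1; hi=10
while curl -sf "$base&end=$hi" -o /dev/null; do   # grow until it errors
  lo=$hi; hi=$((hi * 10))
done
while [ $((hi - lo)) -gt 1 ]; do                  # binary search the boundary
  mid=$(( (lo + hi) / 2 ))
  if curl -sf "$base&end=$mid" -o /dev/null; then lo=$mid; else hi=$mid; fi
done
echo "highest revision: end=$lo"
```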
I found the URL format for step 2 here: https://features.jsomers.net/how-i-reverse-engineered-google-docs/
I am working on an easy way. Edit: here it is https://lemmy.ca/post/21281709
- qz.com DVDs, Blu-rays, and VHS tapes: How long does physical media last?
The exact lifespan of physical media is hard to pin down
-
My name is Lars and I’m a hoarder. Too.
cross-posted from: https://programming.dev/post/13631943
> Firefox Power User Keeps 7,400+ Browser Tabs Open for 2 Years
-
[Request] Any Guides to FFMPEG, Transcoding, Codecs, and Metadata?
cross-posted from: https://leminal.space/post/6179210
> I have a collection of about ~110 4K Blu-Ray movies that I've ripped and I want to take the time to compress and store them for use on a future Jellyfin server.
>
> I know some very basics about `ffmpeg` and general codec information, but I have a very specific set of goals in mind I'm hoping someone could point me in the right direction with:
>
> 1. Smaller file size (obviously)
> 2. Image quality good enough that I cannot spot the difference, even on a high-end TV or projector
> 3. Preserved audio
> 4. Preserved HDR metadata
>
> In a perfect world, I would love to be able to convert the proprietary HDR into an open standard, and the Dolby Atmos audio into an open standard, but this is a good compromise.
>
> Assuming that I have the hardware necessary to do the initial encoding, and my server will be powerful enough for transcoding in that format, any tips or pointers?
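A common starting point for goals 1, 2, and 4 is x265 with HDR10 passthrough; a hedged sketch, assuming HDR10 (not Dolby Vision) sources, with the filenames and CRF value as placeholders:

```bash
# Untested starting point: 10-bit x265 keeping BT.2020/PQ signaling,
# copying audio and subtitles untouched. Verify the HDR metadata
# afterwards (e.g. with ffprobe); Dolby Vision layers are not kept here.
ffmpeg -i input.mkv -map 0 -c:v libx265 -preset slow -crf 18 \
  -pix_fmt yuv420p10le \
  -x265-params "hdr10=1:repeat-headers=1:colorprim=bt2020:transfer=smpte2084:colormatrix=bt2020nc" \
  -c:a copy -c:s copy output.mkv
```
-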
Question about price per TB
So it's been a few years since I've bought hard drives for my little home server, and I wanted to get a bead on the going dollar-per-TB rate in the post-COVID world. Thanks!
- www.eenewseurope.com Samsung boosts 1Tb V-NAND bit density by 50 percent
Groundbreaking double-stack V-NAND cell structure maximizes fabrication productivity through advanced ‘channel hole etching’ technology.
-
Looking for all three books in the Moving Castle series
I'm looking for EPUB versions of all three books.
-
Effectively manage and store books/comics/manga
Hello, I'm wondering what you guys use and recommend for efficient book, comic, manga, and light novel file management: tagging, directory structures, and automated tools for all that.
My collection is mostly made up of Humble Bundle book bundles. For getting tags into comics I use ComicTagger, and as for file structure, it was mostly just me putting something together to separate the books.
I want to hear your input, because most of you are a lot more efficient or have a lot more experience in saving large amounts of data, and I want to make my process as painless and future-proof as possible as my collection starts to grow.
Edit: I use Linux, so software like ComicRack, which I've heard a lot about, isn't really accessible to me. The files also need to be accessible to my Kavita server.
-
How to store digital files for posterity? (hundreds of years)
I have some family videos and audio recordings, and I want to physically save them for posterity so that they last for 200 years or more. That way great-grandchildren and great-great-grandchildren can have access.
From the research I did, the longest-lasting way to physically store digital content is gold CD-R discs, but even they may only last 100 years. From what I've read, the average lifespan of HDDs and SSDs is no more than 10 years.
I came to the conclusion that the only way to ensure the files really pass from generation to generation is to record them on CDs and distribute them to the family, asking them to make copies from time to time.
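Whatever medium wins out, parity data and checksums stored alongside the files improve the odds that a future generation can verify and repair them; a small sketch (the paths and the 20% redundancy figure are illustrative):

```bash
# Create 20% recovery data plus checksums to store next to the files.
par2 create -r20 family_videos.par2 /archive/family/*.mp4
sha256sum /archive/family/*.mp4 > /archive/family/checksums.sha256
```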
It's crazy to think that if there were suddenly a mass extinction of the human species, intelligent beings arriving on Earth in 1000 years would probably not be able to access our digital content, while cave paintings would probably remain in the same place.
What is your opinion?
-
What do you use in your yt-dlp config for the best downloads?
I formatted my PC recently, and I'm reinstalling some stuff. I forgot to back up settings like my yt-dlp config, so I started searching for a good config to download the best MP4 quality, found some interesting setups, and figured I'd make a thread for people to share what they use.
Here's the best setup I've found so far, which downloads a 1080p MP4 with a clean filename and includes metadata, English subtitles, and chapters if available:
yt-dlp -f 'bestvideo[height<=1080][ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best' -S vcodec:h264 --windows-filenames --restrict-filenames --write-auto-subs --sub-lang "en.*" --embed-subs --add-metadata --add-chapters --no-playlist -N 4 -ci --verbose --remux-video "mp4/mkv" URL
Ideally it would also mark the SponsorBlock sections and download to a specified folder.
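Both wishes look covered by built-in flags; a sketch (untested, and the folder path is a placeholder):

```bash
# --sponsorblock-mark adds chapter markers for sponsored segments;
# -P sets the download folder. Append these to the command above.
yt-dlp --sponsorblock-mark all -P "$HOME/Videos" \
  -f 'bestvideo[height<=1080][ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best' \
  --embed-subs --sub-lang "en.*" --embed-metadata --embed-chapters --no-playlist URL
```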
-
The Internet Archive Just Backed Up an Entire Caribbean Island
https://www.wired.com/story/internet-archive-backed-up-aruba-caribbean-island/
-
SAS hardware questions
Hi, could anyone point me to an ELI5 about SAS hardware? I'd like to assemble a NAS using an old HP Z200. I want SAS because I'd also get a tape drive for backups, and I can't find SATA tape drives. For example, is a Dell PERC H310 PCIe card good for me? Can I avoid hardware RAID?
-
Archiving Lemmy instances
While clicking through some random Lemmy instances, I found one that's due to be shut down in about a week — https://dmv.social. I'm trying to archive what I can onto the Wayback Machine, but I'm not sure what the most efficient way to go about it is.
At the moment, what I've been doing is going through each community and archiving each sort type (except the ones under a month, since the instance was locked a month ago) with capture outlinks enabled. But is there a more efficient way to do it? I know of the Internet Archive's save-from-spreadsheet tool, which would probably work well, but I don't know how I'd go about crawling all the links into a sitemap or CSV or something similar; I don't have the know-how to set up a web crawler/spider.
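One idea I had is a blunt wget spider to build the URL list; a rough sketch (untested, and the depth limit is a guess):

```bash
# Crawl dmv.social without saving pages, then scrape the visited URLs
# out of the log into a deduplicated list for the spreadsheet tool.
wget --spider --recursive --level=3 --no-verbose \
     --domains=dmv.social https://dmv.social 2>&1 \
  | grep -oE 'https://dmv\.social[^ ]*' | sort -u > urls.csv
```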
Any suggestions?
-
SSD hides contents after a few days
It seems the SSD sometimes heats up and its contents disappear from the device, mostly on my router, sometimes on my laptop. Do you know what I should configure to put the drive to sleep, or something similar, to reduce the heat?
I'm starting my datahoarder journey now that I've replaced my internal NVMe SSD.
It's just a 500GB drive, which I attached to my D-Link router running OpenWrt. I configured it with Samba and everything worked fine when I finished the setup. I just have some media files on there, which I read from Jellyfin.
After a few days the content disappears. It's not a connection problem with the shared drive, since I ssh into the router and the files aren't shown there either. I need to physically remove the drive and connect it again. When I do this I notice it's somewhat hot. Not scalding, just hot.
I also tried connecting it directly to my laptop running Ubuntu. There the drive sometimes remains cool and the data shows up without issue after days. But sometimes it also heats up and the data disappears (even when the data was not being used, i.e. I hadn't configured Jellyfin to read from the drive).
I'm not sure how to let the SSD sleep for periods of time, or throttle it so it can cool off. Any suggestion?
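In case it helps with suggestions, here's roughly what I can check the next time it happens (untested sketch; /dev/sda is a guess for whatever device node the drive gets):

```bash
# Check drive temperature/health, then look for USB disconnects or resets.
smartctl -a /dev/sda | grep -i temp
dmesg | grep -iE 'usb|sd[a-z]'
```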
-
What's your most treasured data?
What in your hoard do you treasure the most? I imagine for a lot of us it's photos and videos of our families, which I'd love to hear about, but I'm also interested in rare bits of media or information that make your collection unique.
-
Does anyone have an archive of the @ErmnMusk Twitter account?
Last year Elon Musk accidentally revealed that he has a burner account on Twitter called @ErmnMusk.
Now that account is gone.
I'm looking for an archive: its tweets, its likes, anything and everything. Does anyone know where to find one?
-
A Data Hoarder, I Am Not
So I’ve been consolidating all of my storage and removing all the duplicates and junk files.
In actual physical storage, this was spread across 12TB worth of hard drives, all partially full.
After everything was said and done, I’m using 1.3TB of space if you don’t include games. ¯\\\_(ツ)\_/¯
This is stuff dating back to 2015. Sometimes it’s actually worth it to just clean up your junk files.
- spectrum.ieee.org DVD’s New Cousin Can Store More Than a Petabit
Containing more data than the entire Internet transmits in a second
- www.rockpapershotgun.com Get Intel's legendary Optane 905P 1.5TB SSD at a discount in the US
Intel's Optane drives are legendary in PC tech circles, offering random performance and low latency that remains unmatc…
-
[ExplainingComputers] Explaining File Compression Formats
I imagine a lot of people who are into data hoarding already know most of this, but I thought the video was pretty neat. It covers the history of different compression formats and gives a short blurb about why you might want to use one or another.
I'd recommend checking it out if you want 15 minutes of background noise.
---
For anyone new to data compression, TechQuickie and CrashCourse have videos on it. If you really want to go down the rabbit hole, you could check out media compression and see how things like JPEGs and PNGs work.
- www.hackster.io 3d Printz's Case Turns a Raspberry Pi Into a Neat, 10TB-Capacity Network-Attached Storage Appliance
With friction-fit mounts for USB hard drives, a cooling fan, and a dedicated stats display, this compact NAS packs in the features.
-
Get URLs for all tweets within date range?
cross-posted from: https://sh.itjust.works/post/14280067
> What is the best tool to get URLs for all tweets within a given date range?
>
> The ideal behaviour I'm looking for would be something like this:
>
> Input: `https://twitter.com/SpaceX 2023-09-01 2024-02-08`
>
> Output:
> - https://twitter.com/SpaceX/status/1755763378449183003#m
> - https://twitter.com/SpaceX/status/1755759459765567825#m
> - https://twitter.com/SpaceX/status/1755752291578302545#m
> - ...
>
> What would be the best tool to achieve this? Thanks in advance!
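At one point snscrape handled exactly this; a hedged sketch (X/Twitter changes have repeatedly broken its Twitter modules, so treat it as untested):

```bash
# Search tweets from an account within a date range and print their URLs.
snscrape --jsonl twitter-search "from:SpaceX since:2023-09-01 until:2024-02-08" \
  | jq -r .url > spacex_tweets.txt
```
-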
Best solution for a distributed filesystem?
Not sure if this is a better fit for datahoarder or some selfhosting community, but I'm putting my money on this one.
The problem
I currently have a cute little server with two drives connected to it running a few different services (mostly media serving and torrents). The key facts here are that 1) it's cute and little, and 2) it's handling pretty bulky data. Cute and little doesn't go very well with big RAID setups and such, and apart from upgrading one of the drives, I'm probably at my limit in terms of how much storage I can physically fit in the machine. Also, if I want to reinstall it or something, that's very difficult to do without downtime, since I'd have to move the drives and services off to a different machine (not a huge problem since I'm the only one using it, but I don't like it).
Solution
A distributed FS would definitely solve the issue of physically fitting more drives into the chassis, since I could basically just connect drives to a Raspberry Pi and have the Pi join the distributed FS. Great.
I think it could also solve the issue of potential downtime if I reinstall or do maintenance, since I can have multiple services read off the same distributed FS and reroute my reverse proxy to use the new services while the old ones are taken offline. There will potentially be a disruption, but no downtime.
Candidates
I know there are many different solutions for distributed filesystems, such as Ceph, MooseFS, GlusterFS, and MinIO. I'm kinda leaning towards Ceph because of its integration in Proxmox, but it also seems like the most complicated solution of the bunch. Is it worth it? What are your experiences with these, and given the above description of my use case, which do you think would be the best fit?
Since I already have a lot of data it's a bonus if it's easy to migrate from my current filesystem somehow.
My current setup uses a lot of hard links as well, so it's a big bonus if the solution has something similar (i.e. some easy way of storing the same data in multiple places without duplicating it).
-
[Help] [Seeding] DDoSecrets, responsible for hosting several leaks, will stop its activities.
cross-posted from: https://lemmy.dbzer0.com/post/13532369
> DDoSecrets, responsible for hosting leaks such as EpikFail and BlueLeaks, will stop its activities. I would like help from anyone who has space left so we can download everything and keep seeding.
> Torrent download links: https://data.ddosecrets.com/
-
Changing Video Encoding Properly
I'm going to archive some YouTube videos; what's the proper way to convert from mp4 to webm and so on, or vice versa?
In the past, when I couldn't play a video file for whatever reason, I would just rename the file, but I'm assuming there are better ways to do it. And is there a specific order I have to go in? (e.g. with audio, going from .mp3 to .flac doesn't make sense.)
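For what it's worth, the usual distinction is between remuxing (changing the container, lossless) and re-encoding (changing the codec, lossy); a sketch with placeholder filenames:

```bash
# Remux: copy streams into a new container, no quality loss,
# works only if the target container supports the codecs.
ffmpeg -i input.mp4 -c copy output.mkv
# Re-encode: needed when the codec must change (e.g. for WebM).
ffmpeg -i input.mp4 -c:v libvpx-vp9 -c:a libopus output.webm
```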
Thanks in advance.
-
scraped media links from instagram and threads
I have scraped a lot of links from Instagram and Threads using Selenium in Python. It was a good learning experience. I will be running that script for a few more days and will see how many more media links I can scrape.
However, the problem is that the media isn't tagged, so we don't know what type of media each link points to. I wonder if there is an AI or something that can turn these random media links into an organized list.
If you want to download all the media from the links, you can run the following commands:
```bash
# Download the file with all the links
wget -O links.txt https://gist.githubusercontent.com/Ghodawalaaman/f331d95550f64afac67a6b2a68903bf7/raw/7cc4cc57cdf5ab8aef6471c9407585315ca9d628/gistfile1.txt
# Download the media from the links file fetched above
wget -i links.txt
```
I was thinking about how to store all of these. There are two ways: the first is to just store the links.txt file and download the content when needed; the second is to download the content from the links and save it to a hard drive. The second method will consume more space, so the first method is better imo.
I hope it was something you like :)
- lemmy.world Hobbes OS/2 Archive to shut down in three months – OSnews - Lemmy.World
The announcement on https://hobbes.nmsu.edu/: ATTENTION After many years of service, hobbes.nmsu.edu will be decommissioned and will no longer be available. You the user are responsible for downloading any of the files found in this archive if you w...