I think the community is much more important than just having more content. I would worry that flooding Lemmy with Reddit's content, without the community to support that content, could drown everyone out.
Yeah, I agree. I think being able to tweak a personalised scraper as an opt-in service per community could work. Some subs just won't work for this, like AskReddit and ELI5, but niche communities, news communities and information communities could. I would be happy just seeing the top 2 or 3 posts of the day for some, and for world news, for example, only the posts that pass a certain threshold of upvotes in a certain time or against the sub's size, to make sure the real news pops up quick.
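To make the "top 2 or 3 posts of the day, opt-in per community" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the `Post` fields, the `opted_in` set and the function name are made up, and it expects the posts to have been collected already by whatever scraper you use.

```python
from dataclasses import dataclass


@dataclass
class Post:
    title: str
    score: int       # upvotes as seen when the post was collected
    community: str   # name of the source subreddit


def top_posts_of_day(posts, opted_in, n=3):
    """Return up to n highest-scoring posts for each opted-in community."""
    by_community = {}
    for post in posts:
        # Communities that did not opt in are skipped entirely.
        if post.community in opted_in:
            by_community.setdefault(post.community, []).append(post)
    return {
        name: sorted(items, key=lambda p: p.score, reverse=True)[:n]
        for name, items in by_community.items()
    }


if __name__ == "__main__":
    day = [
        Post("a", 10, "worldnews"),
        Post("b", 900, "worldnews"),
        Post("c", 50, "worldnews"),
        Post("d", 5000, "askreddit"),  # not opted in, so never reposted
    ]
    picked = top_posts_of_day(day, opted_in={"worldnews"}, n=2)
    print([p.title for p in picked["worldnews"]])
```

Keeping the opt-in list separate from the ranking means each community can pick its own `n` without touching the rest.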
Nah. This is a fresh start. It's been less than a week and there's already so much more content. It'll grow soon enough. Especially with spez fucking around over there. We want OG content here, not all the shit reposts that already plague Reddit.
If I wanted to lurk Reddit, I would just do that. Better to stay your separate thing. It is nice to sometimes get news about the other side of the fence.
If you wanna write code to do this ... I'd say skip the bot, write a gateway instead.
Back in the early days of email, there were lots of different email systems, not just the SMTP Internet email we use today. There was UUCP email with "bang paths", where your email address specified a list of servers that a message could be passed through to get to you. There were other networks like FidoNet and WWIVnet, that could send email to Internet email addresses through special "gateway" servers.
A gateway receives messages using one protocol or service, and retransmits or makes them available on another protocol or service.
For a little while in 1992, I had access to read Usenet posts only through a gateway that exported Usenet posts onto the Gopher system.
A gateway between Reddit and Lemmy would appear to Reddit as a web browser, scraping posts and comments; while appearing to Lemmy as a Lemmy instance that users could subscribe to, making each subreddit it scrapes available as a Lemmy community.
So a Lemmy user could subscribe to, say, [email protected] and see a fresh view of AskReddit. The server at reddittolemmy.com would not be a standard Lemmy server with users, but rather a custom gateway server that fetches data from Reddit and makes it available in the form of a Lemmy community.
(If Reddit were not being an asshole, a gateway could be an API client. But Reddit is being an asshole, so a gateway should probably be written as a scraper that accesses Reddit as if it were a normal user using a desktop Web browser.)
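A rough sketch of the gateway shape described above, with the fetching side and the serving side kept apart. This is not a working Lemmy instance and it does not speak any real federation protocol. The `Gateway` class, the field names, the address format and the `gateway.example` domain are all hypothetical, and the fetcher is a stand-in for whatever scraper you would actually write.

```python
from typing import Callable, Dict, Iterable, List


class Gateway:
    """Receives items from one service and re-exposes them for another."""

    def __init__(self, fetch: Callable[[str], Iterable[Dict]], domain: str):
        self.fetch = fetch      # source side: returns raw posts for a subreddit
        self.domain = domain    # target side: the domain the gateway is served from

    def community_address(self, subreddit: str) -> str:
        # Assumed address format: community name, then the gateway's domain.
        return f"{subreddit.lower()}@{self.domain}"

    def community_feed(self, subreddit: str) -> List[Dict]:
        """Translate source posts into the shape the target side would serve."""
        address = self.community_address(subreddit)
        return [
            {
                "community": address,
                "title": item["title"],
                "body": item.get("text", ""),
                "author": item["author"],
            }
            for item in self.fetch(subreddit)
        ]


def fake_fetch(subreddit: str) -> List[Dict]:
    # Stand-in for a real scraper; returns canned data so the sketch runs offline.
    return [{"title": f"Hello from {subreddit}", "author": "someone"}]


if __name__ == "__main__":
    gateway = Gateway(fake_fetch, "gateway.example")
    for entry in gateway.community_feed("AskReddit"):
        print(entry)
```

The point of the split is that swapping a scraper for an API client later would only change the `fetch` function, not the part Lemmy users see.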
I don't particularly think the whole of Reddit needs to be scraped though. I could be happy with only scraping posts that pass a certain threshold of votes against the subreddit's subscriber count, and maybe getting those crossposted to the equivalent Lemmy communities that want to opt in to such a service. This would be especially useful for World News and the more niche subreddits that don't yet have a big enough userbase here.
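The "votes against subscriber count" check could look something like this. It is only a sketch: the function name and the default `min_ratio` of 1% are made-up values, and a real cutoff would need tuning per community.

```python
def passes_threshold(score: int, subscribers: int, min_ratio: float = 0.01) -> bool:
    """True if the post's score is at least min_ratio of the sub's subscriber count."""
    if subscribers <= 0:
        # No meaningful ratio for an empty or unknown subscriber count.
        return False
    return score / subscribers >= min_ratio


if __name__ == "__main__":
    # 500 upvotes is a lot for a 20k-subscriber niche sub...
    print(passes_threshold(500, 20_000))
    # ...but not for a sub with 30 million subscribers.
    print(passes_threshold(500, 30_000_000))
```

Scaling by subscriber count is what lets a small sub and a huge one share the same rule.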
You could dedicate a community to reddit reposts easy enough. If people want to see the stuff from reddit they can sub, if they would rather wash their hands of reddit they can ignore it or block it.
Thoughts:
Content is good atm.
If it's spammy it'll get blocked. Or the instance de-federated.
Is going to Reddit boosting Reddit, or draining it by using the content?
Reddit has a lot of bad content that Lemmy is free of.
Good question! I don't really know how I'd feel about this.
On the one hand, for me to be OK with it, it needs to be CLEAR that it's a bot reposting Reddit content, and maybe it should even be limited to specific communities for the sake of archiving.
On the other, I do want lemmy to be distinct and I am curious to see where it will end up. I also feel like we shouldn't be trying to emulate reddit exactly, as that would mean we also get the crappy parts as well. Also, that would be limiting. Who knows what lemmy can become? Why limit it to being reddit 2.0, you know?
I think reposting content from reddit is fine, but we don't need a bot for it. I'd rather see individuals bring over specific posts they think are notable as opposed to automatically copying everything they've got.
I think there's value to it. There's a ton of really important and helpful documentation on there for all kinds of fields. In the past couple of days I've noticed some references I've had for tech issues and guitar repair are no longer accessible, for example. There's going to be a period where looking for help/answers online for certain issues is going to be a nightmare.
Yes, content is important, but it's not about the amount of content, but the quality of the content and discussion. Imagine every day you come to Lemmy and your front page is flooded with 0 upvote posts with no discussion from Reddit. You'd probably leave right away.
Manually cross-posting content that you like is probably better, since you're more likely to choose high-quality relevant content, and (I'm assuming) you'd be rate-limited to an appropriate amount.
Some people set things up like this on Mastodon, bots that reposted things from Twitter. Personally, I don't love them.
When everyone left Digg and got on Reddit back in the day, no one set up bots to automatically repost things from Digg to Reddit... it wouldn't even have been helpful, because people just stopped posting nearly as much stuff on Digg.
I don't like those bots either and make it a point to not follow them. I'd say if the OP can't see the people trying to interact with the post, what's the point?
I am a veteran of the Digg migration, and the world is very, very different than it was back then. For one thing, Reddit was already an established alternative, just with a lower user base than Digg. Lemmy feels much more like building a community completely from scratch.
I think it could be great for some subs that are very dependent on having many users, because it's the rare users that provide content. Subs like AITA or TIFU, where you can react and discuss inside the community, but get content from outside.
Ideally tho, they would just become self sustainable
I think many would be more interested in a migration tool. A way of porting a subreddit's worth of content to Lemmy and starting off strong, as well as preserving what might be several subreddits getting nuked as damage control.
A few comments, some of which have been touched on by others:
One of the reasons Reddit went to the new pricing was to prevent LLMs from scraping their content and using it for free. I don't think a bot like this would work for the same reason.
I personally don't want a duplicate of Reddit. I'd prefer this to be a new community that can do better.
One of the things that drove me away from Reddit (besides their horrible handling of 3PAs) was all the bot posts. If there were a way to make sure that bot accounts were identified as such, that would be perfect. Hell, I'd rather have zero bot posts than the current state of Reddit.
I think it should be left up to the mods of individual communities, maybe even better, keep it instance specific and other instances can de-federate if they feel it's a problem.
It'd be a good idea to create an instance with different communities, as someone said here, for the kind of posts that people could contribute, whether the reposts are made by people or by bots (in which case posts need to be filtered by upvotes or upvote ratio, or manually selected). For example, some posts I was referring to in my first comment were guides or wikis for certain apps, etc. Some important or interesting knowledge that won't be here otherwise, and that would force people like me to keep relying on Reddit.
That's why I said just get the content of those posts and just give credits to the OP without just copying a link to redirect people to Reddit.
In fact, I wouldn't want this to be filled with Reddit spam posts as most of them are useless.
As Reddit turns into trash, the moderation and content quality will drop. Importing the content may seem interesting at first, but in the long run it won't be worth the effort.
Currently there are a lot of bots on Mastodon doing this, since we're still waiting on a lot of groups etc. to get on board with at least replicating their own content. I don't think this would be any more objectionable. Though I think the admins and owners of Reddit are spiteful and petty enough that they would go after it any way they could.
I subscribed to [email protected], which uses a bot to repost from various places. But it was overwhelming to get every Hacker News post added, and I unsubscribed. Reposting everything from a big link aggregator is too much. But it might be okay if just duplicating a smaller subreddit.
Then there is the problem that Reddit will likely fight your efforts to scrape it without paying API fees.
I'm not entirely against the idea, but I also really don't want to turn the fediverse into the flood of content that is Reddit. Maybe some kind of bot with a relative upvote filter: only posts that score some percentage above the average for a given sub get scraped.
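A relative filter like that might be sketched as follows. The names and the default `factor` of 1.5 are assumptions, and it uses the plain mean of the scores it is given as the sub's "average", which is only one possible reading.

```python
from statistics import mean


def above_average(posts, factor=1.5):
    """Keep (title, score) pairs scoring at least factor times the sub's mean score."""
    if not posts:
        return []
    cutoff = mean(score for _, score in posts) * factor
    return [(title, score) for title, score in posts if score >= cutoff]


if __name__ == "__main__":
    sub_posts = [("a", 10), ("b", 20), ("c", 30), ("d", 140)]
    # Mean score is 50, so the cutoff with factor 1.5 is 75.
    print(above_average(sub_posts))
```

A median would be less sensitive to one viral post dragging the cutoff up, if that turns out to matter.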
I've been toying with this idea as well. I don't think it's a good idea to scrape all content. This could drown out the Lemmy-original content, especially where large subreddits are concerned. Maybe an upvote threshold (only scrape if more than X upvotes) would be a good idea.
I would also scrape only the post itself, not the comments. Best to have our own organic discussions here.
Finally, it should be very clear that a bot is posting these things. Ideally the bot would also ensure it is not re-posting something that was already posted by a Lemmy user just a bit earlier.
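Both of those checks are simple to sketch. This assumes the bot can get a list of links recently posted in the target community (how it gets that list is left out), and the `[bot repost]` prefix and function names are made up. The URL comparison is deliberately crude: it ignores scheme, a leading `www.`, trailing slashes and query strings.

```python
from urllib.parse import urlsplit

BOT_PREFIX = "[bot repost] "  # hypothetical marker so readers can tell at a glance


def normalize(url):
    """Reduce a URL to host + path so trivially different links compare equal."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def prepare_repost(title, url, recent_urls):
    """Return a clearly labelled title, or None if the link was already posted."""
    seen = {normalize(u) for u in recent_urls}
    if normalize(url) in seen:
        return None
    return BOT_PREFIX + title


if __name__ == "__main__":
    recent = ["http://example.com/story"]
    print(prepare_repost("Big news", "https://www.example.com/story/", recent))
    print(prepare_repost("Other news", "https://example.com/other", recent))
```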
It'd certainly help grow Lemmy by helping with the migration (at least for me, as I don't want to go back to Reddit, but there are subs there with interesting info for me that won't migrate), but it shouldn't contain any direct links to Reddit, just credits to the OP.
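For the "credit the OP, but no links back" part, something along these lines could work. It is a sketch under assumptions: the function name and the wording of the credit line are invented, and the pattern only catches plain `reddit.com` URLs (including subdomains such as `old.` or `www.`), not short links or other domains.

```python
import re

# Matches http(s) URLs on reddit.com or any of its subdomains.
REDDIT_LINK = re.compile(r"https?://(?:[\w-]+\.)*reddit\.com\S*")


def credited_body(text, author, subreddit):
    """Return the post text with Reddit links removed and a plain-text credit line."""
    cleaned = REDDIT_LINK.sub("[reddit link removed]", text.strip())
    credit = f"Credit: u/{author}, originally posted in r/{subreddit}."
    return f"{cleaned}\n\n{credit}"


if __name__ == "__main__":
    print(credited_body(
        "Guide here: https://www.reddit.com/r/x/comments/1 enjoy",
        "alice",
        "guitars",
    ))
```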