
Does Lemmy really benefit from Rust? Is code execution speed the bottleneck?

My first impression of Lemmy was that the UI was beautiful, and that lemmy.ml (the first instance I looked at) was asking people not to join because it already had 1500 users and was struggling to scale.

1500 users just doesn't seem like much; it seems like the type of load you could handle with a Raspberry Pi in a dusty corner.

Are the Lemmy servers struggling to scale because of the federation process / protocols?

Maybe I'm underestimating how much compute goes into hosting user-generated content? Users generate very little text, although uploaded pictures take more space. Still, users are generating millions of bytes of content, and it's overloading computers that can handle billions of bytes with ease. What happened? Am I missing something here?

Or maybe the code is just inefficient?

Which brings me to the title's question: Does Lemmy benefit from using Rust? None of the problems I can imagine are related to code execution speed.

If the federation process and protocols are inefficient, then everything is being built on sand. Popular protocols are hard to change. How often does the HTTP protocol change? Rarely, and only over many years. The language used for the code doesn't matter in this case.

If the code is just inefficient, well, inefficient Rust is probably slower than efficient Python or JavaScript. Could the complexity of Rust have pushed the devs toward a simpler but less efficient solution that ends up slower than the equivalent in a garbage-collected language? I'm sure this has happened before, but I don't know anything about the Lemmy code.

Or, again, maybe I'm just underestimating the amount of compute required to support 1500 users sharing a little bit of text and a few images?



150 comments
  • I agree: hearing about scaling issues so early in adoption is concerning. Lemmy advocates say "horizontal scaling is already built in! just add more instances!", but that doesn't explain the problem.

    It's all just text! My guess is the same: handling text alone, a server should easily support a thousand concurrent users and hundreds of thousands of daily users, and a Raspberry Pi should handle thousands. I've heard the bottleneck is the database; in that case Rust is not to blame, Postgres is.

    But my fear is that the data structures are implemented in a trivial way. If you have a good reddit-sized thread with a thousand comments, but you store each comment as a separate database entry, then every pageview will trigger a thousand database lookups! The way I imagined making a reddit clone is that I would store the comments as a flat list with some tree data on top, such that serving a single page with 1000 comments is no different than streaming a 100K text file. I'll go take a look at how Lemmy does it once I get the courage!
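A rough sketch of the flat-list-plus-tree idea, assuming each comment row carries a hypothetical parent_id (None for top-level comments). The (comment_id, parent_id, text) layout and the build_children/render helpers are illustrative only, not how Lemmy actually stores comments: all of a thread's comments are loaded as one list, and the tree is rebuilt in memory in a single pass.

```python
from collections import defaultdict

# Hypothetical flat comment list for one post: (comment_id, parent_id, text).
# parent_id None means a top-level reply to the post.
comments = [
    (1, None, "First!"),
    (2, 1, "Reply to first"),
    (3, None, "Another top-level comment"),
    (4, 2, "Nested reply"),
]


def build_children(rows):
    """One pass over the flat list: map each parent_id to its child ids, in list order."""
    children = defaultdict(list)
    for comment_id, parent_id, _text in rows:
        children[parent_id].append(comment_id)
    return children


def render(rows):
    """Depth-first walk of the rebuilt tree, returning indented lines for display."""
    text_by_id = {cid: text for cid, _pid, text in rows}
    children = build_children(rows)
    lines = []

    def walk(parent_id, depth):
        for cid in children.get(parent_id, []):
            lines.append("  " * depth + text_by_id[cid])
            walk(cid, depth + 1)

    walk(None, 0)
    return lines


for line in render(comments):
    print(line)
```

A real implementation would also need an ordering rule for siblings (score, time) and, for very deep threads, an explicit stack instead of recursion, since Python's default recursion limit is about 1000.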

    • If you have a good reddit-sized thread with a thousand comments, but you store each comment as a separate database entry, then every pageview will trigger a thousand database lookups!

      No, it wouldn't. That's called the N+1 query problem, and it can be avoided by writing more efficient queries, e.g. fetching all of a post's comments in one query instead of one query per comment.
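To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module. The comments table, its columns, and the run helper are hypothetical; run just counts calls, with each call standing in for one database round trip.

```python
import sqlite3

# Hypothetical schema and data; not Lemmy's real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (comment_id INTEGER PRIMARY KEY, post_id INTEGER, text TEXT);
    CREATE INDEX idx_comments_post ON comments (post_id);
    INSERT INTO comments VALUES (5, 79, 'a'), (13, 79, 'b'), (42, 79, 'c'), (57, 79, 'd'), (60, 80, 'x');
""")

query_count = 0


def run(sql, params=()):
    """Execute one statement; each call stands in for one database round trip."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


def texts_n_plus_1(post_id):
    """1 query for the ids, then 1 query per comment: N+1 round trips."""
    ids = [cid for (cid,) in run(
        "SELECT comment_id FROM comments WHERE post_id = ? ORDER BY comment_id", (post_id,))]
    return [run("SELECT text FROM comments WHERE comment_id = ?", (cid,))[0][0] for cid in ids]


def texts_one_query(post_id):
    """All of the post's comments in a single round trip."""
    return [text for (text,) in run(
        "SELECT text FROM comments WHERE post_id = ? ORDER BY comment_id", (post_id,))]


query_count = 0
slow = texts_n_plus_1(79)
slow_queries = query_count

query_count = 0
fast = texts_one_query(79)
fast_queries = query_count

print(slow == fast, slow_queries, fast_queries)  # True 5 1
```

Both functions return the same texts, but the first issues one query per comment; with a thousand comments that would be 1,001 round trips instead of one.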

      • Could you explain in more detail how this works? I can see how you would reduce the number of SQL queries from N+1:

        SELECT p.comment_ids FROM posts p WHERE p.post_id = 79
        -> (5, 13, 42, 57)
        SELECT c.text FROM comments c WHERE c.comment_id = 5
        SELECT c.text FROM comments c WHERE c.comment_id = 13
        SELECT c.text FROM comments c WHERE c.comment_id = 42
        SELECT c.text FROM comments c WHERE c.comment_id = 57
        

        down to 1 query:

        SELECT c.text FROM comments c WHERE c.parent_post = 79
        

        (Or something like this; I don't know SQL, sorry.) But wouldn't the database still have to look up each comment record on the backend? Yes, they are all indexed and hashed, but if you have a thousand comments, or even ten thousand (which reddit handles perfectly fine!), aren't 10,000 fetches from a hash table still slower than fetching a 10,000-element array? And what if you've been running your reddit clone for years and have accumulated so many gigabytes of content that it no longer fits in memory and has to be stored on disk? Aren't you looking at 10,000 disk reads in the worst case with a hash table?

        • You've got the right idea with your SQL example, that's pretty much exactly what N+1 would look like in your query logs.

          This can happen when using an ORM, if you're not careful to avoid it. Many ORMs will query the database on attribute access, in a way that is not particularly obvious:

          
          class User:
            id: int
            username: str
          
          class Post:
            id: int
          
          class Comment:
            id: int
            post_id: int  # FK to Post.id
            author_id: int  # FK to User.id
           
          

          Given this simple python-ish example, many ORMs will let you do something like this:

          
          post = Post.objects.get(id=11)
          
          for comment in post.comments:  # SELECT * FROM comment WHERE post_id=11
              author = comment.author  # uh oh! runs SELECT * FROM user WHERE id=comment.author_id, once per comment
          

          Although comment.author looks like a simple attribute access, the ORM has to issue a DB query behind the scenes. As a dev, especially one learning a new tool, it's not particularly obvious that this is happening, unless you've got some query logging that you're likely to notice during development.
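A toy version of that hidden query, with no real ORM involved: a hypothetical Comment class whose author property runs a SQL query on every access, built on sqlite3 and a run helper that logs each statement. Real ORMs differ in the details; this only shows why an innocent-looking loop turns into N+1 queries.

```python
import sqlite3

# Hypothetical tables mirroring the User/Comment example above (names pluralized).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, author_id INTEGER);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO comments VALUES (1, 11, 1), (2, 11, 2), (3, 11, 1);
""")

queries = []  # log of every SQL string sent, so hidden queries become visible


def run(sql, params=()):
    queries.append(sql)
    return conn.execute(sql, params).fetchall()


class Comment:
    """Toy lazy-loading model: `author` reads like a field but issues a query."""

    def __init__(self, comment_id, post_id, author_id):
        self.id = comment_id
        self.post_id = post_id
        self.author_id = author_id

    @property
    def author(self):
        # One hidden round trip per access, like a lazily loaded ORM relation.
        return run("SELECT username FROM users WHERE id = ?", (self.author_id,))[0][0]


def comments_for_post(post_id):
    rows = run("SELECT id, post_id, author_id FROM comments WHERE post_id = ? ORDER BY id", (post_id,))
    return [Comment(*row) for row in rows]


authors = [comment.author for comment in comments_for_post(11)]
print(authors)       # ['alice', 'bob', 'alice']
print(len(queries))  # 4: one for the comments, plus one per comment for its author
```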

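          A couple of fixes are possible here. Some ORMs provide a method for fetching related rows via a JOIN in the initial query. In Django, for example, assuming author is declared as a ForeignKey, comments = Comment.objects.filter(post_id=11).select_related("author") fetches each comment together with its author in one query. (select_related has to be called on the QuerySet before .get(), since .get() returns a model instance, and it only follows forward relations like comment.author, not reverse ones like post.comments.) Alternately, you could fetch the comments, then do one more query to grab all of their authors at once; Django's prefetch_related("author") works this way. In this toy example, the former would almost certainly be faster, but in a more complex example where you're JOINing across multiple tables, you might try breaking the query up in different ways if you're really trying to squeeze out the last drop of performance.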
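In plain SQL terms, the two fixes look roughly like the sketch below, using sqlite3 and hypothetical users/comments tables modeled on the User/Comment example above. authors_via_join does everything in one JOIN; authors_via_batch fetches the comments first and then all of their authors with a single WHERE id IN (...) query, which is approximately the shape of an ORM's separate-query prefetch. Both function names are made up for illustration.

```python
import sqlite3

# Hypothetical tables modeled on the User/Comment example above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, author_id INTEGER);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO comments VALUES (1, 11, 1), (2, 11, 2), (3, 11, 1);
""")

query_count = 0


def run(sql, params=()):
    """Execute one statement and count it as one round trip."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


def authors_via_join(post_id):
    """Fix 1: a single query; the JOIN attaches each comment's author."""
    rows = run(
        "SELECT c.id, u.username FROM comments c JOIN users u ON u.id = c.author_id "
        "WHERE c.post_id = ? ORDER BY c.id",
        (post_id,),
    )
    return [username for _cid, username in rows]


def authors_via_batch(post_id):
    """Fix 2: two queries; comments first, then all their authors in one IN (...) lookup."""
    comments = run("SELECT id, author_id FROM comments WHERE post_id = ? ORDER BY id", (post_id,))
    if not comments:
        return []
    author_ids = sorted({author_id for _cid, author_id in comments})
    placeholders = ", ".join("?" * len(author_ids))  # e.g. "?, ?"
    users = dict(run(f"SELECT id, username FROM users WHERE id IN ({placeholders})", author_ids))
    return [users[author_id] for _cid, author_id in comments]


query_count = 0
by_join = authors_via_join(11)
join_queries = query_count

query_count = 0
by_batch = authors_via_batch(11)
batch_queries = query_count

print(by_join, join_queries)    # ['alice', 'bob', 'alice'] 1
print(by_batch, batch_queries)  # ['alice', 'bob', 'alice'] 2
```

Either way the number of queries no longer grows with the number of comments; the batch version stays at two queries whether a post has three comments or three thousand.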

          In general, DB query planners are very good at retrieving data efficiently, given a reasonable query + the presence of appropriate indexes.
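One way to see a query planner at work is SQLite's EXPLAIN QUERY PLAN, sketched below with Python's sqlite3 module and a hypothetical comments table. Before the index exists, the plan is a full table scan; after CREATE INDEX, the planner searches the index for the matching post_id instead. The exact plan wording varies between SQLite versions, so the printed output is only indicative, and Postgres has its own EXPLAIN with different output.

```python
import sqlite3

# Hypothetical comments table; names are illustrative, not Lemmy's schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (comment_id INTEGER PRIMARY KEY, post_id INTEGER, text TEXT)")

QUERY = "SELECT text FROM comments WHERE post_id = ?"


def plan(sql):
    """Return SQLite's query-plan description (the 'detail' column) for sql."""
    return " / ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, (79,)))


before = plan(QUERY)  # no index yet: the planner has to scan the whole table
conn.execute("CREATE INDEX idx_comments_post ON comments (post_id)")
after = plan(QUERY)   # with the index: jump straight to the rows for post_id = 79

print(before)  # e.g. "SCAN comments"
print(after)   # e.g. "SEARCH comments USING INDEX idx_comments_post (post_id=?)"
```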

        • "Disk reads" are unavoidable; it's finding the data in the first place that's expensive. In an appropriately indexed database, reading a sequential range is extremely efficient. Rather than 10,000 independent lookups in a hash table, it's more like reading one contiguous block into memory, which is possible because the index tells you in advance where the data you're looking for is.

          Bear in mind that indexing a database can include the physical organization of the data on the disk. As a simplified example, if you choose a clustered index based on a timestamp, then selecting data between two timestamps is as easy as locating the endpoints and reading the data sequentially off the disk. (The reality is more technically complex, but doesn't involve much more physical work.)
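The clustered-index idea can be modeled in a few lines: keep rows sorted by the key, find the two endpoints with binary search (Python's bisect module), then read one contiguous slice. The rows are hypothetical in-memory data standing in for pages on disk; a real database works with pages and B-tree nodes, not a Python list.

```python
from bisect import bisect_left, bisect_right

# Hypothetical rows kept physically sorted by timestamp, standing in for a
# table laid out on disk in clustered-index order: (timestamp, text).
rows = [(100, "a"), (105, "b"), (110, "c"), (120, "d"), (130, "e"), (200, "f")]
timestamps = [ts for ts, _text in rows]  # the index key, in the same order


def between(start, end):
    """Rows with start <= timestamp <= end: two binary searches, then one contiguous slice."""
    lo = bisect_left(timestamps, start)  # first position with ts >= start
    hi = bisect_right(timestamps, end)   # first position with ts > end
    return rows[lo:hi]


print(between(105, 130))  # [(105, 'b'), (110, 'c'), (120, 'd'), (130, 'e')]
```

Finding the endpoints costs O(log n) comparisons; everything after that is a sequential read of exactly the matching rows, with no per-row lookup.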

        • @TauZero You'd use a join so that a single query fetches all comment rows for that post:

          SELECT p.post_id, p.title, p.description, c.text
          FROM posts p JOIN comments c ON p.post_id = c.post_id
          WHERE p.post_id = 79

          This would return all comments for post 79, with the post's id, title, and description repeated in every row. This isn't a perfect example of optimized SQL, and I have no idea whether these are the correct table and field names, since I didn't look them up. But it's a simple example of a join.
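Running that query against a tiny hypothetical posts/comments schema with sqlite3 shows the row shape described above. The table contents are made up, and the ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

# Hypothetical schema and data matching the table/column names in the query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (post_id INTEGER PRIMARY KEY, title TEXT, description TEXT);
    CREATE TABLE comments (comment_id INTEGER PRIMARY KEY, post_id INTEGER, text TEXT);
    INSERT INTO posts VALUES (79, 'Example title', 'Example description');
    INSERT INTO comments VALUES (1, 79, 'first'), (2, 79, 'second'), (3, 80, 'elsewhere');
""")

rows = conn.execute("""
    SELECT p.post_id, p.title, p.description, c.text
    FROM posts p JOIN comments c ON p.post_id = c.post_id
    WHERE p.post_id = 79
    ORDER BY c.comment_id
""").fetchall()

for row in rows:
    print(row)
# (79, 'Example title', 'Example description', 'first')
# (79, 'Example title', 'Example description', 'second')
```

The repeated post columns are the price of a single round trip; fetching the post and its comments with two separate queries avoids the duplication when the post data is large.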

    • But my fear is that the data structures are implemented in a trivial way. If you have a good reddit-sized thread with a thousand comments, but you store each comment as a separate database entry, then every pageview will trigger a thousand database lookups!

      I would hope that isn't the case. There is no reason they wouldn't fetch all of those comments in a single request, even if each one is a separate row in the database. I initially thought they'd have to query each individual instance participating in a thread, but even that isn't the case: the fediverse protocol makes each instance mirror the content from others, so your local instance should already have the necessary data in one place, making it easy to serve to users.
