I made a serious point of saying that the presentation you linked didn't seem to back up the assertion, not that I felt the assertion was wrong or that I didn't believe you had good reason to feel closure trees were faster.

Stop telling people that they haven't backed themselves up. Seriously, there is literally nothing in SQL more basic. Stop interpreting things you don't understand as missing things, and stop telling people that they haven't backed themselves up just because you don't understand the material placed in front of you. If you don't know the complexity of a simple SQL index scan, then this paper is aimed above your head. This is like claiming that a paper on A* doesn't prove its efficiency because it doesn't discuss order of growth on arrays.

I'm just saying that what you linked didn't seem to back up your assertion. Closure trees are rather more of a pain in the ass than the naive methods. Only that they made a lot of tasks much easier.

If you can't figure out the complexity of a closure tree, then you're not ready for production SQL. If you understand SQL, you know how to draw the same conclusions about the closure tree. It pointed out the efficiency problems in the others.

That being said: the presentation you linked, while interesting, said nothing about the "efficiency" of closure trees.

Why do you assume I would point people to a tree mechanism that I've never used? Have you considered trying this out yourself?

There's no reason for 90+% of Reddit's traffic to exist in the first place, except that nobody's willing to knuckle down for a week and write an incremental client-side JavaScript updater. Any competent architect could lower their workload by two orders of magnitude in three months, plus whatever amount of time it takes to rip out the mess that's in place. I'd be willing to bet that the vast bulk of Reddit's actual traffic is sending the same god damned thread over and over to small groups of people in arguments.
Most of Reddit could be served out of a flat cache and nobody would ever notice. Hundreds of thousands of dollars of servers a year to cope with less traffic than AOL in 2000, and several hours of spotty uptime every single day at predictable times. This is the peak of cowboy seat-of-the-pants design.

They're IO contending because they have data structure tactics that would embarrass a college sophomore. They're IO contending because instead of running this out of a simple in-place threaded C++ app, it's a bunch of different languages, processes, daemons, packet exchange, transmission, boxing, unboxing, packing, unpacking, serialization, querying, reading, writing, etc. They're IO contending because they're rolling their own hot/cold system instead of relying on the extremely mature, extremely well planned ones in SQL backends. They're IO contending because they aren't relying on client caching or hot fragment updates at all. They're IO contending because they're not caching result sets anywhere near often enough. They're IO contending because they're buying rotational media, as if the best way to run an IO contending site is to buy the slowest storage on the mass market. They're IO contending because they use terrible SQL and terrible index planning, no partitioning, no sharding, etc. The hot dataset doesn't scale with the userbase. They're IO contending because they buy swarms of small servers instead of big servers (much cheaper) that have enough RAM to hold the hot dataset.

Don't want to get lost.

Edit: Posted this in PHP because I'm trying to use a similar model on a much smaller scale in a project I'm working on, and wanted to stick to responses from people that share my language of choice. I'm aware that Oracle, and I believe PostgreSQL, have this as a feature, so maybe the answer is simply that Reddit's using another DB? I haven't looked through the source code. I believe this is called the adjacency list model.
Query parent, query parent for children, query children for sub-children, etc. etc.? I realize that when coding the SQL in PHP or Python or what have you, you can set a limit to the depth, but here on Reddit, since there's no limit, how is it done? Do they simply do multiple separate queries? That would be thousands of updates every time someone responded to a comment that has hundreds of child comments here on Reddit. I think it's because for every single update, every single record has to be updated. I've read a bit on the nested set model (Modified Preorder Tree Traversal). But even though it's in, it rings in my head as not being very scalable. What I'm curious about is how Reddit handles its hierarchical comment system, which does not seem to have a set depth limit. I posted this on SO a few weeks ago in a larger question, but it didn't get answered.
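For anyone who wants to try the two approaches argued about in this thread, here is a minimal sketch using Python's built-in `sqlite3`. It is not how Reddit stores comments (nobody here has checked the source); the table names `comment` and `comment_path` and the helper functions are made up for illustration. It keeps a plain `parent_id` column (the adjacency list) alongside a closure table, and fetches a whole thread of arbitrary depth in one query either way: once with a recursive query, which is presumably the kind of database feature the question has in mind, and once with a single lookup on the closure table.

```python
import sqlite3


def open_db():
    """Create an in-memory database with hypothetical comment tables."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        -- Adjacency list: each comment only knows its direct parent.
        CREATE TABLE comment (
            id        INTEGER PRIMARY KEY,
            parent_id INTEGER REFERENCES comment(id),
            body      TEXT NOT NULL
        );
        CREATE INDEX comment_parent ON comment (parent_id);

        -- Closure table: one row per (ancestor, descendant) pair,
        -- including a depth-0 row linking every comment to itself.
        CREATE TABLE comment_path (
            ancestor   INTEGER NOT NULL REFERENCES comment(id),
            descendant INTEGER NOT NULL REFERENCES comment(id),
            depth      INTEGER NOT NULL,
            PRIMARY KEY (ancestor, descendant)
        );
        CREATE INDEX comment_path_desc ON comment_path (descendant);
    """)
    return db


def add_comment(db, body, parent_id=None):
    """Insert a comment; adds one closure row per ancestor plus a self row."""
    cur = db.execute(
        "INSERT INTO comment (parent_id, body) VALUES (?, ?)",
        (parent_id, body),
    )
    new_id = cur.lastrowid
    db.execute(
        "INSERT INTO comment_path (ancestor, descendant, depth) VALUES (?, ?, 0)",
        (new_id, new_id),
    )
    if parent_id is not None:
        # Copy every path that ends at the parent, extended by one step.
        db.execute(
            """INSERT INTO comment_path (ancestor, descendant, depth)
               SELECT ancestor, ?, depth + 1
               FROM comment_path WHERE descendant = ?""",
            (new_id, parent_id),
        )
    return new_id


def subtree_closure(db, root_id):
    """Whole subtree in one non-recursive query against the closure table."""
    return db.execute(
        """SELECT c.id, p.depth, c.body
           FROM comment_path p JOIN comment c ON c.id = p.descendant
           WHERE p.ancestor = ?
           ORDER BY p.depth, c.id""",
        (root_id,),
    ).fetchall()


def subtree_recursive(db, root_id):
    """Same subtree from the adjacency list alone, via a recursive CTE."""
    return db.execute(
        """WITH RECURSIVE thread(id, depth) AS (
               SELECT id, 0 FROM comment WHERE id = ?
               UNION ALL
               SELECT c.id, t.depth + 1
               FROM comment c JOIN thread t ON c.parent_id = t.id
           )
           SELECT t.id, t.depth, c.body
           FROM thread t JOIN comment c ON c.id = t.id
           ORDER BY t.depth, t.id""",
        (root_id,),
    ).fetchall()


if __name__ == "__main__":
    db = open_db()
    top = add_comment(db, "top-level comment")
    reply = add_comment(db, "first reply", top)
    add_comment(db, "second reply", top)
    add_comment(db, "reply to first reply", reply)
    for row in subtree_closure(db, top):
        print(row)
```

The trade-off is visible in `add_comment`: a new comment costs one extra row per ancestor, so a write touches as many rows as the comment is deep, while existing rows are never rewritten. That is different from the nested set worry above, where an insert can force renumbering of many unrelated records. Reads in the closure version are a single indexed lookup on `ancestor`.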