JD Underground Tech
by Andrew G. Watters, Esq.
Where to begin...well, I was on the original JD Underground in the 2009-2015 time frame, and it was a funny, whimsical site with an ancient home-built interface and backend created by the enigmatic Admin. My memories of that period are mostly negative, as the amount of toxicity on the board was unbelievable. In 2019, Admin abruptly pulled the plug on the whole site, leaving nothing but a blank page and a lot of folks wondering what happened. I probably would have gotten sick to death of it too after years of toil, but here we are. I decided to resurrect the site in Fall 2023, partly as a programming challenge for myself. In this technical article, I explain the backend technology of the new JDU and offer the concept as essentially a database accelerator: no dependencies, relying solely on command line tools. The system is a superpowered NoSQL bulletin board system (BBS) without a single line of SQL or any database, something I had thought was impossible, but I ended up making the impossible possible. I have a running joke on JDU that I am the greatest lawyer-programmer in the world, and it's because of this project that I have both been told that and somewhat immodestly claim it. JDU represents an achievement in command line programming because these individual utilities are all included in Linux but have never been used quite this way before. I'm very proud of this (can you tell?) and decided to document the system in case anyone is curious.
Here is JD Underground in pseudo code, with some of the actual code:
---Display---

1. User requests /all.

2. System finds all posts in the /all folder newer than 60 days and slices out the top 300 files sorted by file modification date, which is when the last reply would have been made:

find ./posts/all -mtime -60 -not -name "sticky*" -name "*.txt" -type f -printf "%T@--%Tc--%p\n" | sort -n -r | head -n 300

3. System reads the first few lines of each file to get the post's subject line and date posted:

sed -n '1,4p;5q' "$filename"

4. System reads the .meta files that record reactions and how many replies each post has:

head -3 "$replies.meta" | tail -1 | cut -d ':' -f 2

5. System generates a page from this list of files and first few lines, including links to display each post, plus a link to get the next page of results (not yet retrieved).

6. Page is displayed to the user.

In other words, the built-in Linux utilities find, sed, head, tail, grep, sort, and cut are all it takes to generate a fully functional NoSQL BBS. I was not expecting that, and I wish I had known this 30 years ago-- the coolest part is that the system is completely portable and only has to be dragged and dropped into the web directory of a new server. Replication is also a breeze with rsync, as sketched below. On my respectable but not bleeding-edge web server, the entire /all display sequence completes in about 400 milliseconds, and a typical computer might take 300 ms to render the page, so the user gets a page in less than a second no matter where they click on the site (except /top, which takes substantial processing since I haven't optimized it yet). This is extreme performance considering that text processing is the biggest task. A RAID 5 of solid state drives greatly improves performance as well. Loading /all is the typical operation on the site, and I've tried to balance the compute against the features and display time for a favorable experience. By adjusting how many posts to load, I can add or reduce complexity and fine-tune page load times. Since the site gets a lot of traffic, the goal was to provide a great experience for everyone, and I think I've succeeded.
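That replication can be a single command. The paths and hostname here are hypothetical, since the article doesn't give the actual directory layout beyond ./posts/all:

# Mirror the whole BBS (posts, .meta files, scripts) to a standby server.
# -a preserves permissions and modification times, which matters here
# because the /all display sorts on mtime; paths and host are made up.
rsync -az --delete /var/www/jdu/ standby:/var/www/jdu/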
For retrieving individual posts, the system literally takes the filename from the link generated in step 5 above and reads and processes the file. That is far, far faster than paging through results-- the typical post is retrieved in less than 40 ms and displayed nearly instantaneously, and the performance is the same for every post because they are just files on the file system. I was going for maximum performance, not maximum display-friendliness, since the original JDU was very basic and the content is what people are after, not pretty presentation. I had intended to solve the look-and-feel challenges with customizable stylesheets, but I haven't gotten around to that yet.
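A minimal sketch of that retrieval, assuming the four-line header implied by the sed command in step 3; the path format and the tail offset are my assumptions, not the site's actual code:

# Hypothetical single-post retrieval; $filename comes straight from the
# link generated in step 5, so there is no search step at all.
sed -n '1,4p;5q' "$filename"    # header: subject and date, as in step 3
tail -n +5 "$filename"          # body: everything after the assumed 4-line header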
The reaction feature was a tough one to implement, but I succeeded in doing that without SQL as well. Reactions are stored in a .meta file and keyed to the specific post or reply they relate to; grep matches up the keys so the site can display each reaction along with the user who reacted. Reactions are not as popular as I had hoped, but people still use the feature, so I've retained it.
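As a rough illustration of that keying: the line format shown here, key:user:reaction, is my guess at the .meta layout, since the article doesn't spell out the actual format:

# Hypothetical .meta layout: one reaction per line, as "replykey:user:reaction".
# grep pulls every reaction keyed to one reply; cut splits out user and reaction.
grep "^$replykey:" "$replies.meta" | cut -d ':' -f 2,3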
Future challenges include handling large volumes of posts in the same folder if the site becomes more popular. What I will probably do is sort posts into folders by year, so that there is never more than a critical mass of posts in the /all folder and the find utility doesn't complain. So far, this looks like it will scale nicely, and I have no idea what the limits will be. With only 100 or so users at the moment, this is a low-volume but fun place, and I hope to keep it that way.
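For what it's worth, that year-based sorting might look like the following sketch. The ./posts/2024-style layout is my invention, and note that keying on modification time would shard by last-reply year rather than posting year, since replies update the mtime:

# Hypothetical yearly sharding so ./posts/all never grows past a critical mass.
for f in ./posts/all/*.txt; do
    year=$(date -r "$f" +%Y)        # year of last modification (GNU date)
    mkdir -p "./posts/$year"
    mv "$f" "./posts/$year/"
done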