A more modern web architecture
Over the past year I’ve been revamping the project mentioned in my last post. Today, the architecture has been split into a tiered approach: a backend API, with the front end and other backend services layered on top. The stack has shifted from Java + Python/Django to being almost completely Node.js-based. I have also switched from Apache to nginx, mainly because I find it much more straightforward to configure for multiple subdomains and applications. MySQL is still the primary data store, but I have added Redis for tasks that don’t need persistence but do need to be fast. I’m sure my usage of Redis will grow as I find more use cases for it in the architecture. At this point the main system comprises three distinct applications: the API, the Web app, and the Crawler.
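To give a sense of what I mean about subdomains, here’s a minimal sketch of the kind of nginx setup I’m describing: one server block per subdomain, each proxying to its own Node.js process. The domain names and ports below are placeholders, not my actual config.

```nginx
# One server block per subdomain, each proxying to its own Node.js app.
# Domains and ports are hypothetical placeholders.
server {
    listen 80;
    server_name www.example.com;

    location / {
        proxy_pass http://127.0.0.1:3000;   # Web app process
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

server {
    listen 80;
    server_name api.example.com;

    location / {
        proxy_pass http://127.0.0.1:3001;   # API process
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```

Adding another application is just another server block, which is most of why the switch from Apache felt like a simplification.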
The API and Web applications are built with Hapi.js from Walmart Labs. It’s a great framework that gives you lots of freedom in how you do things while bootstrapping the core functionality of a web app. The main flow of information goes from clients’ browsers through the Web app and then to the backend API. This gives me plenty of opportunity to scale each piece individually as the need arises. I have also switched to Amazon S3 for storing the listing images, of which there are currently around 1.5 million. nginx caches these specific routes, which are masked behind a cdn subdomain off the main application domain. It’s pretty cool because it lets me store a virtually unlimited number of images in S3, while images for the most popular listings are served from nginx’s cache on my own server after the first load. This saves on cost since I rarely have to contact S3 to retrieve images.
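Conceptually, the caching setup looks something like the sketch below. This is a hypothetical config, with the bucket name, domains, cache sizes, and TTLs all made up, but it shows the idea: the cdn subdomain proxies to S3, and nginx keeps a local copy of whatever it fetches.

```nginx
# Hypothetical sketch: cache S3-hosted images behind a cdn subdomain.
# Bucket name, zone sizes, and TTLs are placeholders.
proxy_cache_path /var/cache/nginx/images levels=1:2 keys_zone=images:50m
                 max_size=10g inactive=30d;

server {
    listen 80;
    server_name cdn.example.com;

    location / {
        proxy_pass https://my-listing-images.s3.amazonaws.com;
        proxy_cache images;
        proxy_cache_valid 200 30d;        # serve cached hits for up to 30 days
        proxy_ignore_headers Set-Cookie;  # don't let upstream headers defeat caching
        add_header X-Cache-Status $upstream_cache_status;
    }
}
```

On a cache hit nginx never touches S3 at all, which is where the cost savings come from.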
The crawler is also a backend application built with Node.js. I feel that Node.js shines here, since the bulk of the crawler’s time is spent waiting on network connections to external real estate sources. The non-blocking nature of Node.js, combined with its single-threaded execution, has yielded far lower memory usage than the old Java version, and it completes the same tasks noticeably faster. The crawler finds relevant information and uploads it to the API, which the Web app then queries to display listings to end users.
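To show why this workload fits Node.js so well, here’s a minimal sketch of the crawler’s core pattern: fetching several sources concurrently on a single thread and then posting the results to the API. It assumes Node 18+ for the built-in fetch, and the URLs, endpoint, and parseListing helper are all hypothetical stand-ins for the real thing.

```js
// Minimal sketch of the crawler's fetch-and-upload loop.
// Uses Node's built-in fetch (Node 18+); URLs and parseListing are placeholders.
const sources = [
  'https://source-a.example.com/listings',
  'https://source-b.example.com/listings',
];

// Stand-in for the real parsing logic (HTML scraping, field mapping, etc.).
function parseListing(html) {
  return { raw: html.length };
}

async function crawl() {
  // All requests are in flight at once; the single thread just waits on the
  // network instead of blocking one thread per connection.
  const listings = await Promise.all(
    sources.map(async (url) => {
      const res = await fetch(url);
      return parseListing(await res.text());
    })
  );

  // Upload the results to the backend API (endpoint is hypothetical).
  await fetch('https://api.example.com/listings', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(listings),
  });
}

crawl().catch(console.error);
```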
Link to project: Immown