We all work hard to make an impact, and once in a while there’s an opportunity to make a HUGE impact with very little effort. After reading this post, you’ll be able to make a quick impact on your web servers and improve your site’s performance!
REST (an architectural style usually implemented over HTTP) is one of the most popular API styles. If you work at a SaaS company, you probably use it as an external API for communicating with your customers, and you might also use it internally across your services. At Dynamic Yield we are serving thousands of…
Jupyter Notebook is a well-known web tool for running live code. Apache Spark is a popular engine for data processing, and Spark on Kubernetes is finally GA! In this tutorial, we will bring up a Jupyter notebook in Kubernetes and run a Spark application in client mode. We will also use a cool sparkmonitor widget for visualization. Additionally, our Spark application will read some data from AWS S3, which we will simulate locally with localstack.
While serving a huge number of requests, we can easily observe that our traffic graph looks like a sine wave, with a high rate at midday and a lower rate at night. The difference is relatively big: around 2–3 times more requests during rush hours. Moreover, there are special occasions such as Black Friday, Cyber Monday, and sale campaigns, when our traffic can rise up to 3x.
Using Kubernetes’s elasticity…
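To make the sine-wave picture concrete, here is a minimal sketch of how a day/night traffic curve could translate into a replica count. The function names, the 1,000 requests per second night-time rate, and the 500 requests per second per-pod capacity are all assumptions made up for illustration, not numbers from our system.

```python
import math


def requests_per_second(hour, low_rps, peak_ratio):
    """Model daily traffic as a sinusoid: trough at midnight, peak at midday.

    low_rps and peak_ratio are hypothetical inputs (night-time rate and
    the midday/night ratio, e.g. 2-3).
    """
    high_rps = low_rps * peak_ratio
    mean = (high_rps + low_rps) / 2
    amplitude = (high_rps - low_rps) / 2
    # -cos puts the minimum at hour 0 and the maximum at hour 12.
    return mean - amplitude * math.cos(2 * math.pi * hour / 24)


def replicas_needed(rps, rps_per_pod, min_replicas=1):
    """Number of pods required if each pod handles rps_per_pod (assumed)."""
    return max(min_replicas, math.ceil(rps / rps_per_pod))


if __name__ == "__main__":
    for hour in (0, 6, 12, 18):
        rps = requests_per_second(hour, low_rps=1000, peak_ratio=3)
        print(hour, round(rps), replicas_needed(rps, rps_per_pod=500))
```

With a fixed deployment you would have to provision for the midday peak all day long; a replica count that follows the curve only pays for the peak while it lasts.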
Here at Dynamic Yield we serve thousands of HTTP requests per second. Moving our serving services from EC2 to EKS required some tuning to ensure we could serve efficiently without losing any requests. I’ve summarized some tips that I hope will help others make a smooth transition.
Kubernetes can decide to proactively replace pods (replicas) of our application, whenever:
Celery communicates via messages, usually using a broker to mediate between clients and workers. We use RabbitMQ as the broker.
In our system, we have ~30 different Celery tasks such as:
These tasks vary widely: some of them run for seconds while others can take hours, depending on the data (size) being processed and the…
Elasticsearch is used by many companies as a search engine thanks to its speed and scalability. At Dynamic Yield, we use Elasticsearch as part of our recommendations engine, handling thousands of requests per second.
An Elasticsearch cluster contains several nodes, each of which can play one or more roles: master, data, ingest, etc.
Indices are stored on the data nodes.
Each index can have one or more shards, usually determined by the index size.
Each shard holds part of the index data (documents), so if, for example, we have an index with two shards, each shard holds…
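As a rough illustration of how documents end up spread across an index's shards: Elasticsearch picks a shard by hashing a document's routing value (its ID by default) and taking the result modulo the number of primary shards. The sketch below mimics that idea only. The CRC32 hash is a stand-in (Elasticsearch uses a different hash function), and the function names and document IDs are hypothetical.

```python
import zlib
from collections import Counter


def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard for a document ID.

    CRC32 is a stand-in hash chosen for illustration; it is not the
    hash function Elasticsearch itself uses.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def distribute(doc_ids, num_shards):
    """Return how many documents land on each shard, indexed by shard number."""
    counts = Counter(shard_for(doc_id, num_shards) for doc_id in doc_ids)
    return [counts.get(shard, 0) for shard in range(num_shards)]


if __name__ == "__main__":
    ids = [f"doc-{i}" for i in range(1000)]  # made-up document IDs
    print(distribute(ids, num_shards=2))
```

Because the shard is derived from the ID alone, the same document always maps to the same shard as long as the shard count does not change.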