Picture a scenario where we need to build a product or food-item image-processing feature for a food aggregator app like Swiggy, Zomato, or Uber Eats.
Gem!! Thank you sir!!!
Hello Shivang,
Thanks for this post and your previous one.
However, I have one question:
When the merchants upload the image into the "Image upload and ingestion module", are they uploading the URL? From your explanation, it seems like Kafka only sends a URL to the "Image Processing module".
If yes, why is the deduplication not done on the server where the image blob is initially uploaded and the URL is generated?
If no, how is the "Image Processing module" receiving the URL? I thought the users upload their data directly via APIs on the "Image upload and ingestion module".
Kafka sends the images themselves to the image processing module (a Flink implementation), not the image URLs. The URLs will ideally be created once the image files are stored in the cloud object store in the image storage module.
The image processing module is not receiving the URL. It receives the images, and the URLs get created after processing and storage.
Let me know if you have any other questions; I'll be happy to answer.
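Below is a minimal sketch of that flow, assuming a plain Python Kafka consumer (confluent-kafka) plus boto3 in place of the actual Flink job; the topic, bucket, and CDN host names are hypothetical. It also illustrates where content-hash deduplication can fit: the hash of the image bytes becomes the storage key, and the URL only exists after the write.

```python
# Illustrative sketch only: the real module is a Flink job; this uses a
# plain Kafka consumer. Topic, bucket, and CDN host are hypothetical.
import hashlib

import boto3
from confluent_kafka import Consumer

BUCKET = "food-app-images"            # hypothetical bucket
CDN_HOST = "https://cdn.example.com"  # hypothetical CDN host

s3 = boto3.client("s3")
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "image-processing",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["uploaded-images"])  # hypothetical topic

while True:
    msg = consumer.poll(timeout=1.0)
    if msg is None or msg.error():
        continue

    image_bytes = msg.value()  # Kafka carries the image bytes, not a URL

    # Content-addressed key: identical images hash to the same key, so a
    # duplicate upload overwrites the same object instead of creating a
    # second copy.
    key = hashlib.sha256(image_bytes).hexdigest()
    s3.put_object(Bucket=BUCKET, Key=key, Body=image_bytes)

    # The URL comes into existence only now, after the image is persisted.
    url = f"{CDN_HOST}/{key}"
    print(f"stored {key} -> {url}")
```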
Thanks, Shivang, for your response.
I'm currently taking a course on software architecture. Your last two posts are helping me piece together how the things I've learnt are applied in real-world scenarios.
I would appreciate it if you keep publishing posts like this.
I intend to read your post about GitHub's hashing of code later this evening as well.
Thanks a lot for all your posts.
I will. Thank you for sharing your thoughts. I am glad my posts are helpful. Cheers!
Great article :)
I just had one question:
This system doesn't seem to handle removal of images. If it were to do that, how would that work with the image de-duplication that's in place?
For example:
Two users upload the same image.
One of the users then deletes theirs.
What would happen to the second user's image? In this case we would need to have the image duplicated, right?
In a scenario where a restaurant decides to remove a certain dish and the associated images, we would just remove that restaurant's dish data from the database without removing the images, since other restaurants' dish data may still reference them. We could delete the images only if no other restaurant's dishes reference them.
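A rough sketch of that reference check, assuming a relational table that maps dishes to image keys; the table, column, and function names here are hypothetical. Deleting a dish removes its rows, and an image blob becomes a deletion candidate only when no rows from any restaurant still point at its key.

```python
# Hypothetical schema: dish_images(restaurant_id, dish_id, image_key).
import sqlite3

conn = sqlite3.connect("catalog.db")
conn.execute("""CREATE TABLE IF NOT EXISTS dish_images (
    restaurant_id TEXT, dish_id TEXT, image_key TEXT)""")

def delete_dish(restaurant_id: str, dish_id: str) -> list[str]:
    """Remove a dish's rows and return the image keys that are no longer
    referenced by any dish; only those are safe to delete from the store."""
    cur = conn.execute(
        "SELECT DISTINCT image_key FROM dish_images "
        "WHERE restaurant_id = ? AND dish_id = ?",
        (restaurant_id, dish_id),
    )
    keys = [row[0] for row in cur.fetchall()]

    conn.execute(
        "DELETE FROM dish_images WHERE restaurant_id = ? AND dish_id = ?",
        (restaurant_id, dish_id),
    )
    conn.commit()

    # Keep any image that another dish (any restaurant) still references.
    orphaned = []
    for key in keys:
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM dish_images WHERE image_key = ?", (key,)
        ).fetchone()
        if count == 0:
            orphaned.append(key)  # no references left; blob can be removed
    return orphaned
```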