CSCI 5253 - Data Center Scale Computing - Final Project
Humans produce roughly 2.5 quintillion bytes of data every day, and much of it is images. At that scale, finding one particular image can be hard even in a well-organized collection. With that in mind, we built a service that lets users store and retrieve images using contextual keywords. Users upload images, the system identifies and extracts contextual keywords from them, and users can later search the collection by keyword; matching images are returned along with a safe-search tag.
The project goal can be broken down into the following major components:
- Scalable context feature extraction - The app takes images as input, stores them, and extracts contextual keywords, all while auto-scaling based on load.
- Safe search tagging - Using details extracted from the image with the Google Cloud Vision API, the content is tagged for violent, racy, spoofed, and adult content (a hedged sketch of this call follows this list).
- Scalable search - The system can search for images based on contextual keywords and auto-scales when the load increases.
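As an illustration of the safe-search tagging above, here is a minimal sketch using the google-cloud-vision Python client. It is not the project's actual code: the function name and return format are our own, and it assumes application default credentials are configured.

```python
# Hedged sketch: obtain safe-search likelihoods for an image with the
# Google Cloud Vision API (assumes google-cloud-vision is installed).
from google.cloud import vision

def safe_search_tags(image_bytes: bytes) -> dict:
    """Return the Vision API safe-search likelihoods for an image."""
    client = vision.ImageAnnotatorClient()
    image = vision.Image(content=image_bytes)
    annotation = client.safe_search_detection(image=image).safe_search_annotation
    # Likelihood names are values such as VERY_UNLIKELY ... VERY_LIKELY.
    return {
        "adult": vision.Likelihood(annotation.adult).name,
        "violence": vision.Likelihood(annotation.violence).name,
        "racy": vision.Likelihood(annotation.racy).name,
        "spoof": vision.Likelihood(annotation.spoof).name,
    }
```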
The project is built on the following technologies:
- Kubernetes
- RabbitMQ
- Redis
- CloudSQL
- Flask REST server
- Google Cloud Storage (bucket)
- Google Cloud Vision API
The project follows a microservice architecture, so the application can be broadly divided into five components:
- Rest-server
- Worker
- Redis
- RabbitMQ
- Log server
The following software, accounts and tools are required to get the project up and running:
- Google Cloud account with active credits
- gcloud command-line tool
- Docker
- Kubernetes enabled on Docker, or Google Kubernetes Engine (GKE)
- Python
- Redis-CLI
- Python libraries listed in the requirements files of rest-server and worker
To deploy the project on GKE:
- Create a cluster of nodes on GKE:
gcloud container clusters create --preemptible mykube
- Launch all the services and deployments required on GKE:
sh deploy-all.sh
- Set up the MySQL database (a sketch of what this setup script might do follows this list):
python worker/sqlsetup.py createDB
python worker/sqlsetup.py create
- Set up and deploy the ingress on GKE:
kubectl create clusterrolebinding cluster-admin-binding --clusterrole cluster-admin --user $(gcloud config get-value account)
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.0.4/deploy/static/provider/cloud/deploy.yaml
- Enable the load balancer:
gcloud container clusters update mykube --update-addons=HttpLoadBalancing=ENABLED
- Enable horizontal pod autoscaling based on CPU usage:
kubectl autoscale deployment sciis-rest --cpu-percent=50 --min=1 --max=10
kubectl autoscale deployment final-worker --cpu-percent=50 --min=1 --max=10
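The actual worker/sqlsetup.py lives in the repository; as a rough illustration, a script driven by the two commands above might look like the sketch below. The host, credentials, database name, and table schema here are assumptions (with mysql-connector-python as the client), not the project's real values.

```python
# Hypothetical sqlsetup-style script: "createDB" creates the database,
# "create" creates the tables, mirroring the two commands above.
import sys
import mysql.connector  # assumed client library

def connect(database=None):
    # Illustrative connection details; a real deployment would point at the
    # CloudSQL instance (e.g. through the Cloud SQL proxy).
    return mysql.connector.connect(host="127.0.0.1", user="root",
                                   password="password", database=database)

def create_db():
    conn = connect()
    conn.cursor().execute("CREATE DATABASE IF NOT EXISTS images_db")
    conn.close()

def create_tables():
    conn = connect(database="images_db")
    conn.cursor().execute(
        """CREATE TABLE IF NOT EXISTS images (
               md5 CHAR(32) PRIMARY KEY,
               keywords TEXT,
               safe_search TEXT
           )""")
    conn.close()

if __name__ == "__main__":
    if sys.argv[1] == "createDB":
        create_db()
    else:
        create_tables()
```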
- The orchestration starts with the user accessing the web page at the base URL, which is served by the Flask server after the request is routed through the network load balancer.
- The user can then upload an image in the web application, and its contents are sent to the server through a POST endpoint (a hedged upload sketch follows this list).
- The MD5 hash of the image is computed and used as the ID under which the image is stored in Google Cloud Storage.
- The input data is then curated and sent to a worker node via RabbitMQ.
- A worker node dequeues the message from RabbitMQ, parses the data, retrieves the image from the Cloud Storage bucket using the MD5 value, and runs it through the Google Cloud Vision API (a worker sketch follows this list).
- The contextual keywords and safe-search tags extracted from the image are formatted and stored in Redis and MySQL.
- As these processes run, all relevant debug and event logs are written to a log server via RabbitMQ.
- The end user can search the stored images by contextual keyword. The query is run first against Redis and then against MySQL; if matches are found, their public URLs and safe-search tags are rendered back, otherwise an appropriate message is displayed (a search sketch follows this list).
- All components run in their own pods on Google Kubernetes Engine, which makes them easy to deploy and maintain. Horizontal scaling is enabled on the REST server and worker deployments, which spin up new pods when CPU utilization exceeds a specified threshold.
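The upload path described above (POST endpoint, MD5 as the image ID, Cloud Storage upload, RabbitMQ hand-off) could look roughly like the following sketch. This is a hedged illustration, not the project's actual rest-server code; the route, form field, bucket name ("sciis-images"), and queue name ("toWorker") are assumptions.

```python
# Hypothetical upload endpoint: hashes the image, stores it in a Cloud Storage
# bucket under its MD5, and queues a job for the worker over RabbitMQ.
import hashlib
import json
import pika
from flask import Flask, jsonify, request
from google.cloud import storage

app = Flask(__name__)

@app.route("/apiv1/image", methods=["POST"])        # hypothetical route
def upload_image():
    data = request.files["image"].read()            # image bytes from the form upload
    md5 = hashlib.md5(data).hexdigest()             # MD5 value used as the image ID

    # Store the image in the Cloud Storage bucket under its MD5 value.
    bucket = storage.Client().bucket("sciis-images")  # hypothetical bucket name
    bucket.blob(md5).upload_from_string(data, content_type="image/jpeg")

    # Curate the input data and hand it to a worker via RabbitMQ.
    connection = pika.BlockingConnection(pika.ConnectionParameters(host="rabbitmq"))
    channel = connection.channel()
    channel.queue_declare(queue="toWorker")
    channel.basic_publish(exchange="", routing_key="toWorker",
                          body=json.dumps({"md5": md5}))
    connection.close()

    return jsonify({"md5": md5}), 200
```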
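The worker side (dequeue, fetch by MD5, Vision API calls, writes to Redis and MySQL, logging via RabbitMQ) might look like the sketch below. Again, the queue, exchange, host, bucket, and schema names are assumptions; the real worker may differ.

```python
# Hypothetical worker loop: consumes a job from RabbitMQ, pulls the image
# from Cloud Storage by MD5, runs label and safe-search detection, then
# caches keywords in Redis and persists the record to MySQL.
import json
import mysql.connector
import pika
import redis
from google.cloud import storage, vision

# Assumed connection details; a real deployment would read these from config.
redis_client = redis.Redis(host="redis", port=6379, decode_responses=True)
db = mysql.connector.connect(host="127.0.0.1", user="root",
                             password="password", database="images_db")
vision_client = vision.ImageAnnotatorClient()
bucket = storage.Client().bucket("sciis-images")    # hypothetical bucket name

def callback(ch, method, properties, body):
    md5 = json.loads(body)["md5"]
    image = vision.Image(content=bucket.blob(md5).download_as_bytes())

    # Contextual keywords via label detection.
    labels = vision_client.label_detection(image=image).label_annotations
    keywords = [label.description.lower() for label in labels]

    # Safe-search tags (adult / violence / racy / spoof likelihoods).
    safe = vision_client.safe_search_detection(image=image).safe_search_annotation
    safe_tags = ",".join(
        f"{field}:{vision.Likelihood(getattr(safe, field)).name}"
        for field in ("adult", "violence", "racy", "spoof"))

    # Cache keyword -> md5 membership in Redis; persist the record in MySQL.
    for keyword in keywords:
        redis_client.sadd(keyword, md5)
    cursor = db.cursor()
    cursor.execute(
        "INSERT IGNORE INTO images (md5, keywords, safe_search) VALUES (%s, %s, %s)",
        (md5, ",".join(keywords), safe_tags))
    db.commit()

    # Send a debug/event message toward the log server's exchange.
    ch.basic_publish(exchange="logs", routing_key="worker.debug",
                     body=f"processed {md5}: {keywords}")
    ch.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters(host="rabbitmq"))
channel = connection.channel()
channel.queue_declare(queue="toWorker")
channel.exchange_declare(exchange="logs", exchange_type="topic")
channel.basic_consume(queue="toWorker", on_message_callback=callback)
channel.start_consuming()
```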
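Finally, the keyword search path (Redis first, MySQL fallback, public URLs plus safe-search tags) could be sketched as follows; the route, bucket name, and connection details are illustrative assumptions.

```python
# Hypothetical search endpoint: checks the Redis cache first, falls back to
# MySQL, and returns public URLs plus safe-search tags (or a not-found message).
import mysql.connector
import redis
from flask import Flask, jsonify

app = Flask(__name__)

# Assumed connection details for the Redis cache and the MySQL database.
redis_client = redis.Redis(host="redis", port=6379, decode_responses=True)
db = mysql.connector.connect(host="127.0.0.1", user="root",
                             password="password", database="images_db")

@app.route("/apiv1/search/<keyword>", methods=["GET"])   # hypothetical route
def search(keyword):
    keyword = keyword.lower()
    cursor = db.cursor()

    # First try the Redis cache of keyword -> md5 sets ...
    md5s = list(redis_client.smembers(keyword))

    # ... and fall back to MySQL if the cache has nothing for this keyword.
    if not md5s:
        cursor.execute("SELECT md5 FROM images WHERE keywords LIKE %s",
                       (f"%{keyword}%",))
        md5s = [row[0] for row in cursor.fetchall()]

    if not md5s:
        return jsonify({"message": "No matching images found"}), 404

    # Return public URLs along with the stored safe-search tags.
    results = []
    for md5 in md5s:
        cursor.execute("SELECT safe_search FROM images WHERE md5 = %s", (md5,))
        row = cursor.fetchone()
        results.append({
            "url": f"https://storage.googleapis.com/sciis-images/{md5}",
            "safe_search": row[0] if row else None,
        })
    return jsonify(results), 200
```

Checking Redis first keeps hot keyword lookups fast, while MySQL remains the durable store consulted on a cache miss.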
Links to the resources used in the project.