A Guide to Google Cloud Platform’s Vision API
Artificial intelligence | 25 Mar 2022 |   11 min

Computer vision is a field of Artificial Intelligence (AI) that allows computers and other systems to obtain meaningful insights from images, videos, and so on. Then, based on that information, some actions can be taken, or recommendations can be made. Google Cloud Platform is one of the many cloud computing platforms offering computer vision services.

The service I am going to focus on in this blog is Google Cloud Vision, also known as the Vision API. It can easily be integrated into applications for tasks such as face detection, landmark detection, image labelling, and OCR. Vision AI offers powerful pre-trained machine learning models through REST and RPC APIs. With them, you can assign labels to images and rapidly classify them into millions of predefined categories, detect objects and faces, read printed and handwritten text, and build valuable metadata into your image catalog.

So, for starters, let’s dive into the features of Google Cloud Vision!

Features of Google Cloud Vision

Several types of detection are possible using Google Cloud Vision. A few of them are as follows:

CROP_HINTS – Suggested vertices for a crop region on an image are determined.
FACE_DETECTION – Faces within the image are detected.
IMAGE_PROPERTIES – A set of image properties is computed, such as the dominant colors in an image.
LABEL_DETECTION – Labels are added based on the image content.
LANDMARK_DETECTION – Geographic landmarks within the image are detected.
LOGO_DETECTION – Company logos within the image are detected.
OBJECT_LOCALIZATION – Multiple objects in an image are detected and their locations are extracted.
SAFE_SEARCH_DETECTION – Safe Search detects content that is likely to be unsafe or undesirable.
TEXT_DETECTION & DOCUMENT_TEXT_DETECTION – Optical Character Recognition (OCR) is performed on text within the image. Use TEXT_DETECTION for sparse text within a large image. Use DOCUMENT_TEXT_DETECTION if the image is a document, has dense text, or contains handwriting.
WEB_DETECTION – Entities such as events, news, or celebrities in the image are detected, and similar images on the web are found using the power of Google Image Search.
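To make the feature names above concrete, here is a minimal sketch of how they might appear in a REST request body. The helper name `build_annotate_request` and the default `max_results` value are my own assumptions for illustration. The sketch only builds the JSON and does not send it, so no credentials are needed to try it.

```python
import base64
import json

# REST endpoint for synchronous image annotation (shown for reference only;
# this sketch never calls it).
ANNOTATE_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_annotate_request(image_bytes, feature_types, max_results=10):
    """Build the JSON body for one image and one or more detection features.

    `build_annotate_request` is a hypothetical helper, not part of any library.
    """
    return {
        "requests": [
            {
                # Image bytes are embedded in the request as base64 text.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                # One entry per feature, e.g. "TEXT_DETECTION" or "CROP_HINTS".
                "features": [
                    {"type": name, "maxResults": max_results}
                    for name in feature_types
                ],
            }
        ]
    }


if __name__ == "__main__":
    body = build_annotate_request(
        b"raw image bytes go here",
        ["TEXT_DETECTION", "OBJECT_LOCALIZATION"],
    )
    print(json.dumps(body, indent=2))
```

Several features can be requested for the same image in one call, which is why `features` is a list.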

Use Cases of Google Cloud Vision

Take a look at the several use cases of Google Cloud Vision that simplify work for organizations:

License Plate Detection – In some countries, parking lots use a license plate recognition model to record the entry and exit time of each vehicle. License plate recognition can also be used to detect vehicles that violate traffic rules.

Resume Parser – It is a tedious task for institutions or large companies to scan each resume and get useful information from it. OCR, combined with some NLP (Natural Language Processing), can help here by reducing the time taken and increasing accuracy.

Contract Parser – Text in legal contracts can be hard to extract, particularly when it has been written by hand. Cloud Vision can help, as it extracts text and values from legal contracts such as agreement date, effective date, parties, expiration date, governing law, and notice to terminate renewal.

Receipt and Invoice Scanning – Financial balancing is an important activity for any organization. Large companies that purchase frequently need to gather and process all their invoices and receipts. Automated pipelines can be built to recognize invoices.

Now that you are acquainted with the use cases, allow me to explain how Vision API actually works.

How Vision API Works

1. Set up the client library for Cloud Vision API.

2. Installing and importing the libraries

3. Importing Google Cloud credentials

You need to supply the JSON key file that you generated earlier using your Google Cloud account.

The source path needs to be the path of your image file.
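Steps 2 and 3 could look roughly like the sketch below. It assumes the `google-cloud-vision` package and a service-account key file. The function names `configure_credentials` and `make_client` are hypothetical, and the client import is done lazily so the file still loads before the package is installed.

```python
import os

# Assumed install step (run once in your environment):
#   pip install google-cloud-vision


def configure_credentials(key_path):
    """Point Google client libraries at a service-account JSON key file.

    `configure_credentials` is a hypothetical helper for this sketch.
    """
    if not os.path.isfile(key_path):
        raise FileNotFoundError(f"Service-account key not found: {key_path}")
    # Google client libraries read this environment variable to find the key.
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = os.path.abspath(key_path)
    return os.environ["GOOGLE_APPLICATION_CREDENTIALS"]


def make_client():
    """Create the Vision client once credentials are configured."""
    # Imported here so the module loads without google-cloud-vision installed.
    from google.cloud import vision

    return vision.ImageAnnotatorClient()
```

Setting the variable from code keeps the example self-contained; exporting it in the shell before starting Python works just as well.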

4. Importing an image and passing it to the Crop Hints function

5. Crop Hints function

Crop Hints uses the Vision API Crop Hints feature. You can provide the image to be processed either through a Cloud Storage URI or embedded within the Crop Hints request. A successful Crop Hints response returns the coordinates for a bounding box cropped around the dominant object in the image.

Firstly, create a client instance by using the ImageAnnotatorClient class for accessing the Vision API. The client library encapsulates the details for requests and responses to the API.

After reading the image, we call the crop_hints method of the ImageAnnotatorClient instance to get the bounds for the first crop hint. Because we have the vertices from the first crop hint, we can crop the image as we want (I’ve cropped the image where the number normally occurs). The last step of this function is to save the cropped image (the number plate image) in a separate directory. The saved cropped image is then passed on for erosion and dilation.
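A minimal sketch of the Crop Hints step is shown below, under a few assumptions: the image is read from a local file, Pillow does the cropping (this article does not say which imaging library was used), and the function names are my own. The crop simply follows the first hint's bounding box; any further trimming towards the plate area is left out.

```python
from types import SimpleNamespace


def crop_box_from_vertices(vertices):
    """Convert bounding-poly vertices into a (left, top, right, bottom) box."""
    xs = [v.x for v in vertices]
    ys = [v.y for v in vertices]
    return min(xs), min(ys), max(xs), max(ys)


def first_crop_hint_box(image_path):
    """Ask the Vision API for crop hints and return the first hint as a box."""
    from google.cloud import vision  # lazy import: needs google-cloud-vision

    client = vision.ImageAnnotatorClient()
    with open(image_path, "rb") as f:
        image = vision.Image(content=f.read())

    response = client.crop_hints(image=image)
    if response.error.message:
        raise RuntimeError(response.error.message)

    hints = response.crop_hints_annotation.crop_hints
    return crop_box_from_vertices(hints[0].bounding_poly.vertices)


def crop_and_save(image_path, box, out_path):
    """Crop the image to `box` and save it (Pillow is an assumption here)."""
    from PIL import Image  # lazy import: needs Pillow

    Image.open(image_path).crop(box).save(out_path)
    return out_path


if __name__ == "__main__":
    # Offline demo with made-up vertices, so no API call is needed.
    fake = [SimpleNamespace(x=10, y=20), SimpleNamespace(x=110, y=20),
            SimpleNamespace(x=110, y=70), SimpleNamespace(x=10, y=70)]
    print(crop_box_from_vertices(fake))
```

Keeping the vertex-to-box conversion in its own function means it can be checked without calling the API.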

6. Erosion and Dilation

The cropped image is again pre-processed by erosion and dilation to remove the noise around the number plate image.

The number of iterations and the kernel size can be adjusted to remove the noise. Apply dilation after erosion. Save the processed image and return its path to the crop hint function. The crop hint function in turn returns that eroded image path to our primary function, i.e., the cropped and eroded image goes back to the recognize_plate function.
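To show what erosion followed by dilation does, here is a small pure-Python sketch on a binary grid (1 = foreground, 0 = background). It is an illustration of the idea rather than the code behind this blog: on real images you would normally use an image library such as OpenCV, whose `cv2.erode` and `cv2.dilate` functions take a kernel and an `iterations` argument. The function names and the choice to treat pixels outside the image as background are my own assumptions.

```python
def _morph(grid, k, combine):
    """Apply a k x k square kernel; `combine` is all (erode) or any (dilate).

    Pixels outside the grid count as background (0) in this sketch.
    """
    h, w, r = len(grid), len(grid[0]), k // 2
    return [
        [
            int(combine(
                0 <= y + dy < h and 0 <= x + dx < w and grid[y + dy][x + dx] == 1
                for dy in range(-r, r + 1)
                for dx in range(-r, r + 1)
            ))
            for x in range(w)
        ]
        for y in range(h)
    ]


def erode(grid, k=3):
    """A pixel stays 1 only if every pixel under the kernel is 1."""
    return _morph(grid, k, all)


def dilate(grid, k=3):
    """A pixel becomes 1 if any pixel under the kernel is 1."""
    return _morph(grid, k, any)


def remove_noise(grid, k=3, iterations=1):
    """Erode first to delete small specks, then dilate to restore the shapes."""
    for _ in range(iterations):
        grid = erode(grid, k)
    for _ in range(iterations):
        grid = dilate(grid, k)
    return grid
```

A larger kernel or more iterations removes bigger specks, but it also wears away thin strokes in the characters, so both values need tuning per image set.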

7. Using Image Annotation and getting the actual content

Create a client instance by using the ImageAnnotatorClient class for accessing the Vision API. Creating the client also checks whether your credentials give you permission to use the API. Then we open the cropped, eroded image and recognize its content using Object Localization and Text Detection (Text Detection for OCR and Object Localization for detecting multiple plates), and save the annotated responses.

These responses are appended to a data frame. We take the first response, which contains the license plate number, and split it to get the number itself.
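The recognition step might be sketched as follows. This version skips the data frame and works directly on the list of text annotations, where the first annotation holds the full detected text and the later ones hold individual words. The function names, and the choice to take the first non-empty line as the plate number, are assumptions made for illustration.

```python
from types import SimpleNamespace


def plate_from_annotations(text_annotations):
    """Return the first non-empty line of the first (full-text) annotation."""
    if not text_annotations:
        return None
    full_text = text_annotations[0].description
    lines = [line.strip() for line in full_text.splitlines() if line.strip()]
    return lines[0] if lines else None


def recognize_plate_text(image_path):
    """Run Text Detection and Object Localization on a pre-processed image."""
    from google.cloud import vision  # lazy import: needs google-cloud-vision

    client = vision.ImageAnnotatorClient()
    with open(image_path, "rb") as f:
        image = vision.Image(content=f.read())

    text_response = client.text_detection(image=image)
    if text_response.error.message:
        raise RuntimeError(text_response.error.message)

    # Object names (for example, a detected plate) can be used for filtering.
    object_response = client.object_localization(image=image)
    object_names = [
        obj.name for obj in object_response.localized_object_annotations
    ]

    return plate_from_annotations(text_response.text_annotations), object_names


if __name__ == "__main__":
    # Offline demo with a made-up annotation list.
    demo = [SimpleNamespace(description="MH12AB1234\nIND")]
    print(plate_from_annotations(demo))
```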

8. Output

The output is produced by passing the number from recognize_license_plate() to a Flask API and displaying it on a webpage.

This is one way of reading license plate numbers. More image pre-processing might lead to more accurate results.
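As a rough idea of the Flask side, the sketch below wraps a recognizer function in a one-route app. The route name `/plate`, the HTML layout, and the names `render_plate_page` and `create_app` are all assumptions; the recognizer is passed in as a parameter so any function that takes an image path and returns a plate string can be plugged in.

```python
import html


def render_plate_page(plate):
    """Build a tiny HTML page for the plate number, escaping the text."""
    safe = html.escape(plate or "No plate detected")
    return f"<h1>License plate</h1><p>{safe}</p>"


def create_app(recognizer, image_path):
    """Create a Flask app that shows the plate read from `image_path`.

    `recognizer` is any callable that maps an image path to a plate string.
    """
    from flask import Flask  # lazy import: needs Flask installed

    app = Flask(__name__)

    @app.route("/plate")
    def plate():
        # Run recognition on each request and render the result.
        return render_plate_page(recognizer(image_path))

    return app
```

With Flask installed, calling `create_app(my_recognizer, "plate.jpg").run()` would serve the page at `/plate` on the local development server; `my_recognizer` and `plate.jpg` are placeholders.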

Having walked you through the main aspects of Cloud Vision, I can say that, in my experience, it performs very well as an OCR engine. Its object detection labels are relevant, and Object Localization results can be filtered using the built-in labels. What’s more, its API conventions are flexible. I must tell you that I am incredibly delighted with the usefulness and flexibility of the Cloud Vision API!

Do reach out to us at Nitor Infotech if you’d like to share your experiences using Cloud Vision.

Shubham Kamble

Trainee Software Engineer

Shubham is a Trainee Software Engineer at Nitor Infotech. As an AI/ML enthusiast, he is eager to learn how every new AI technology works behind the scenes. Coding with coffee is the only thing he loves. Working on projects related to images, video, and audio with a deep understanding of algorithms and cloud services enabled him to be proficient in the AI/ML space. "Learning never exhausts the mind" is his mantra and that's what makes him versatile. He is passionate about strength-training and travelling.

   
