Cloud Vision
The Google Cloud Vision API allows users to leverage machine learning algorithms for processing images and documents including: image classification, face detection, text extraction, optical character recognition, and others.
Spring Framework on Google Cloud provides:
-
A convenience starter which automatically configures authentication settings and client objects needed to begin using the Google Cloud Vision API.
-
CloudVisionTemplate
which simplifies interactions with the Cloud Vision API.-
Allows you to easily send images, PDF, TIFF and GIF documents to the API as Spring Resources.
-
Offers convenience methods for common operations, such as classifying content of an image.
-
-
DocumentOcrTemplate
which offers convenient methods for running optical character recognition (OCR) on PDF and TIFF documents.
Dependency Setup
To begin using this library, add the spring-cloud-gcp-starter-vision
artifact to your project.
Maven coordinates, using Spring Framework on Google Cloud BOM:
<dependency>
<groupId>com.google.cloud</groupId>
<artifactId>spring-cloud-gcp-starter-vision</artifactId>
</dependency>
Gradle coordinates:
dependencies {
implementation("com.google.cloud:spring-cloud-gcp-starter-vision")
}
Configuration
The following options may be configured with Spring Framework on Google Cloud Vision libraries.
Name |
Description |
Required |
Default value |
|
Enables or disables Cloud Vision autoconfiguration |
No |
|
|
Number of threads used during document OCR processing for waiting on long-running OCR operations |
No |
1 |
|
Number of document pages to include in each OCR output file. |
No |
20 |
Cloud Vision OCR Dependencies
If you are interested in applying optical character recognition (OCR) on documents for your project, you’ll need to add both spring-cloud-gcp-starter-vision
and spring-cloud-gcp-starter-storage
to your dependencies.
The storage starter is necessary because the Cloud Vision API will process your documents and write OCR output files all within your Google Cloud Storage buckets.
Maven coordinates using Spring Framework on Google Cloud BOM:
<dependency>
<groupId>com.google.cloud</groupId>
<artifactId>spring-cloud-gcp-starter-vision</artifactId>
</dependency>
<dependency>
<groupId>com.google.cloud</groupId>
<artifactId>spring-cloud-gcp-starter-storage</artifactId>
</dependency>
Gradle coordinates:
dependencies {
implementation("com.google.cloud:spring-cloud-gcp-starter-vision")
implementation("com.google.cloud:spring-cloud-gcp-starter-storage")
}
Image Analysis
The CloudVisionTemplate
allows you to easily analyze images; it provides the following method for interfacing with Cloud Vision:
public AnnotateImageResponse analyzeImage(Resource imageResource, Feature.Type… featureTypes)
Parameters:
-
Resource imageResource
refers to the Spring Resource of the image object you wish to analyze. The Google Cloud Vision documentation provides a list of the image types that they support. -
Feature.Type… featureTypes
refers to a var-arg array of Cloud Vision Features to extract from the image. A feature refers to a kind of image analysis one wishes to perform on an image, such as label detection, OCR recognition, facial detection, etc. One may specify multiple features to analyze within one request. A full list of Cloud Vision Features is provided in the Cloud Vision Feature docs.
Returns:
-
AnnotateImageResponse
contains the results of all the feature analyses that were specified in the request. For each feature type that you provide in the request,AnnotateImageResponse
provides a getter method to get the result of that feature analysis. For example, if you analyzed an image using theLABEL_DETECTION
feature, you would retrieve the results from the response usingannotateImageResponse.getLabelAnnotationsList()
.AnnotateImageResponse
is provided by the Google Cloud Vision libraries; please consult the RPC reference or Javadoc for more details. Additionally, you may consult the Cloud Vision docs to familiarize yourself with the concepts and features of the API.
Detect Image Labels Example
Image labeling refers to producing labels that describe the contents of an image. Below is a code sample of how this is done using the Cloud Vision Spring Template.
@Autowired
private ResourceLoader resourceLoader;
@Autowired
private CloudVisionTemplate cloudVisionTemplate;
public void processImage() {
Resource imageResource = this.resourceLoader.getResource("my_image.jpg");
AnnotateImageResponse response = this.cloudVisionTemplate.analyzeImage(
imageResource, Type.LABEL_DETECTION);
System.out.println("Image Classification results: " + response.getLabelAnnotationsList());
}
File Analysis
The CloudVisionTemplate
allows you to easily analyze PDF, TIFF and GIF documents; it provides the following method for interfacing with Cloud Vision:
public AnnotateFileResponse analyzeFile(Resource fileResource, String mimeType, Feature.Type… featureTypes)
Parameters:
-
Resource fileResource
refers to the Spring Resource of the PDF, TIFF or GIF object you wish to analyze. Documents with more than 5 pages are not supported. -
String mimeType
is the mime type of the fileResource. Currently, onlyapplication/pdf
,image/tiff
andimage/gif
are supported. -
Feature.Type… featureTypes
refers to a var-arg array of Cloud Vision Features to extract from the document. A feature refers to a kind of image analysis one wishes to perform on a document, such as label detection, OCR recognition, facial detection, etc. One may specify multiple features to analyze within one request. A full list of Cloud Vision Features is provided in the Cloud Vision Feature docs.
Returns:
-
AnnotateFileResponse
contains the results of all the feature analyses that were specified in the request. For each page of the analysed document the response will contain anAnnotateImageResponse
object which you can retrieve usingannotateFileResponse.getResponsesList()
. For each feature type that you provide in the request,AnnotateImageResponse
provides a getter method to get the result of that feature analysis. For example, if you analysed an PDF using theDOCUMENT_TEXT_DETECTION
feature, you would retrieve the results from the response usingannotateImageResponse.getFullTextAnnotation().getText()
.AnnotateFileResponse
is provided by the Google Cloud Vision libraries; please consult the RPC reference or Javadoc for more details. Additionally, you may consult the Cloud Vision docs to familiarize yourself with the concepts and features of the API.
Running Text Detection Example
Detect text in files refers to extracting text from small document such as PDF or TIFF. Below is a code sample of how this is done using the Cloud Vision Spring Template.
@Autowired
private ResourceLoader resourceLoader;
@Autowired
private CloudVisionTemplate cloudVisionTemplate;
public void processPdf() {
Resource imageResource = this.resourceLoader.getResource("my_file.pdf");
AnnotateFileResponse response =
this.cloudVisionTemplate.analyzeFile(
imageResource, "application/pdf", Type.DOCUMENT_TEXT_DETECTION);
response
.getResponsesList()
.forEach(
annotateImageResponse ->
System.out.println(annotateImageResponse.getFullTextAnnotation().getText()));
}
Document OCR Template
The DocumentOcrTemplate
allows you to easily run optical character recognition (OCR) on your PDF and TIFF documents stored in your Google Storage bucket.
First, you will need to create a bucket in Google Cloud Storage and upload the documents you wish to process into the bucket.
Running OCR on a Document
When OCR is run on a document, the Cloud Vision APIs will output a collection of OCR output files in JSON which describe the text content, bounding rectangles of words and letters, and other information about the document.
The DocumentOcrTemplate
provides the following method for running OCR on a document saved in Google Cloud Storage:
CompletableFuture<DocumentOcrResultSet> runOcrForDocument(GoogleStorageLocation document, GoogleStorageLocation outputFilePathPrefix)
The method allows you to specify the location of the document and the output location for where all the JSON output files will be saved in Google Cloud Storage.
It returns a CompletableFuture
containing DocumentOcrResultSet
which contains the OCR content of the document.
Running OCR on a document is an operation that can take between several minutes to several hours depending on how large the document is.
It is recommended to register callbacks to the returned CompletableFuture or ignore it and process the JSON output files at a later point in time using readOcrOutputFile or readOcrOutputFileSet .
|
Running OCR Example
Below is a code snippet of how to run OCR on a document stored in a Google Storage bucket and read the text in the first page of the document.
@Autowired private DocumentOcrTemplate documentOcrTemplate; public void runOcrOnDocument() { GoogleStorageLocation document = GoogleStorageLocation.forFile( "your-bucket", "test.pdf"); GoogleStorageLocation outputLocationPrefix = GoogleStorageLocation.forFolder( "your-bucket", "output_folder/test.pdf/"); CompletableFuture<DocumentOcrResultSet> result = this.documentOcrTemplate.runOcrForDocument( document, outputLocationPrefix); DocumentOcrResultSet ocrPages = result.get(5, TimeUnit.MINUTES); String page1Text = ocrPages.getPage(1).getText(); System.out.println(page1Text); }
Reading OCR Output Files
In some use-cases, you may need to directly read OCR output files stored in Google Cloud Storage.
DocumentOcrTemplate
offers the following methods for reading and processing OCR output files:
-
readOcrOutputFileSet(GoogleStorageLocation jsonOutputFilePathPrefix)
: Reads a collection of OCR output files under a file path prefix and returns the parsed contents. All the files under the path should correspond to the same document. -
readOcrOutputFile(GoogleStorageLocation jsonFile)
: Reads a single OCR output file and returns the parsed contents.
Reading OCR Output Files Example
The code snippet below describes how to read the OCR output files of a single document.
@Autowired private DocumentOcrTemplate documentOcrTemplate; // Parses the OCR output files corresponding to a single document in a directory public void parseOutputFileSet() { GoogleStorageLocation ocrOutputPrefix = GoogleStorageLocation.forFolder( "your-bucket", "json_output_set/"); DocumentOcrResultSet result = this.documentOcrTemplate.readOcrOutputFileSet(ocrOutputPrefix); System.out.println("Page 2 text: " + result.getPage(2).getText()); } // Parses a single OCR output file public void parseSingleOutputFile() { GoogleStorageLocation ocrOutputFile = GoogleStorageLocation.forFile( "your-bucket", "json_output_set/test_output-2-to-2.json"); DocumentOcrResultSet result = this.documentOcrTemplate.readOcrOutputFile(ocrOutputFile); System.out.println("Page 2 text: " + result.getPage(2).getText()); }
Sample
Samples are provided to show example usages of Spring Framework on Google Cloud with Google Cloud Vision.
-
The Image Labeling Sample shows you how to use image labelling in your Spring application. The application generates labels describing the content inside the images you specify in the application.
-
The Document OCR demo shows how you can apply OCR processing on your PDF/TIFF documents in order to extract their text contents.