
Arjun Krishna Babu

Python. Machine Learning. Systems. Open source.


My first Android app, Snap Event, was published to the Google Play Store a few days ago.

Snap Event creates calendar events from photographs of the event’s poster.

The app is by no means perfect, but it’s functional and is a proof of concept that the idea works.

Motivation

People say the best way to learn a new programming language or a framework is to do a project with it. Sure, you need some familiarity with the basics of whatever it is that you’re trying to learn, but past that stage, projects are the way to go.

Three factors motivated me to develop this app:

  1. My previous attempt at learning Android two years ago had failed; I wanted to give it another shot.
  2. I already had a project idea in mind (explained in the next paragraph).
  3. One of my professors asked us all to build an Android app; we’d be getting credit for it.

Whenever I saw posters of some event, be it a concert or a conference or whatever, I took its picture to keep track of what exactly was in the poster. Most people I personally knew did this. Perhaps I hadn’t searched well enough, but I couldn’t find any other app that’s similar in functionality to what I had in mind, which was surprising.

And so, my friend Alexander “Alex” Kaberlein and I decided to build this app.

App Development

Creating a calendar event from its poster comes down to three things:

  1. Detect text from the image.
  2. Make sense of the detected text.
  3. Create the calendar event.

Our focus was to ship a working app as early as possible, and then iron out its shortcomings. As a consequence, we made some compromises during the design phase. This is supposed to be a learning experience after all, and we have no intention of making money from this app.

Detecting Text From The Image

Though step 2 turned out to be harder in retrospect, our project would have been shelved right away had we failed to detect text from images. I had no knowledge whatsoever of computer vision. Unlike me, Alex has some familiarity with OpenCV, but we did not have sufficient time to come up with our own image recognition models.

Based on what I’d read on the internet, I was aware of certain off-the-shelf libraries and services for image recognition. In particular, the three services we considered were:

  1. Amazon Rekognition
  2. Google Cloud Vision
  3. Google Mobile Vision

It didn’t take long for us to zero in on Google Mobile Vision:

  • Amazon Rekognition was supposedly pretty good at detecting objects within the image, but we did not know how well it worked with text.
  • Cloud Vision is capable of detecting text, but it’s not free (despite being inexpensive).
  • Mobile Vision is capable of detecting text, free, and likely works well with Android.

So we chose Mobile Vision.

Next, we wanted to be sure that Mobile Vision detects text to our satisfaction; we didn’t want to discover at the last moment that it wasn’t up to the task. For this, we built a small toy app that detects text from images hard-coded into the app. You can find it here.

The prototype worked well, and we decided to proceed with the main app.

Capturing the image and saving it on the phone was easier than I thought. However, I wanted to display all the images taken by the app in vertically scrolling cards, and that’s where I ran into my first major set of problems:

  • The high-resolution images take up a lot of memory, causing the scrolls to jitter. This was despite using RecyclerView.
  • Centering images within the cards was harder than I thought. You have to fiddle around with a Bitmap object and make non-trivial mathematical calculations.

The Android documentation (this page, to be precise) pointed me to a solution for both these problems: a library called Glide, which handles the complexities associated with pulling multiple images into your app. It also handles annoyances like centering your images properly within your ImageView.
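Glide hides this arithmetic, so the app itself does not need the code below. Purely as an illustration of the kind of calculation involved, here is a minimal plain-Java sketch of two of them: choosing a power-of-two subsampling factor so that a large photo is decoded at roughly the size it will be shown, and computing the centered crop region that fills a card without distortion. The class `ImageMath` and its method names are hypothetical and are not taken from Snap Event or from Glide.

```java
/** Hypothetical helper illustrating image-sizing arithmetic; not part of Snap Event or Glide. */
final class ImageMath {

    /**
     * Returns the largest power-of-two factor by which the source can be
     * subsampled while staying at least as large as the requested size.
     */
    static int sampleSize(int srcW, int srcH, int reqW, int reqH) {
        if (srcW <= 0 || srcH <= 0 || reqW <= 0 || reqH <= 0) {
            throw new IllegalArgumentException("all dimensions must be positive");
        }
        int sample = 1;
        // Keep doubling while the next halving still covers the requested size.
        while (srcW / (sample * 2) >= reqW && srcH / (sample * 2) >= reqH) {
            sample *= 2;
        }
        return sample;
    }

    /**
     * Returns {left, top, width, height} of the centered region of the source
     * that has the same aspect ratio as the destination.
     */
    static int[] centerCropRect(int srcW, int srcH, int dstW, int dstH) {
        // Compare aspect ratios by cross-multiplying to stay in integer arithmetic.
        if ((long) srcW * dstH > (long) dstW * srcH) {
            // Source is relatively wider: keep the full height and trim the sides.
            int cropW = (int) ((long) srcH * dstW / dstH);
            return new int[] {(srcW - cropW) / 2, 0, cropW, srcH};
        }
        // Source is relatively taller (or equal): keep the full width and trim top and bottom.
        int cropH = (int) ((long) srcW * dstH / dstW);
        return new int[] {0, (srcH - cropH) / 2, srcW, cropH};
    }

    public static void main(String[] args) {
        System.out.println(sampleSize(4000, 3000, 400, 300));
        System.out.println(java.util.Arrays.toString(centerCropRect(4000, 3000, 400, 400)));
    }
}
```

For a 4000×3000 photo shown in a 400×300 slot, `sampleSize` returns 8, so the decoded image is 500×375 instead of full resolution. For a square 400×400 card, `centerCropRect` returns the region starting at (500, 0) with size 3000×3000.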

Documentation for Glide could have been better; I had to resort to third-party websites to figure out certain use cases. Speaking of which, you might want to read this excellent introduction to Glide.

Text Recognition using Mobile Vision

Mobile Vision supports:

  1. Face detection
  2. Barcode detection
  3. Text detection, which is what we want.

The library is capable of providing us text in whichever format we desire – as entire blocks of text, lines, words etc. Our plan of attack was to extract every single line of text in the poster, and then figure out what each of those lines meant.

There were a couple of things that I noticed about text detection using Mobile Vision:

  • Does not work well with handwritten text.
  • Extracted lines of text are not always in the top-to-bottom order.
  • Detected text sometimes differs for the same image on multiple runs – an unexpected non-deterministic behavior.
  • Occasionally crashes when using the .jpg format for the images. I resorted to using the .png format.
  • Use the ARGB_8888 configuration for storing pixels in memory. It took me hours to figure out why my app kept crashing: part of my app was using RGB_565 by default, and another part was expecting images in ARGB_8888.
    See this page to know more about what those configurations mean.
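One way to cope with lines arriving out of order is to sort them by the position of their bounding boxes before interpreting them. The sketch below assumes each detected line comes with the top and left coordinates of its bounding box. `DetectedLine` and `ReadingOrder` are hypothetical stand-ins written in plain Java, not Mobile Vision classes, and this is not necessarily how Snap Event handles the problem.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for one detected line and the top-left corner of its bounding box. */
final class DetectedLine {
    final String text;
    final int top;
    final int left;

    DetectedLine(String text, int top, int left) {
        this.text = text;
        this.top = top;
        this.left = left;
    }
}

/** Hypothetical helper that puts detected lines into top-to-bottom reading order. */
final class ReadingOrder {

    /** Returns the line texts sorted top to bottom, breaking ties left to right. */
    static List<String> sortTopToBottom(List<DetectedLine> lines) {
        List<DetectedLine> sorted = new ArrayList<>(lines); // leave the input untouched
        sorted.sort(Comparator.comparingInt((DetectedLine l) -> l.top)
                .thenComparingInt(l -> l.left));
        List<String> texts = new ArrayList<>();
        for (DetectedLine line : sorted) {
            texts.add(line.text);
        }
        return texts;
    }

    public static void main(String[] args) {
        List<DetectedLine> lines = new ArrayList<>();
        lines.add(new DetectedLine("Town Hall", 300, 10));
        lines.add(new DetectedLine("Jazz Night", 20, 10));
        lines.add(new DetectedLine("5 July 2017", 150, 10));
        System.out.println(sortTopToBottom(lines));
    }
}
```

A plain sort like this treats lines whose tops differ by even a pixel as being on different rows, so a multi-column poster would need some tolerance when comparing the top coordinate.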

Making Sense of the Text

This is where we analyze the detected text to figure out what it means. Alex and I concluded that for any calendar event, the three crucial pieces of information required are:

  • Schedule
  • Title
  • Location

The problem with dates is that they are written in different ways in different parts of the world. For instance, “05 - 07 - 2017” is May 7th in the United States and July 5th in almost every other country. Weird ideas involving tricks with the geographic location of the user crossed my mind, but they didn’t feel right. Besides, in countries like the United States where people from all over the world are present, such an approach is not foolproof.
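The ambiguity is easy to reproduce with the standard `java.time` classes: the same string parses cleanly under both conventions and yields two different dates, so nothing in the text itself says which one is right. The class name `AmbiguousDate` is hypothetical and only serves this illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Hypothetical demo: one numeric date string, two equally valid readings. */
final class AmbiguousDate {
    private static final DateTimeFormatter MONTH_FIRST = DateTimeFormatter.ofPattern("MM - dd - uuuu");
    private static final DateTimeFormatter DAY_FIRST = DateTimeFormatter.ofPattern("dd - MM - uuuu");

    /** Reads the text the way it is usually written in the United States. */
    static LocalDate parseMonthFirst(String text) {
        return LocalDate.parse(text, MONTH_FIRST);
    }

    /** Reads the text the way it is written in most other countries. */
    static LocalDate parseDayFirst(String text) {
        return LocalDate.parse(text, DAY_FIRST);
    }

    public static void main(String[] args) {
        String poster = "05 - 07 - 2017";
        System.out.println(parseMonthFirst(poster)); // 2017-05-07
        System.out.println(parseDayFirst(poster));   // 2017-07-05
    }
}
```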

Due to the complications mentioned above, we decided to infer the month only when it is written out as a word, such as “January” (or its abbreviated form “Jan”). Though not always true, it is somewhat safe to assume that the day and the year would be on the same line as the month.
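A rough sketch of this heuristic in plain Java follows. It is an illustration under assumptions, not the code Snap Event actually uses: the class `MonthNameDateFinder`, the regular expressions, and the choice to take the first standalone one- or two-digit number on the line as the day are all hypothetical.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: infers a date from one line of text, but only when the month is a word. */
final class MonthNameDateFinder {
    private static final List<String> MONTH_KEYS = Arrays.asList(
            "jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec");

    // Full month names and their common abbreviations, matched as whole words only.
    private static final Pattern MONTH = Pattern.compile(
            "\\b(jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|june?|july?|aug(?:ust)?"
            + "|sep(?:t(?:ember)?)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?)\\b",
            Pattern.CASE_INSENSITIVE);
    private static final Pattern DAY = Pattern.compile("\\b(\\d{1,2})\\b");  // standalone 1-2 digit number
    private static final Pattern YEAR = Pattern.compile("\\b(\\d{4})\\b");   // standalone 4 digit number

    /** Returns the date found on the line, or empty if month, day, or year is missing or invalid. */
    static Optional<LocalDate> find(String line) {
        Matcher month = MONTH.matcher(line);
        Matcher day = DAY.matcher(line);
        Matcher year = YEAR.matcher(line);
        if (!month.find() || !day.find() || !year.find()) {
            return Optional.empty();
        }
        // The first three letters are enough to identify the month.
        String key = month.group(1).substring(0, 3).toLowerCase(Locale.ROOT);
        int monthNumber = MONTH_KEYS.indexOf(key) + 1;
        try {
            return Optional.of(LocalDate.of(
                    Integer.parseInt(year.group(1)), monthNumber, Integer.parseInt(day.group(1))));
        } catch (DateTimeException e) {
            return Optional.empty(); // impossible dates such as "February 31 2017"
        }
    }

    public static void main(String[] args) {
        System.out.println(find("January 21, 2017"));
        System.out.println(find("7 Jan 2017"));
        System.out.println(find("05 - 07 - 2017"));
    }
}
```

In this sketch, a purely numeric line such as “05 - 07 - 2017” yields no date at all, which is the deliberate trade-off described above. Taking the first small number as the day is naive: a time such as “7 PM” appearing before the date on the same line would be mistaken for the day.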

Our methodology for detecting the event title and location is so terrible that I’d rather not talk about it. Given a line of text, I have no reliable strategy to conclusively determine if it is an event title or a location or some other information.

To somewhat compensate for this, we are dumping every single line of text we detected into the description field of the calendar event.

Creating the Calendar Event

A trivial task in the grand scheme of things.

Android stores all calendar events in a central repository called the Calendar Provider. This is why calendar events you create using any calendar app show up in any other calendar app that you install. Think of it this way: all media files on your phone show up automatically in whatever media player you install.

Once all the information needed to create a calendar event is ready, it’s just a matter of starting an activity to create the calendar event.

We chose to open up the calendar app populated with the event information (rather than silently create the calendar event in the background) to give users an opportunity to review the event before it is saved.
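One detail worth knowing if you go this route: Android's calendar insert intent takes the event's start and end as milliseconds since the Unix epoch (passed through extras such as `CalendarContract.EXTRA_EVENT_BEGIN_TIME`), so a detected date has to be converted first. The plain-Java sketch below shows only that conversion. `EventTimes` is a hypothetical helper, the Android intent itself is left out because it needs the Android SDK, and the post does not say how Snap Event performs this step.

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

/** Hypothetical helper: converts a detected date and time into epoch milliseconds. */
final class EventTimes {

    /** Interprets the date and time in the given zone and returns milliseconds since 1970-01-01T00:00Z. */
    static long toEpochMillis(LocalDate date, LocalTime time, ZoneId zone) {
        return date.atTime(time).atZone(zone).toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        // On Android, a value like this would be supplied as the event's begin time (not shown here).
        long begin = toEpochMillis(LocalDate.of(2017, 7, 5), LocalTime.MIDNIGHT, ZoneOffset.UTC);
        System.out.println(begin); // 1499212800000
    }
}
```

On a phone, the zone would normally be the user's own (for example `ZoneId.systemDefault()`), since a poster's “7 PM” means 7 PM local time.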

Publishing on Google Play Store

Like most other things by Google, the steps for publishing the Android app are well documented. Once I felt ready to launch the app, I followed the official launch checklist.

It wasn’t a quick process though:

  • It took them over 24 hours to verify my one-time registration fee of $25.
  • It took approximately 6 hours for my app to show up on the Google Play Store after I clicked the button to roll out and publish the app.

What I Learned From Developing This App

  • Basic Android programming. I’m now confident about reading through the documentation (as well as other resources) to figure out how to get things done, so that I can create more complex Android apps in the future.
  • How to actually publish the Android app.
  • How to use the Glide library to display multiple images efficiently.
  • How to set up and use Google’s Mobile Vision library.

Shortcomings

Many of the app’s shortcomings have already been covered in the App Development section above, but I’ll summarize them here:

  • Date detection happens only if the month is written out as a word (for example, “January” or “Jan”).
    In the real world, dates are often written purely in digits, which severely limits the usability of the app.
  • Date detection fails if the date contains ordinal suffixes like “st”, “nd”, “rd”, or “th”.
    However, this can be quickly fixed.
  • Event title and location detection is not perfect.
    I’m contemplating using machine-learning models to accomplish this task, but that’s still quite some time away.
  • The way the images are saved onto the phone, though working perfectly, uses a terrible strategy that I’m ashamed to mention here. This would be evident only if you read through the source code.
  • User experience can be improved.
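As an example of how small the ordinal-suffix fix could be, one option is to strip the suffix from each line before looking for a date. The sketch below is one possible approach in plain Java, not the fix that was (or will be) applied to Snap Event; the class `OrdinalSuffix` is hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: turns "5th" into "5" so that later date parsing sees a plain number. */
final class OrdinalSuffix {
    // One or two digits followed directly by an ordinal suffix, as a whole word.
    private static final Pattern ORDINAL =
            Pattern.compile("\\b(\\d{1,2})(?:st|nd|rd|th)\\b", Pattern.CASE_INSENSITIVE);

    /** Returns the line with ordinal suffixes removed from one- and two-digit numbers. */
    static String strip(String line) {
        return ORDINAL.matcher(line).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(strip("July 5th, 2017"));   // July 5, 2017
        System.out.println(strip("21ST August 2017")); // 21 August 2017
    }
}
```

Because the pattern requires digits immediately before the suffix, words that merely end in those letters, such as “August”, are left alone.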

What Next

I’ll fix the relatively easier things I mentioned in the shortcomings section, but I likely won’t pursue this project much further, for reasons I mention next.

Google Lens

Among the host of exciting things Google announced during Google I/O 2017 is Google Lens. Google Lens does exactly what our app does (along with a bunch of other cool things). And Google Lens does it rather well, much better than what I have created through Snap Event.

And I’m happy and excited about the fact that Google did it. At least, it can be expected to be a lot more reliable and useful.

Get Snap Event

Get it on Google Play

Google Play and the Google Play logo are trademarks of Google Inc.