<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Develop Freedom</title>
    <description>Develop Freedom - Blog by Shubham Chaudhary</description>
    <link>https://shubham.chaudhary.xyz/blog/</link>
    <atom:link href="https://shubham.chaudhary.xyz/blog/feed.xml" rel="self" type="application/rss+xml" />
    <pubDate>Tue, 24 Sep 2024 21:15:25 +0000</pubDate>
    <lastBuildDate>Tue, 24 Sep 2024 21:15:25 +0000</lastBuildDate>
    <generator>Jekyll v3.10.0</generator>
    
      <item>
        <title>Sapiens at the Crossroads of the Intelligence Age</title>
        <description>&lt;p&gt;Humanity has always been shaped by &lt;a href=&quot;https://www.britannica.com/story/history-of-technology-timeline&quot;&gt;the tools we create&lt;/a&gt;. From fire and the wheel to agriculture and steam engines, each technological leap has redefined what it means to be human. These advancements didn’t just make life easier; they fundamentally reshaped societies, economies, and power structures. As we enter &lt;a href=&quot;https://ia.samaltman.com&quot;&gt;the &lt;em&gt;Intelligence Age&lt;/em&gt;&lt;/a&gt;, where &lt;a href=&quot;https://www.britannica.com/science/history-of-artificial-intelligence&quot;&gt;artificial intelligence&lt;/a&gt; becomes an &lt;a href=&quot;https://x.com/sama/status/1813984333352649087&quot;&gt;omnipresent resource&lt;/a&gt;, we stand on the cusp of a transformation as significant as the Cognitive Revolution that set Homo sapiens apart from other species. Just as the ability to imagine and communicate abstract ideas allowed humans to dominate the planet, AI has the potential to redefine intelligence itself—both human and artificial—ushering in a new era of progress, but also deep disruption.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/img/ai/intelligence-age/o1-ai-reasoning-performance.webp&quot; alt=&quot;o1-ai-reasoning-performance&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Reflecting on the last decade, working at the intersection of AI and machine learning since the &lt;a href=&quot;https://www.britannica.com/technology/neural-network&quot;&gt;early breakthroughs in deep learning&lt;/a&gt;, I’ve witnessed firsthand the rapid acceleration of what machines can accomplish. What was once a field of theoretical promise became an engine for real-world transformation, especially as deep learning models began scaling with access to more data and computational power. Having contributed to this evolution, building systems ranging from recommendation &amp;amp; search systems to complex data-understanding ML models &amp;amp; data pipelines, I’ve seen how AI’s growth doesn’t just affect the technical domain—it profoundly shapes economies, societies, and global geopolitics.&lt;/p&gt;

&lt;p&gt;As we enter this Intelligence Age, we must ask ourselves: How will these changes ripple through the human experience, as earlier revolutions have done? The challenge isn’t simply technical, but philosophical—how we manage this profound shift will determine the future of humanity.&lt;/p&gt;

&lt;h3 id=&quot;economic-shifts-decoupling-labor-from-value&quot;&gt;Economic Shifts: Decoupling Labor from Value&lt;/h3&gt;

&lt;p&gt;From the dawn of human history, labor has been the bedrock of economic value. Agriculture required physical effort to produce food, just as industry needed human hands to run factories. But AI introduces a new economic paradigm: a world where &lt;a href=&quot;https://openai.com/index/learning-to-reason-with-llms/&quot;&gt;cognitive labor&lt;/a&gt;—the very thing that defined human superiority—can be automated. As machines take on more complex tasks, from financial modeling to legal analysis, we begin to see the decoupling of human labor from economic productivity.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/img/ai/intelligence-age/sapiens-timeline-of-history.jpg&quot; alt=&quot;sapiens-timeline-of-history&quot; /&gt;&lt;/p&gt;

&lt;p&gt;For those of us who have worked within this field, building and scaling AI systems, the implications are clear. AI doesn’t just automate repetitive tasks; it can now handle &lt;a href=&quot;https://openai.com/index/openai-o1-system-card/&quot;&gt;nuanced decision-making processes&lt;/a&gt;, fundamentally altering the role of human workers. In this new economy, value creation increasingly shifts from labor to ownership of technology and data. The control of AI systems and the data that powers them could lead to a concentration of wealth and power in the hands of a few, deepening inequality unless economic models are rethought.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/img/ai/intelligence-age/boston-dynamics-spot-and-stretch.webp&quot; alt=&quot;boston-dynamics-spot-and-stretch&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This decoupling also introduces the possibility of deflationary pressures. As the cost of goods and services declines due to AI-driven efficiencies, traditional economic growth models—based on consumption and inflation—may need to be reevaluated. Central banks and policymakers must rethink how they manage an economy where productivity surges but human labor plays a diminishing role.&lt;/p&gt;

&lt;h3 id=&quot;social-implications-redefining-work-and-human-identity&quot;&gt;Social Implications: Redefining Work and Human Identity&lt;/h3&gt;

&lt;p&gt;Every major technological shift has also reshaped human identity. Just as the Agricultural Revolution changed nomadic tribes into settled communities, and the Industrial Revolution turned craftspeople into factory workers, the Intelligence Age will redefine what it means to work and live. Throughout my career in AI, I’ve seen how automation is already transforming the workplace, shifting labor from routine tasks to more creative or emotional roles. But this is only the beginning.&lt;/p&gt;

&lt;p&gt;Work has historically provided more than just income—it has been central to our sense of purpose and identity. What happens when AI automates many of the tasks that once defined professions? As AI takes over roles in healthcare, finance, manufacturing and even creative industries, society will need to find new ways to derive meaning and purpose outside of traditional employment.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/img/ai/intelligence-age/openai-stages-of-ai.webp&quot; alt=&quot;openai-stages-of-ai&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This shift will require a fundamental rethinking of education and skill development. The skills that AI cannot replicate—creativity, empathy, emotional intelligence—will become the most valuable human traits. Educational systems, long focused on rote learning and standardized testing, will need to adapt to prepare individuals for roles that involve higher-order thinking, adaptability, and emotional labor. AI itself could play a role in this transformation, offering personalized learning experiences tailored to each individual’s needs, democratizing access to high-quality education.&lt;/p&gt;

&lt;p&gt;But here, too, lies the risk of deepening social divides. In societies where access to AI-driven education and tools is unequal, those without access could fall further behind, exacerbating existing inequalities. Just as the Industrial Revolution created a divide between factory owners and workers, the Intelligence Age could create a divide between those who can harness AI and those left behind by it.&lt;/p&gt;

&lt;h3 id=&quot;geopolitical-repercussions-ai-as-the-new-global-power&quot;&gt;Geopolitical Repercussions: AI as the New Global Power&lt;/h3&gt;

&lt;p&gt;AI’s influence won’t stop at the borders of the workplace or the classroom. On the global stage, AI is becoming a tool of geopolitical power, much like nuclear energy or oil once were. Nations that dominate AI research and control the infrastructure that powers it are positioning themselves for economic and military supremacy. I’ve observed how governments and tech companies alike are racing to develop AI systems that can provide strategic advantages in everything from cybersecurity to autonomous weapons systems.&lt;/p&gt;

&lt;p&gt;In many ways, the AI arms race is already underway. Countries with access to vast computational resources and data ecosystems are outpacing those without, setting the stage for a new kind of geopolitical tension. Control over AI isn’t just about economic productivity; it’s about national security, surveillance capabilities, and even the ability to shape global narratives through AI-generated media and information.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/img/ai/intelligence-age/ai-startups-around-world.webp&quot; alt=&quot;ai-startups-around-world&quot; /&gt;&lt;/p&gt;

&lt;p&gt;But AI’s geopolitical influence also presents opportunities for collaboration. Shared AI resources could help nations address global challenges, from climate change to pandemics. AI models are already being used to optimize energy grids, predict environmental changes, and model disease spread. These applications offer a glimpse of how AI, if managed collaboratively, could lead to collective global benefits rather than competitive domination.&lt;/p&gt;

&lt;p&gt;However, without global standards and ethical frameworks, AI’s rise could also deepen global inequalities. Less developed nations, without access to AI infrastructure, could find themselves dependent on AI superpowers, much like resource-poor countries relied on industrialized nations in the past. Navigating this new world order will require not only technological innovation but also diplomacy and international cooperation.&lt;/p&gt;

&lt;h3 id=&quot;ethical-imperatives-navigating-the-future-of-ai&quot;&gt;Ethical Imperatives: Navigating the Future of AI&lt;/h3&gt;

&lt;p&gt;In all of this, one thread remains constant: the ethical responsibility of those who build, deploy, and control AI systems. Throughout my career, I’ve seen the tension between innovation and regulation, between pushing the boundaries of what AI can do and ensuring it is used responsibly. As AI becomes more integrated into society, these ethical questions become even more pressing.&lt;/p&gt;

&lt;p&gt;Bias in AI systems—rooted in the data they are trained on—can perpetuate and even amplify social inequalities. Privacy concerns grow as AI systems rely on vast amounts of personal data to function effectively. These challenges require more than just technical solutions; they demand a shift in how we think about transparency, accountability, and fairness in AI development.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/img/ai/intelligence-age/ai-ethical-perspectives.webp&quot; alt=&quot;ai-ethical-perspectives&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Ethical AI is not just a technical problem but a societal one. Ensuring that AI benefits humanity as a whole, rather than entrenching the power of a few, will require new regulatory frameworks and governance models. This is not about slowing innovation but about guiding it in a way that aligns with our collective values.&lt;/p&gt;

&lt;h3 id=&quot;conclusion-the-next-leap-in-human-evolution&quot;&gt;Conclusion: The Next Leap in Human Evolution&lt;/h3&gt;

&lt;p&gt;Just as the Cognitive Revolution set Homo sapiens apart from other species, the Intelligence Age represents a new chapter in human evolution. AI is not just another tool—it is a new form of intelligence that can augment and, in some cases, replace human &lt;a href=&quot;https://openai.com/index/introducing-openai-o1-preview/&quot;&gt;decision-making&lt;/a&gt;. But as with every leap forward, the stakes are high. How we manage this new form of intelligence will determine whether the Intelligence Age becomes an era of unprecedented human flourishing or deep social and economic disruption.&lt;/p&gt;

&lt;p&gt;The &lt;a href=&quot;https://ourworldindata.org/technology-long-run&quot;&gt;journey&lt;/a&gt; from the Cognitive Revolution to the Agricultural Revolution, to the Industrial Age, and now to the Intelligence Age has always been one of adaptation. Our species has thrived because of our ability to not only develop new tools but to reshape our societies around them. The question now is not whether AI will change the world—it undoubtedly will—but whether we are prepared to shape that change in ways that are equitable, ethical, and sustainable.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/img/ai/intelligence-age/long-term-technology-timeline.webp&quot; alt=&quot;long-term-technology-timeline&quot; /&gt;&lt;/p&gt;

&lt;p&gt;As we navigate this &lt;a href=&quot;https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/&quot;&gt;new era&lt;/a&gt;, we must keep asking the big questions—about labor, value, identity, power, and ethics. Only then can we ensure that the Intelligence Age enhances, rather than diminishes, the human experience. This is the next stage in our evolutionary story, and how we respond will define the future of humanity.&lt;/p&gt;
</description>
        <pubDate>Tue, 24 Sep 2024 08:22:26 +0000</pubDate>
        <link>https://shubham.chaudhary.xyz/blog/ai/sapiens-intelligence-age</link>
        <guid isPermaLink="true">https://shubham.chaudhary.xyz/blog/ai/sapiens-intelligence-age</guid>
        
        <category>artificial-intelligence</category>
        
        <category>ai</category>
        
        <category>ml</category>
        
        
      </item>
    
      <item>
        <title>Why We Sleep - Book Review</title>
        <description>&lt;h2 id=&quot;-rating&quot;&gt;📚 Rating&lt;/h2&gt;
&lt;p&gt;4.5 ⭐ / 5 🌟&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.goodreads.com/book/show/34466963-why-we-sleep&quot; style=&quot;float: left; padding-right: 20px&quot;&gt;&lt;img border=&quot;0&quot; alt=&quot;Why We Sleep: Unlocking the Power of Sleep and Dreams&quot; src=&quot;https://i.gr-assets.com/images/S/compressed.photo.goodreads.com/books/1556604137l/34466963._SX98_.jpg&quot; /&gt;&lt;/a&gt;&lt;a href=&quot;https://www.goodreads.com/book/show/34466963-why-we-sleep&quot;&gt;Why We Sleep: Unlocking the Power of Sleep and
Dreams&lt;/a&gt; by &lt;a href=&quot;https://www.goodreads.com/author/show/17598726.Matthew_Walker&quot;&gt;Matthew Walker&lt;/a&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;My goodreads rating: &lt;a href=&quot;https://www.goodreads.com/review/show/3198659584&quot;&gt;5 of 5 stars&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;https://www.goodreads.com/review/list/38698703-shubham-chaudhary&quot;&gt;View all my reviews&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;review&quot;&gt;Review&lt;/h2&gt;
&lt;p&gt;I started reading this book last year. The first three chapters gave me a clear enough picture that I paused the book and started giving myself a full 8 hours of uninterrupted sleep time. Now, with a whole year of regular 8-hour sleep behind me, as I finish this book I feel really good about picking it up in the first place.&lt;/p&gt;

&lt;p&gt;Matthew Walker is a professor of neuroscience and psychology who runs a sleep lab at UC Berkeley and has a twenty-plus-year research career, and the book is grounded in that research. The tone at times gets very serious; given how much we ignore sleep in our lives, that makes sense. The author tries hard to convince the reader to sleep enough hours, and at least in my experience, he accomplished this goal.&lt;/p&gt;

&lt;p&gt;Chapter 1 discusses sleep in general. Chapter 2 goes into circadian rhythm, sleep pressure, and the effects of caffeine, jet lag, etc. on our sleep. Chapter 3 goes into the details of sleep cycles, the research behind them, etc.&lt;/p&gt;

&lt;h2 id=&quot;details&quot;&gt;Details&lt;/h2&gt;
&lt;p&gt;Personally, after starting the regular 8-hour uninterrupted sleep schedule, I have been able to wake up without any alarms in exactly 7.5 hours daily. I have personally felt the effects of caffeine; as a result, I stopped drinking Diet Coke and switched to decaffeinated coffee beans. The half-life of caffeine is 5–7 hours 🤯, so don’t drink caffeine, and definitely don’t drink caffeine after 2–3 pm. Scheduling the thermostat to be lower at night and higher around wake-up time has also been super useful.&lt;/p&gt;

&lt;h3 id=&quot;sleep-architecture&quot;&gt;Sleep Architecture&lt;/h3&gt;

&lt;p&gt;&lt;img src=&quot;/blog/img/review/book/why-we-sleep/sleep-architecture.jpg&quot; alt=&quot;sleep-architecture&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Understanding the architecture of sleep, shown in the hypnogram above (Figure 8 in the book), also comes in handy.
We sleep in 1.5-hour cycles, where the initial cycles are mainly filled with NREM sleep and the later cycles are filled with REM sleep.
We typically surface into the waking stage around the 3, 6, and 7.5-hour marks.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/img/review/book/why-we-sleep/sleep-brainwaves.jpg&quot; alt=&quot;sleep-brainwaves&quot; /&gt;&lt;/p&gt;

&lt;p&gt;NREM sleep performs the work of weeding out and removing unnecessary connections.
REM sleep is when we save our learning, strengthen the neural connections, and dream.
The last 1.5-hour section is when we get the maximum amount of REM sleep.
This also means sleep hours aren’t interchangeable: you can’t wake up early and finish your sleep later by taking a nap.
After fixing my sleep deficiency, I almost always wake up around the 6-hour mark (4 × 1.5 hr); now that I understand why, I sleep more to finish the last 1.5-hour part of the sleep cycle.&lt;/p&gt;

&lt;h3 id=&quot;sleep-cycle&quot;&gt;Sleep Cycle&lt;/h3&gt;

&lt;p&gt;&lt;img src=&quot;/blog/img/review/book/why-we-sleep/sleep-cycle.jpg&quot; alt=&quot;sleep-cycle&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Understanding the 24-hour circadian rhythm of the suprachiasmatic nucleus and the sleep pressure of adenosine has also come in handy (Figures 4, 5, and 6).
There’s genetic evidence for chronotypes, aka night owls vs morning larks.
It’s important to understand which one you are; not everyone benefits from waking up at 5 am, but everyone definitely benefits from sleeping a full 5 sleep cycles.
Also, you can’t fix your sleep schedule by staying awake the entire night (trust me, I’ve tried); the book explains why.
You have to shift your sleep schedule gradually, going to bed 0.5–1 hour earlier every day.&lt;/p&gt;

&lt;h3 id=&quot;other&quot;&gt;Other&lt;/h3&gt;

&lt;p&gt;There were some great tips in the later chapters, like what to do when you are sleepy and have to drive. Answer: DON’T.
There are tips on how to reduce sleep pressure, but in general, avoid driving in such situations.
Vehicular accidents caused by drowsy driving exceed those caused by alcohol and drugs. It’s crazy that we don’t see warnings about lack of sleep the way we do for alcohol.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/img/review/book/why-we-sleep/sleep-loss.jpg&quot; alt=&quot;sleep-loss&quot; /&gt;&lt;/p&gt;

&lt;p&gt;After reading the chapter about age and sleep, I started recommending that my parents also sleep a full 8 hours.&lt;/p&gt;

&lt;h3 id=&quot;how-to-consume&quot;&gt;How to consume?&lt;/h3&gt;

&lt;p&gt;It is great to combine the print book with the ebook and audiobook.
All the chapters are self-contained and can be read in any order.
The last few chapters are great candidates for listening via audiobook.&lt;/p&gt;

&lt;p&gt;Scribd offers a subscription service where you can read books, audiobooks, and a lot more.
You can find the &lt;a href=&quot;https://www.scribd.com/book/359457264/Why-We-Sleep-Unlocking-the-Power-of-Sleep-and-Dreams&quot;&gt;book&lt;/a&gt; and &lt;a href=&quot;https://www.scribd.com/audiobook/360543909/Why-We-Sleep-Unlocking-the-Power-of-Sleep-and-Dreams&quot;&gt;audiobook&lt;/a&gt; for Why We Sleep on Scribd.
You can use this &lt;a href=&quot;https://www.scribd.com/g/6s7m7u&quot;&gt;promo link to get 60 days of free subscription&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;what-worked-for-me-to-get-good-sleep&quot;&gt;What worked for me to get good sleep&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;Understand the relation between circadian rhythm and sleep pressure.&lt;/li&gt;
  &lt;li&gt;Lower the temperature of the room to around 20 °C.&lt;/li&gt;
  &lt;li&gt;Stick to a sleep schedule; even when you can’t, still give yourself 8 hours. iPhones have a great sleep schedule + DND feature.&lt;/li&gt;
  &lt;li&gt;Sleeping 8 hours in parts is not the same as sleeping 8 continuous hours: the first half is filled with NREM sleep, the second half with REM sleep.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;twelve-tips-for-healthy-sleep&quot;&gt;Twelve Tips for Healthy Sleep&lt;/h3&gt;
&lt;p&gt;These are the twelve tips from the book’s appendix.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Stick to a sleep schedule. Set an alarm for bedtime.&lt;/li&gt;
  &lt;li&gt;Try to exercise at least thirty minutes on most days but not later than two to three hours before your bedtime.&lt;/li&gt;
  &lt;li&gt;Avoid caffeine and nicotine. Coffee, colas, certain teas,
and chocolate contain the stimulant caffeine, and its
effects can take as long as eight hours to wear off fully.
Therefore, a cup of coffee in the late afternoon can make
it hard for you to fall asleep at night.&lt;/li&gt;
  &lt;li&gt;Avoid alcoholic drinks before bed. Having a nightcap or
alcoholic beverage before sleep may help you relax, but
heavy use robs you of REM sleep, keeping you in the
lighter stages of sleep.&lt;/li&gt;
  &lt;li&gt;Avoid large meals and beverages late at night.&lt;/li&gt;
  &lt;li&gt;If possible, avoid medicines that delay or disrupt your sleep.&lt;/li&gt;
  &lt;li&gt;Don’t take naps after 3 p.m. Naps can help make up for
lost sleep, but late afternoon naps can make it harder to
fall asleep at night.&lt;/li&gt;
  &lt;li&gt;Relax before bed. Don’t overschedule your day so that no
time is left for unwinding. A relaxing activity, such as
reading or listening to music, should be part of your
bedtime ritual.&lt;/li&gt;
  &lt;li&gt;Take a hot bath before bed. The drop in body
temperature after getting out of the bath may help you
feel sleepy, and the bath can help you relax and slow
down so you’re more ready to sleep.&lt;/li&gt;
  &lt;li&gt;Dark bedroom, cool bedroom, gadget-free bedroom. Get
rid of anything in your bedroom that might distract you
from sleep, such as noises, bright lights, an
uncomfortable bed, or warm temperatures. You sleep
better if the temperature in the room is kept on the cool
side.&lt;/li&gt;
  &lt;li&gt;Have the right sunlight exposure.
Daylight is key to regulating daily sleep patterns.
Get in natural sunlight for at least thirty minutes each day, wake up with the sun.&lt;/li&gt;
  &lt;li&gt;Don’t lie in bed awake. If you find yourself still awake after
staying in bed for more than twenty minutes or if you are
starting to feel anxious or worried, get up and do some
relaxing activity until you feel sleepy.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id=&quot;quotes&quot;&gt;Quotes&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;“There does not seem to be one major organ within the body, or process within the brain, that isn’t optimally enhanced by sleep (and detrimentally impaired when we don’t get enough)”&lt;/li&gt;
  &lt;li&gt;“The uneven back-and-forth interplay between NREM and REM sleep is necessary to elegantly remodel and update our neural circuits at night, and in doing so manage the finite storage space within the brain. Forced by the known storage capacity imposed by a set number of neurons and connections within their memory structures, our brains must find the “sweet spot” between retention of old information and leaving sufficient room for the new. Balancing this storage equation requires identifying which memories are fresh and salient, and which memories that currently exist are overlapping, redundant, or simply no longer relevant.”&lt;/li&gt;
  &lt;li&gt;“A key function of deep NREM sleep, which predominates early in the night, is to do the work of weeding out and removing unnecessary neural connections. In contrast, the dreaming stage of REM sleep, which prevails later in the night, plays a role in strengthening those connections.”&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
  &lt;p&gt;Excerpt From
Why We Sleep: Unlocking the Power of Sleep and Dreams
Matthew Walker
This material may be protected by copyright.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
        <pubDate>Sat, 05 Mar 2022 01:56:37 +0000</pubDate>
        <link>https://shubham.chaudhary.xyz/blog/review/book/why-we-sleep</link>
        <guid isPermaLink="true">https://shubham.chaudhary.xyz/blog/review/book/why-we-sleep</guid>
        
        <category>books</category>
        
        <category>book-review</category>
        
        
      </item>
    
      <item>
        <title>Autonomous and Electric Vehicle Space</title>
<description>&lt;p&gt;A quarter of global energy-related greenhouse gas emissions come from transportation.
With global carbon emissions on the rise, electric cars are going to be a crucial part of the future.
Self-driving cars are going to be more efficient and spend less time/fuel in traffic.
While reading Bill Gates’s new book, &lt;a href=&quot;https://www.goodreads.com/book/show/52275335-how-to-avoid-a-climate-disaster&quot;&gt;‘How to Avoid a Climate Disaster: The Solutions We Have and the Breakthroughs We Need’&lt;/a&gt;,
I’ve been researching the self-driving and electric vehicle market and its economics.&lt;/p&gt;

&lt;h2 id=&quot;market-leaders&quot;&gt;Market Leaders&lt;/h2&gt;
&lt;p&gt;Tesla clearly has a first-to-market advantage, but it is not the only savior, and by all metrics it is highly overvalued.
No one has capitalized more on the EV hype than Tesla. $TSLA has a &lt;a href=&quot;https://wolfstreet.com/2021/01/02/tesla-finally-almost-hit-500000-deliveries-2-years-behind-its-2016-promise-for-a-global-market-share-of-0-7/&quot;&gt;larger market cap&lt;/a&gt; than the next 9 major car manufacturers &lt;a href=&quot;https://www.cnbc.com/2020/12/14/tesla-valuation-more-than-nine-largest-carmakers-combined-why.html&quot;&gt;combined&lt;/a&gt;, the likes of $F, $GM, $TM, etc.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/img/ev/tsla-market-cap.png&quot; alt=&quot;TSLA Market Cap&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Currently, Ford and GM are the biggest competitors, with multiple electric vehicles coming out soon.
Ford is coming out with an electric model of its extremely famous F-150 by the end of 2021.
GM plans to sell only electric cars by 2035.
Ford plans to sell only electric cars in Europe by 2030.
Several other successful automakers with a proven track record, such as Daimler/Mercedes-Benz, BMW, and Jaguar, have plans to build more and more electric vehicles.&lt;/p&gt;

&lt;h2 id=&quot;challenges&quot;&gt;Challenges&lt;/h2&gt;
&lt;p&gt;To make EVs the de facto standard, several challenges need to be solved.&lt;/p&gt;

&lt;h3 id=&quot;energy-density-problem&quot;&gt;Energy Density Problem&lt;/h3&gt;
&lt;p&gt;Interestingly, EV powertrains are a lot cheaper and simpler than ICE (internal combustion engine) powertrains, which also means less maintenance.
Battery energy density is the biggest hurdle to successfully replacing fossil-fuel-powered vehicles.
It becomes impractical to build large, high-range electric vehicles because the weight of the battery packs makes the vehicle too heavy.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/img/ev/battery-energy-density.jpg&quot; alt=&quot;battery energy density&quot; /&gt;&lt;/p&gt;

&lt;p&gt;To make a practically successful electric vehicle, you need to match the range that gas-powered vehicles provide.
Gas-powered vehicles might be less efficient at fully utilizing their fuel, but the battery cells required for the same range weigh a lot more.&lt;/p&gt;

&lt;p&gt;Panasonic is the leading battery cell manufacturer, and energy density is a limitation that all EV manufacturers face.
That’s why the major breakthroughs in EV adoption have been in short-to-medium-range vehicles, and why long-haul electric trucks are harder to manufacture.&lt;/p&gt;

&lt;h3 id=&quot;charging&quot;&gt;Charging&lt;/h3&gt;
&lt;p&gt;No one wants to wait for an hour every few hundred miles, so you need more charging stations and faster battery charging.
To ensure that users can drive long distances without having to worry about charging, good-quality charging-station infrastructure is needed.
Charging stations cost a lot of money to build, making it difficult to cover larger areas.
The &lt;a href=&quot;https://www.forbes.com/sites/bradtempleton/2019/12/19/competing-electric-car-charging-standards-can-be-easily-fixed/&quot;&gt;lack of a standard&lt;/a&gt; charging plug is no help.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/img/ev/ev-charging-plugs.jpg&quot; alt=&quot;ev charging plugs&quot; /&gt;&lt;/p&gt;

&lt;p&gt;China is going to be a big market for EV manufacturers.
Nio has approached the charging problem in China from a unique angle: a Battery-as-a-Service subscription model.
They are building cars with battery-swapping capability and charging stations that can swap a battery pack in &lt;a href=&quot;https://www.caranddriver.com/news/a33670482/nio-swappable-batteries-lease/&quot;&gt;just 5 minutes&lt;/a&gt;.
Having started in November 2018, Nio had already completed &lt;a href=&quot;https://insideevs.com/news/448165/nio-completed-1-millionth-battery-swap/&quot;&gt;a million swaps&lt;/a&gt; by the end of 2020.&lt;/p&gt;

&lt;video muted=&quot;&quot; autoplay=&quot;&quot; controls=&quot;&quot;&gt;
  &lt;source src=&quot;/blog/img/ev/battery-swap.mp4&quot; type=&quot;video/mp4&quot; alt=&quot;Battery Swap in Action&quot; /&gt;
&lt;/video&gt;

&lt;h2 id=&quot;autonomous-vehicle&quot;&gt;Autonomous Vehicle&lt;/h2&gt;
&lt;p&gt;Driving is a tedious and pretty lackluster task, especially on long routes.
Given a choice, I’d rather read a book than drive myself.
Still, you need to get from point A to point B somehow.
Humans make mistakes: they lose attention and get tired.
So much so that human error plays a part in 90% of crashes.
Machines, on the other hand, can do this task without getting tired at all.&lt;/p&gt;

&lt;p&gt;Attempts at autonomy aren’t new, but recent advances in sensors, computing power, and machine learning are making it far more feasible now.
Lex Fridman has a very good &lt;a href=&quot;https://www.youtube.com/playlist?list=PLrAXtmErZgOeY0lkVCIVafdGFOTi45amq&quot;&gt;lecture series on deep learning&lt;/a&gt; for self-driving vehicles.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/img/ev/autonomy-history.webp&quot; alt=&quot;History of Autonomy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Autonomous vehicles are going to be the standard in the future.
Self-driving cars are going to change society more than anything else in the recent past.
Attaining semi-autonomy is within reach, but full autonomy will require significant advances.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/img/ev/semi-vs-full-autonomy.webp&quot; alt=&quot;Semi vs Full Autonomy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;With all vehicles on the road being &lt;a href=&quot;https://www.theglobeandmail.com/globe-drive/self-driving-cars-are-going-to-dramatically-change-our-world-so-when-does-the-revolution-begin/article32650833/&quot;&gt;level-5 self-driving vehicles&lt;/a&gt;, car crashes will be a thing of the past.
Traffic can also be significantly reduced through coordination among fully autonomous vehicles on the road.&lt;/p&gt;

&lt;iframe width=&quot;560&quot; height=&quot;315&quot; src=&quot;https://www.youtube-nocookie.com/embed/4CZc3erc_l4&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;p&gt;According to Navigant’s &lt;a href=&quot;https://www.cnet.com/roadshow/news/self-driving-study-navigant-research-tesla-waymo-cruise/&quot;&gt;research&lt;/a&gt;, Ford and GM are among the leaders in the race for autonomous vehicle tech.
TSLA has hype on its side; F &amp;amp; GM have experience, scale, higher earnings/profits, and likely the tech too.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/img/ev/2020-navigant-av-leaderboard.webp&quot; alt=&quot;TSLA vs F vs GM autonomous driving&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Apple is teaming up with Hyundai, but it’s still years away from production.&lt;/p&gt;

&lt;p&gt;The primary intent of owning a car is to go from point A to point B.
An autonomous taxi service can reduce the cost of transportation significantly, something that hasn’t changed much since the first Ford rolled off the assembly line.
Google’s Waymo has teamed up with Chrysler, and it is &lt;a href=&quot;https://blog.waymo.com/2020/10/waymo-is-opening-its-fully-driverless.html&quot;&gt;targeting&lt;/a&gt; the ride-hailing market, already offering driverless rides in Arizona.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/img/ev/waymo-on-road.gif&quot; alt=&quot;Waymo taxi&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Electric vehicles need to be the go-to choice for consumers.
EV is a hot space with a lot of hype and upcoming competition.
There are several other areas that need innovation to make electric cars the de facto choice.
Tesla doesn’t have all the answers to the problems our planet faces.
It has been a driver of EV growth and helped move the legacy automakers.
A fully autonomous shared ride-hailing vehicle fleet can reduce the total cost of ownership, dramatically reduce the cost of transportation, and reduce traffic congestion.
It is going to be one of many options; we need more innovation, and we need it to happen sooner!&lt;/p&gt;

</description>
        <pubDate>Sun, 28 Feb 2021 04:55:46 +0000</pubDate>
        <link>https://shubham.chaudhary.xyz/blog/autonomous-electric-vehicles</link>
        <guid isPermaLink="true">https://shubham.chaudhary.xyz/blog/autonomous-electric-vehicles</guid>
        
        <category>thoughts</category>
        
        <category>personal</category>
        
        <category>opinion</category>
        
        <category>electric-vehicle</category>
        
        <category>autonomous-vehicle</category>
        
        <category>self-driving</category>
        
        <category>deep-learning</category>
        
        <category>machine-learning</category>
        
        <category>ml</category>
        
        
      </item>
    
      <item>
        <title>Deep Learning for Image Classification</title>
        <description>&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;

&lt;p&gt;At &lt;a href=&quot;https://www.zomato.com&quot;&gt;Zomato&lt;/a&gt;, we manage more than half a billion images, which are integral to various aspects of our platform. Every day, we process close to 100,000 new images, contributing to petabytes of data, with a daily influx of approximately 500 GB of fresh visual content.&lt;/p&gt;

&lt;p&gt;In this blog, we delve into how we built a neural network-based machine learning model to classify these images into categories like food, ambiance, menu, and more. This post is particularly valuable for professionals at startups deploying their first machine learning use case, students eager to learn how to turn innovative ideas into practical solutions, and anyone interested in the technical nuances of scaling deep learning models in a production environment.&lt;/p&gt;

&lt;h3 id=&quot;what-youll-learn&quot;&gt;What You’ll Learn&lt;/h3&gt;

&lt;p&gt;This post is driven by our journey of taking a deep learning model from concept to production, addressing the unique challenges of operating at Zomato’s scale. Whether you’re a data scientist, an engineer refining your deployment strategies, a tech enthusiast, or a professional exploring the practical applications of AI, the insights shared here are relevant across various stages of a machine learning project.&lt;/p&gt;

&lt;p&gt;By the end of this blog, you will have gained:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;Comprehensive Understanding of Image Classification&lt;/strong&gt;: Explore how we systematically approached the categorization of vast amounts of visual data into meaningful categories using neural networks.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Valuable Practical Insights on Deployment&lt;/strong&gt;: Understand the challenges we faced when deploying our first deep learning model at scale, and learn how we successfully navigated these obstacles.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Key Lessons from Real-World Implementation&lt;/strong&gt;: Discover practical lessons from our experience deploying this model in production, including reflections on what we would do differently if we were to undertake this project today.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Whether you’re just starting out, looking to refine your machine learning deployment process, or simply interested in the application of deep learning at scale, this blog post offers actionable insights to help you navigate the complexities of building and deploying AI models effectively.&lt;/p&gt;

&lt;h2 id=&quot;the-need-for-image-classification&quot;&gt;The Need for Image Classification&lt;/h2&gt;

&lt;p&gt;As a restaurant search, discovery, and delivery platform, Zomato’s primary sources of images are:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;User-Uploaded Images&lt;/strong&gt;: These are images uploaded by users when they visit or order from a restaurant and write reviews.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Team-Collected Images&lt;/strong&gt;: These are images our team gathers from restaurants while listing them on the platform.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Image classification serves several critical functions at Zomato:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;Enhanced User Experience&lt;/strong&gt;: By categorizing images into collections such as food and ambiance, we can help users quickly find ambiance images. Previously, we manually tagged around 10-20 images per restaurant as food or ambiance shots. To enhance the user experience, we aimed to categorize all images uploaded across the platform.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Content Balance&lt;/strong&gt;: The majority of images uploaded to Zomato are food-related, which can overshadow ambiance images. Classifying images allows us to surface ambiance shots more effectively, improving the visual balance of restaurant pages.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Content Quality Assurance&lt;/strong&gt;: The quality of content on our platform is paramount. We have a dedicated team of moderators who work tirelessly to ensure that only the best content is showcased to our users. Automated tagging, such as identifying human faces or selfies, can significantly improve our photo moderation turnaround time.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Menu Management&lt;/strong&gt;: Similarly, if an image appears to be a menu, we want our content team to review it to ensure only the highest quality menu images—those manually verified by our data collection team are shown to users.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id=&quot;building-the-classifier&quot;&gt;Building the Classifier&lt;/h2&gt;

&lt;p&gt;Image classification is fairly straightforward from a technical standpoint, especially when working in a Jupyter notebook. However, our challenge was magnified by the fact that this was our first deep learning project to be deployed in production, and the scale was daunting. We needed to build a system capable of moderating nearly half a million images daily. The initial model was trained in 2016, and this blog post not only recounts our experience from that time but also provides insights into how we would approach retraining today.&lt;/p&gt;

&lt;p&gt;To streamline the entire process—from data gathering to preprocessing, model training, and validation—we utilized &lt;a href=&quot;https://github.com/spotify/luigi&quot;&gt;Luigi&lt;/a&gt;. Luigi allowed us to create a DAG-based pipeline, ensuring that each step was dependent on the completion of the previous ones. This approach was crucial for maintaining the integrity and flow of the pipeline. Luigi also provided a user-friendly visual interface, which made it easier to monitor the progress of our data and model pipeline.&lt;/p&gt;
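
&lt;p&gt;To make this concrete, here is a minimal sketch of what a Luigi pipeline of this shape can look like. The task names, labels, and paths are illustrative, not our actual pipeline.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;import luigi


class DownloadImages(luigi.Task):
    label = luigi.Parameter()

    def output(self):
        # Luigi skips any task whose declared output already exists.
        return luigi.LocalTarget(&apos;data/%s/images.done&apos; % self.label)

    def run(self):
        # ... fetch the images for this label from S3 here ...
        with self.output().open(&apos;w&apos;) as f:
            f.write(&apos;done&apos;)


class TrainModel(luigi.Task):
    def requires(self):
        # The DAG: training depends on every label&apos;s download task.
        return [DownloadImages(label=l) for l in (&apos;food&apos;, &apos;ambiance&apos;, &apos;menu&apos;, &apos;human&apos;)]

    def output(self):
        return luigi.LocalTarget(&apos;models/classifier.h5&apos;)

    def run(self):
        # ... preprocess, train, and save the model here ...
        pass


if __name__ == &apos;__main__&apos;:
    luigi.build([TrainModel()], local_scheduler=True)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;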

&lt;h3 id=&quot;dataset-gathering&quot;&gt;Dataset Gathering&lt;/h3&gt;

&lt;p&gt;Before we could demonstrate the effectiveness of this “new deep learning” approach to our PMs, we needed a substantial amount of labeled data. We started with four primary labels: food, ambiance, menu, and human. In the future, we planned to expand these categories to include indoor shots, outdoor shots, drinks, and dishes.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/img/clazzify/fahm-collage.png&quot; alt=&quot;food, ambiance, menu, human image collage&quot; /&gt;&lt;/p&gt;

&lt;h4 id=&quot;food--ambiance&quot;&gt;Food &amp;amp; Ambiance&lt;/h4&gt;

&lt;p&gt;At Zomato, we had manually tagged images classified as food and ambiance shots. We downloaded 50,000 images for each category to build our classification dataset.&lt;/p&gt;

&lt;h4 id=&quot;menu&quot;&gt;Menu&lt;/h4&gt;

&lt;p&gt;Generating the dataset for menus was the most straightforward task. Given Zomato’s vast collection of manually tagged and categorized menus (one of the foundational elements of the company), we downloaded 50,000 menu images from S3, distributed across randomly selected restaurants.&lt;/p&gt;

&lt;h4 id=&quot;humans&quot;&gt;Humans&lt;/h4&gt;

&lt;p&gt;Curating the dataset for humans was more challenging. We initially used the &lt;a href=&quot;http://www.cs.ucf.edu/~liujg/YouTube_Action_dataset.html&quot;&gt;YouTube dataset&lt;/a&gt;, which includes images with mixed scenes. For example, some images contain humans, but they might also exhibit characteristics of an ambiance shot, leading to potential misclassifications. Our strategy was to train a basic model with this dataset, generate approximate labels, and have our internal moderation team quickly correct them—significantly speeding up the labeling process compared to starting from scratch.&lt;/p&gt;

&lt;p&gt;To address the need for face shots, which were limited in the YouTube dataset, we incorporated the &lt;a href=&quot;http://vis-www.cs.umass.edu/lfw/&quot;&gt;LFW dataset by UMass&lt;/a&gt;, also known as the Labeled Faces in the Wild dataset.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/img/clazzify/lfw_six_face_panels.jpg&quot; alt=&quot;lfw images preview&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;dataset-preprocessing&quot;&gt;Dataset Preprocessing&lt;/h3&gt;

&lt;p&gt;After gathering the data, our next step was preprocessing. We had a large collection of images categorized into food, ambiance, menu, and human. For model training, it was essential to iterate over this data efficiently and feed it into Keras.&lt;/p&gt;

&lt;p&gt;To handle this, we used the &lt;a href=&quot;https://en.wikipedia.org/wiki/Hierarchical_Data_Format&quot;&gt;Hierarchical Data Format&lt;/a&gt; (&lt;a href=&quot;https://www.h5py.org/&quot;&gt;HDF5&lt;/a&gt;) to create an out-of-memory iterable dataframe. With the &lt;a href=&quot;http://docs.h5py.org/en/stable/quick.html&quot;&gt;pythonic interface&lt;/a&gt; provided by &lt;a href=&quot;https://github.com/h5py/h5py&quot;&gt;h5py&lt;/a&gt;, we could slice and manipulate terabytes of data as if they were numpy arrays in memory.&lt;/p&gt;
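
&lt;p&gt;As a rough illustration, here is how such an out-of-memory dataset can be created and sliced with h5py; the file name, dataset names, and sizes below are placeholders, not our production values.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;import h5py
import numpy as np

n = 1000  # illustrative; the real datasets held 50,000 images per class

# One-time creation: write images and labels into a single HDF5 file.
with h5py.File(&apos;images.h5&apos;, &apos;w&apos;) as f:
    images = f.create_dataset(&apos;images&apos;, shape=(n, 227, 227, 3), dtype=&apos;uint8&apos;)
    labels = f.create_dataset(&apos;labels&apos;, shape=(n,), dtype=&apos;uint8&apos;)
    for i in range(n):
        images[i] = np.zeros((227, 227, 3), dtype=&apos;uint8&apos;)  # stand-in for a real image
        labels[i] = 0                                        # stand-in for a class id

# Training time: slice batches lazily; only the requested rows are read from disk.
with h5py.File(&apos;images.h5&apos;, &apos;r&apos;) as f:
    batch_x = f[&apos;images&apos;][0:128]  # numpy array of shape (128, 227, 227, 3)
    batch_y = f[&apos;labels&apos;][0:128]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;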

&lt;p&gt;We resized each image to 227x227 pixels and performed several cleaning steps. Additionally, we augmented the dataset by creating multiple variations of each image through rotation, scaling, zooming, and cropping. In future retraining efforts, we plan to explore using the RecordIO format for storing images in classification tasks.&lt;/p&gt;
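
&lt;p&gt;Keras ships these transformations in its ImageDataGenerator utility. A minimal sketch follows; the parameter values are illustrative, not the ones we tuned for production.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;from keras.preprocessing.image import ImageDataGenerator

# Each epoch sees randomly rotated, shifted, zoomed, and flipped variants.
datagen = ImageDataGenerator(
    rotation_range=20,       # random rotation, in degrees
    width_shift_range=0.1,   # random horizontal shift, approximating crops
    height_shift_range=0.1,  # random vertical shift, approximating crops
    zoom_range=0.2,          # random zoom in/out
    horizontal_flip=True,
)
# x_train: (n, 227, 227, 3) array; y_train: one-hot labels
# model.fit_generator(datagen.flow(x_train, y_train, batch_size=64), ...)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;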

&lt;h3 id=&quot;training-the-model&quot;&gt;Training the Model&lt;/h3&gt;

&lt;p&gt;We began our journey with &lt;a href=&quot;https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf&quot;&gt;AlexNet&lt;/a&gt;, a well-established model in 2016 with multiple &lt;a href=&quot;https://github.com/Zomato/convnets-keras&quot;&gt;open source implementations&lt;/a&gt; available. Alongside AlexNet, we experimented with other architectures like &lt;a href=&quot;https://arxiv.org/pdf/1512.00567.pdf&quot;&gt;Inception v3&lt;/a&gt; and &lt;a href=&quot;https://www.cs.unc.edu/~wliu/papers/GoogLeNet.pdf&quot;&gt;GoogLeNet&lt;/a&gt;. While these models served us well at the time, today there are more accurate and efficient options available, such as ResNet, MobileNet, and others.&lt;/p&gt;

&lt;p&gt;We chose &lt;a href=&quot;https://keras.io/&quot;&gt;Keras&lt;/a&gt; as our framework due to its flexibility, particularly its ability to switch backend engines (e.g., Theano, TensorFlow) in the future. In 2016, installing TensorFlow wasn’t as straightforward as it is today (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pip install tensorflow&lt;/code&gt;), so we opted for &lt;a href=&quot;https://github.com/Theano/Theano&quot;&gt;Theano&lt;/a&gt; as our backend engine. Theano provided reliable and consistent results and was easier to set up with Keras during that period. Although Keras remains our preferred choice for writing models, if we were to do this now, we would leverage a platform like &lt;a href=&quot;https://aws.amazon.com/sagemaker/&quot;&gt;AWS Sagemaker&lt;/a&gt; for training.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/img/clazzify/alexnet-layers.png&quot; alt=&quot;AlexNet layers description image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;We initially trained our models on in-house GPU servers before transitioning to &lt;a href=&quot;https://aws.amazon.com/ec2/instance-types/#Accelerated_Computing&quot;&gt;AWS GPU p2.xlarge instances&lt;/a&gt; to scale our efforts. Rather than using transfer learning on an existing ImageNet model, we trained our models from scratch to better fit the unique characteristics of our restaurant industry domain photos. We worked with 50,000 images for each of our four classes: food, ambiance, menu, and human. As illustrated in the graph below, our efforts resulted in achieving approximately 92% validation accuracy.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/img/clazzify/accuracy-loss-graph.png&quot; alt=&quot;Accuracy-Loss Graph&quot; /&gt;&lt;/p&gt;
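
&lt;p&gt;For readers looking for a starting point, below is a minimal four-class convnet in the spirit of AlexNet, written with today’s Keras layer names. The layer sizes are illustrative and not our production architecture.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential([
    Conv2D(96, (11, 11), strides=4, activation=&apos;relu&apos;, input_shape=(227, 227, 3)),
    MaxPooling2D((3, 3), strides=2),
    Conv2D(256, (5, 5), padding=&apos;same&apos;, activation=&apos;relu&apos;),
    MaxPooling2D((3, 3), strides=2),
    Flatten(),
    Dense(512, activation=&apos;relu&apos;),
    Dropout(0.5),
    Dense(4, activation=&apos;softmax&apos;),  # food, ambiance, menu, human
])
model.compile(optimizer=&apos;sgd&apos;, loss=&apos;categorical_crossentropy&apos;, metrics=[&apos;accuracy&apos;])
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;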

&lt;h3 id=&quot;production-deployment&quot;&gt;Production Deployment&lt;/h3&gt;

&lt;p&gt;For serving the model, we developed an internal API using Flask. We enhanced it with authentication layers and deployed it within our internal VPC network. While today, tools like ONNX and TensorFlow Serving are commonly used for model inference, back in 2016, the landscape for ML model inference was still maturing. As a result, we chose to proceed with a Flask-based API.&lt;/p&gt;
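
&lt;p&gt;A stripped-down sketch of what such an endpoint can look like is shown below. The route, label names, model path, and preprocessing constants are assumptions for illustration, not our internal API.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;import cv2
import numpy as np
from flask import Flask, jsonify, request
from keras.models import load_model

app = Flask(__name__)
model = load_model(&apos;classifier.h5&apos;)  # hypothetical path; loaded once at startup
LABELS = [&apos;food&apos;, &apos;ambiance&apos;, &apos;menu&apos;, &apos;human&apos;]

@app.route(&apos;/classify&apos;, methods=[&apos;POST&apos;])
def classify():
    # Decode the uploaded image entirely in memory and resize to the model input.
    file_bytes = np.frombuffer(request.files[&apos;image&apos;].read(), dtype=np.uint8)
    img = cv2.imdecode(file_bytes, cv2.IMREAD_COLOR)
    img = cv2.resize(img, (227, 227)).astype(&apos;float32&apos;) / 255.0  # assumed normalization
    scores = model.predict(np.expand_dims(img, axis=0))[0]
    return jsonify(dict(zip(LABELS, [float(s) for s in scores])))

if __name__ == &apos;__main__&apos;:
    app.run(host=&apos;0.0.0.0&apos;, port=5000)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;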

&lt;p&gt;We containerized the API using Docker, with a Miniconda3 base image. After every code merge, Jenkins would run unit tests and build the final Docker image, which included both the application code and the latest version of the model. Automated tests were then executed on this image to validate the inference accuracy on a predefined set of images. Once these tests passed, the Docker image was deployed to AWS Elastic Beanstalk, where the API could automatically scale based on incoming request load.&lt;/p&gt;
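
&lt;p&gt;Such automated tests can be as simple as asserting the top label on a small, manually verified golden set. Here is a hedged sketch against the endpoint above; the fixture paths and expected labels are made up.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;import requests

# Golden set: images whose labels were verified manually (paths are made up).
GOLDEN = [
    (&apos;tests/fixtures/butter_chicken.jpg&apos;, &apos;food&apos;),
    (&apos;tests/fixtures/menu_card.jpg&apos;, &apos;menu&apos;),
]

def test_inference_on_golden_set():
    for path, expected in GOLDEN:
        with open(path, &apos;rb&apos;) as f:
            resp = requests.post(&apos;http://localhost:5000/classify&apos;, files={&apos;image&apos;: f})
        scores = resp.json()
        # The top-scoring label must match the manually verified one.
        assert max(scores, key=scores.get) == expected
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;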

&lt;p&gt;Once the API was live, every time an image was uploaded to Zomato, it was queued for processing. Multiple workers would pick the image from the queue, request inference scores from the API, and save these scores in our database.&lt;/p&gt;
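
&lt;p&gt;The worker side reduces to a polling loop. Below is a sketch assuming an SQS-style queue and the endpoint above; the queue name and internal host are hypothetical.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;import boto3
import requests

sqs = boto3.resource(&apos;sqs&apos;)
queue = sqs.get_queue_by_name(QueueName=&apos;image-classification&apos;)  # hypothetical queue

while True:
    # Long-poll the queue; each message body carries an image URL.
    for message in queue.receive_messages(WaitTimeSeconds=20):
        image_bytes = requests.get(message.body).content
        resp = requests.post(&apos;http://classifier.internal/classify&apos;,  # hypothetical host
                             files={&apos;image&apos;: image_bytes})
        # ... save resp.json() scores to the database here ...
        message.delete()
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;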

&lt;p&gt;Initially, we utilized this setup on the backend for moderation and various other internal use cases. On the product side, we made this &lt;a href=&quot;https://twitter.com/ylogx/status/844817269297311744&quot;&gt;live&lt;/a&gt; for Food &amp;amp; Ambiance classification. It was first integrated into our web platform, with subsequent releases adding it to our mobile apps. The image below illustrates the impact of using image classification, showing the results before and after its implementation.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/img/clazzify/food-ambiance-in-product.png&quot; alt=&quot;Food Ambiance - results before and after classification&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This example highlights how image classification can make it easier to find ambiance shots, especially when the initial images on the restaurant page are predominantly food shots.&lt;/p&gt;

&lt;h2 id=&quot;evolution&quot;&gt;Evolution&lt;/h2&gt;

&lt;p&gt;From our first model, we learned to streamline our data-gathering and model-training processes significantly, reducing the turnaround time from idea to trained model and thereby the time to deployment. Future blog posts will cover our evolving ML training processes and other models in production. Stay tuned for updates.&lt;/p&gt;

&lt;p&gt;We are rapidly expanding our machine learning team, which has grown 5x in just the last year. Check out our &lt;a href=&quot;https://www.zomato.com/careers&quot;&gt;careers page&lt;/a&gt; if you’re interested in joining us.&lt;/p&gt;

</description>
        <pubDate>Sat, 02 Feb 2019 18:37:40 +0000</pubDate>
        <link>https://shubham.chaudhary.xyz/blog/zomato/ml/images/classification</link>
        <guid isPermaLink="true">https://shubham.chaudhary.xyz/blog/zomato/ml/images/classification</guid>
        
        <category>zomato</category>
        
        <category>ml</category>
        
        <category>classification</category>
        
        <category>deep learning</category>
        
        <category>neural network</category>
        
        <category>alexnet</category>
        
        
      </item>
    
      <item>
        <title>Use In-Memory Buffers to Avoid Disk IO</title>
        <description>&lt;p&gt;At &lt;a href=&quot;https://www.zomato.com/blog/&quot;&gt;Zomato&lt;/a&gt;, we handle a vast number of images, with close to a hundred thousand new images daily. Often, we need to download, process, and then pass these images to our models. The traditional workflow involves fetching an image from a URL, saving it to a file, and then passing that file path for further processing.&lt;/p&gt;

&lt;h3 id=&quot;traditional-workflow&quot;&gt;Traditional Workflow&lt;/h3&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;logging&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;os&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;tempfile&lt;/span&gt;

&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;cv2&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;requests&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;download_image&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;url&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;Downloading image from url: %s&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;url&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;response_object&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;requests&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;url&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;file_descriptor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;filename&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tempfile&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mkstemp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;prefix&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;image-&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;suffix&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;.jpg&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;Saving file: %s&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;filename&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;open&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;file_descriptor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mode&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;wb&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;write&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;response_object&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;content&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;filename&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;url&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&apos;https://chaudhary.page.link/test-zomato-img&apos;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;image_path&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;download_image&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;url&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;img&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cv2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;imread&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;image_path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;resized_img&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cv2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;resize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;img&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;299&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;299&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;# preprocess(resized_img)
# prediction_score = model.predict(resized_img)
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;os&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;remove&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;image_path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;While this approach works for a few images, it creates significant unnecessary disk IO when processing millions of images at Zomato’s scale. Additionally, in a dockerized environment, it leaves numerous temporary files behind on the container filesystem.&lt;/p&gt;

&lt;h3 id=&quot;optimized-workflow-with-in-memory-buffers&quot;&gt;Optimized Workflow with In-Memory Buffers&lt;/h3&gt;

&lt;p&gt;To eliminate unnecessary disk IO, we can use in-memory buffers. In Python, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;io.BytesIO&lt;/code&gt; allows you to create a buffer in RAM that can be used like a file object; it is freed automatically when closed, or when it goes out of scope if you use a context manager.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;io&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BytesIO&lt;/span&gt;

&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;cv2&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;numpy&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;requests&lt;/span&gt;


&lt;span class=&quot;n&quot;&gt;url&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&apos;https://chaudhary.page.link/test-zomato-img&apos;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;response_object&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;requests&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;url&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;image_data&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BytesIO&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;response_object&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;content&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;file_bytes&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;asarray&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;bytearray&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;image_data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;read&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dtype&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;uint8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;img&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cv2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;imdecode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;file_bytes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cv2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;IMREAD_COLOR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;image_data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;close&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;resized_img&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cv2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;resize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;img&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;299&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;299&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;# preprocess(resized_img)
# prediction_score = model.predict(resized_img)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
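
&lt;p&gt;Since &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BytesIO&lt;/code&gt; also works as a context manager, the buffer can be released automatically instead of calling &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;close()&lt;/code&gt; by hand. A minimal variant of the snippet above:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;with BytesIO(response_object.content) as image_data:
    file_bytes = np.asarray(bytearray(image_data.read()), dtype=np.uint8)
# the buffer is freed automatically on leaving the with-block
img = cv2.imdecode(file_bytes, cv2.IMREAD_COLOR)
resized_img = cv2.resize(img, (299, 299))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;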

&lt;p&gt;Using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;imdecode&lt;/code&gt; directly on the response bytes, we can simplify the process further and drop the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BytesIO&lt;/code&gt; buffer altogether.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;cv2&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;numpy&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;requests&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;url&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&apos;https://chaudhary.page.link/test-zomato-img&apos;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;response_object&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;requests&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;url&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;file_bytes&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;asarray&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;bytearray&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;response_object&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;content&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dtype&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;uint8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;img&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cv2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;imdecode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;file_bytes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cv2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;IMREAD_COLOR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;resized_img&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cv2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;resize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;img&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;299&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;299&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;# preprocess(resized_img)
# prediction_score = model.predict(resized_img)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;performance-analysis&quot;&gt;Performance Analysis&lt;/h3&gt;

&lt;p&gt;To analyze the performance of these methods, I conducted a simple test. Here are the results on my system:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;With File IO: 35.4 ms ± 2.07 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
With Bytes IO: 35.1 ms ± 3.05 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
With Direct Decode: 34.6 ms ± 1.74 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
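
&lt;p&gt;The output format above matches IPython’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;%timeit&lt;/code&gt;. To reproduce similar numbers in a plain script, a minimal sketch could look like the following, where &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;process_image&lt;/code&gt; is a stand-in for any of the three variants above:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;import statistics
import timeit

def process_image():
    ...  # download + decode + resize, any of the three variants above

# 7 runs of 10 loops each, mirroring the %timeit output above
runs = timeit.repeat(process_image, repeat=7, number=10)
per_loop_ms = [1000 * t / 10 for t in runs]
print(f&apos;{statistics.mean(per_loop_ms):.1f} ms ± {statistics.stdev(per_loop_ms):.2f} ms per loop&apos;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;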

&lt;p&gt;The in-memory approaches eliminate the unnecessary disk IO, which this timing test does not capture; that is why the wall-clock difference looks minimal. Splitting the process into separate scripts and running them under &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;strace&lt;/code&gt; (for example, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;strace -c -e trace=open,openat python3 script.py&lt;/code&gt;) shows the number of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;open&lt;/code&gt; calls, which is lower for the in-memory methods.&lt;/p&gt;

&lt;p&gt;You can find the code to generate these performance numbers &lt;a href=&quot;https://gist.github.com/7b5d7f0957a4aa3c84c010f3d7f27643&quot;&gt;here&lt;/a&gt;. Let me know if you achieve similar results.&lt;/p&gt;


&lt;h3 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;Using in-memory buffers optimizes image-processing workflows by removing unnecessary disk IO. The per-image wall-clock gain is small, but at a large scale such as Zomato’s, avoiding millions of temporary files adds up to real resource savings, particularly in dockerized environments.&lt;/p&gt;
</description>
        <pubDate>Mon, 12 Nov 2018 18:29:16 +0000</pubDate>
        <link>https://shubham.chaudhary.xyz/blog/in-memory-buffers</link>
        <guid isPermaLink="true">https://shubham.chaudhary.xyz/blog/in-memory-buffers</guid>
        
        <category>python</category>
        
        <category>BytesIO</category>
        
        <category>buffers</category>
        
        <category>in-memory buffers</category>
        
        <category>zomato</category>
        
        
      </item>
    
      <item>
        <title>KDE Containerization Talk at Akademy 2018</title>
<description>&lt;p&gt;This August I got the opportunity to be a part of the biggest gathering of KDE developers - &lt;a href=&quot;https://akademy.kde.org&quot;&gt;Akademy&lt;/a&gt; &lt;a href=&quot;https://akademy.kde.org/2018&quot;&gt;2018&lt;/a&gt;.
The Akademy conference gathers hundreds of KDE developers together for almost an entire week.&lt;/p&gt;

&lt;p&gt;It was held at TU Wien (Technical University of Vienna) in the beautiful city of &lt;a href=&quot;https://en.wikipedia.org/wiki/Vienna&quot;&gt;Vienna&lt;/a&gt;, Austria,
from Saturday 11th to Friday 17th August 2018.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/img/akademy/2018/tu-wien-front.JPG&quot; alt=&quot;TU Wien Front Gate&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The Akademy conference, as usual, had 2 days of talks by KDE contributors,
followed by the rest of the week comprising informal BoF (birds of a feather) sessions, a team outing and a lot more.&lt;/p&gt;


&lt;h2 id=&quot;talk-containerizing-kde&quot;&gt;Talk: Containerizing KDE&lt;/h2&gt;

&lt;p&gt;At the conference, &lt;a href=&quot;https://anumittal.in&quot;&gt;Anu&lt;/a&gt; was amazing enough to let me be a part of her talk.
You should definitely go subscribe to Anu’s &lt;a href=&quot;https://anumittal.in&quot;&gt;blog&lt;/a&gt;.
We presented a &lt;a href=&quot;https://youtu.be/DuVWaCq_Cz4?t=14m45s&quot;&gt;talk&lt;/a&gt; on the containerization of KDE applications.&lt;/p&gt;

&lt;p&gt;In this talk we discussed various containerization techniques.
We also demonstrated how containerization of KDE can be useful for developers and end users.&lt;/p&gt;

&lt;iframe width=&quot;700&quot; height=&quot;390&quot; src=&quot;https://www.youtube-nocookie.com/embed/DuVWaCq_Cz4?start=885&quot; frameborder=&quot;0&quot; allow=&quot;autoplay; encrypted-media&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;h3 id=&quot;overview&quot;&gt;Overview&lt;/h3&gt;
&lt;p&gt;Setting up a development environment for software can be time-consuming and at times a bit confusing.
There are many libraries and packages that need to be installed, and they might conflict with existing system packages.
There are various ways to containerize an application; we discussed two major approaches - &lt;a href=&quot;https://www.docker.com&quot;&gt;Docker&lt;/a&gt; and &lt;a href=&quot;https://www.flatpak.org/&quot;&gt;Flatpak&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;docker&quot;&gt;Docker&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://www.docker.com&quot;&gt;Docker&lt;/a&gt; helps a developer by setting up a sandboxed development environment in a container, which can be used for debugging, testing or developing a new feature.
You can run multiple such environments in parallel, e.g. stable &amp;amp; development environments.&lt;/p&gt;

&lt;h4 id=&quot;installing-docker&quot;&gt;Installing Docker&lt;/h4&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;apt-get &lt;span class=&quot;nb&quot;&gt;install&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
    apt-transport-https &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
    ca-certificates &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
    curl &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
    software-properties-common
    
curl &lt;span class=&quot;nt&quot;&gt;-fsSL&lt;/span&gt; https://download.docker.com/linux/ubuntu/gpg | &lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;apt-key add -
&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;add-apt-repository &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
   &lt;span class=&quot;s2&quot;&gt;&quot;deb [arch=amd64] https://download.docker.com/linux/ubuntu &lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;
   &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$(&lt;/span&gt;lsb_release &lt;span class=&quot;nt&quot;&gt;-cs&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt; &lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;
   stable&quot;&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;apt-get update
&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;apt-get &lt;span class=&quot;nb&quot;&gt;install &lt;/span&gt;docker-ce

&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;docker run hello-world
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You can check out more specific information on the Docker website &lt;a href=&quot;https://docs.docker.com/install/linux/docker-ce/ubuntu/#install-using-the-repository&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h4 id=&quot;running-kde-applications-using-docker&quot;&gt;Running KDE applications using docker&lt;/h4&gt;
&lt;p&gt;KDE Neon is a project focused on building tooling that makes it easy to run KDE applications on Docker.&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;wget https://cgit.kde.org/docker-neon.git/plain/neondocker/neondocker.rb
&lt;span class=&quot;nb&quot;&gt;chmod&lt;/span&gt; +x neondocker.rb
&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;gem &lt;span class=&quot;nb&quot;&gt;install &lt;/span&gt;docker-api
&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;apt-get &lt;span class=&quot;nb&quot;&gt;install &lt;/span&gt;ruby-dev

./neondocker.rb okular
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You can find more information about KDE Neon dockerization &lt;a href=&quot;https://community.kde.org/Neon/Docker&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;flatpak&quot;&gt;Flatpak&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://www.flatpak.org/&quot;&gt;Flatpak&lt;/a&gt; provides a sandbox environment in which users can run applications in isolation from the rest of the system.
Flatpak is tightly coupled with Linux and mainly focuses on bundling and sandboxing desktop applications on Linux hosts.&lt;/p&gt;

&lt;h4 id=&quot;installing-flatpak&quot;&gt;Installing Flatpak&lt;/h4&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;add-apt-repository ppa:alexlarsson/flatpak
&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;apt update
&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;apt &lt;span class=&quot;nb&quot;&gt;install &lt;/span&gt;flatpak
    
&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;apt &lt;span class=&quot;nb&quot;&gt;install &lt;/span&gt;gnome-software-plugin-flatpak
flatpak remote-add &lt;span class=&quot;nt&quot;&gt;--if-not-exists&lt;/span&gt; flathub https://flathub.org/repo/flathub.flatpakrepo

&lt;span class=&quot;c&quot;&gt;# sudo reboot now&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You can check out more specific information on the Flatpak website &lt;a href=&quot;https://www.flatpak.org/setup/&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h4 id=&quot;running-kde-applications-using-flatpak&quot;&gt;Running KDE applications using flatpak&lt;/h4&gt;

&lt;p&gt;There is a &lt;a href=&quot;https://github.com/KDE/flatpak-kde-applications&quot;&gt;wide list&lt;/a&gt; of KDE applications available via Flatpak.
To run an application like Okular, you need just a couple of commands:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;flatpak remote-add &lt;span class=&quot;nt&quot;&gt;--if-not-exists&lt;/span&gt; kdeapps &lt;span class=&quot;nt&quot;&gt;--from&lt;/span&gt; https://distribute.kde.org/kdeapps.flatpakrepo
flatpak &lt;span class=&quot;nb&quot;&gt;install &lt;/span&gt;kdeapps org.kde.okular
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You can find out more information about KDE and Flatpak &lt;a href=&quot;https://community.kde.org/Guidelines_and_HOWTOs/Flatpak&quot;&gt;here&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;come-be-a-part-of-the-kde-community-&quot;&gt;Come be a part of the KDE community :)&lt;/h2&gt;

&lt;p&gt;Coding is not the only way to contribute to KDE. &lt;img src=&quot;/blog/img/akademy/2018/vienna-dessert.jpg&quot; alt=&quot;Vienna, Austria&quot; width=&quot;100&quot; /&gt;
There are many different ways in which you can contribute; I can name 10 right away:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Bug Reporting&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://community.kde.org/Guidelines_and_HOWTOs/Bug_triaging&quot;&gt;Bug Triaging&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.kde.org/community/donations/index.php&quot;&gt;Donation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://community.kde.org/Get_Involved/translation&quot;&gt;Translation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Visual and Human Interface Design&lt;/li&gt;
  &lt;li&gt;Documentation&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://community.kde.org/Get_Involved/promotion&quot;&gt;Promotion&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Accessibility&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://community.kde.org/Get_Involved/development&quot;&gt;Development&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Add your project to KDE &lt;a href=&quot;https://community.kde.org/Incubator&quot;&gt;Incubator&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Check out the community wiki for more information about &lt;a href=&quot;https://community.kde.org/Get_Involved&quot;&gt;contributing to KDE&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/img/akademy/2018/kde.india.jpg&quot; alt=&quot;KDE India&quot; /&gt;&lt;/p&gt;

</description>
        <pubDate>Thu, 27 Sep 2018 20:02:31 +0000</pubDate>
        <link>https://shubham.chaudhary.xyz/blog/akademy/2018</link>
        <guid isPermaLink="true">https://shubham.chaudhary.xyz/blog/akademy/2018</guid>
        
        <category>kde</category>
        
        <category>kubuntu</category>
        
        <category>akademy</category>
        
        <category>2018</category>
        
        
      </item>
    
      <item>
        <title>Models - Support Vector Machine</title>
        <description>&lt;p&gt;Classifying data is a common task in machine learning.
Suppose some given data points each belong to one of two classes, and the goal is to decide which class a new data point will be in.
In the case of support vector machines, a data point is viewed as a p-dimensional vector (a list of p numbers), and we want to know whether we can separate such points with a (p-1)-dimensional hyperplane. This is called a linear classifier.&lt;/p&gt;

&lt;p&gt;There are many hyperplanes that might classify the data. One reasonable choice as the best hyperplane is the one that represents the largest separation, or margin, between the two classes.
So we choose the hyperplane so that the distance from it to the nearest data point on each side is maximized.&lt;/p&gt;

&lt;p&gt;If such a hyperplane exists, it is known as the maximum-margin hyperplane and the linear classifier it defines is known as a maximum margin classifier; or equivalently, the perceptron of optimal stability.&lt;/p&gt;

&lt;h4 id=&quot;usages&quot;&gt;Usages&lt;/h4&gt;
&lt;ul&gt;
  &lt;li&gt;Classification&lt;/li&gt;
  &lt;li&gt;Regression&lt;/li&gt;
&lt;/ul&gt;

&lt;h6 id=&quot;pros&quot;&gt;Pros&lt;/h6&gt;

&lt;ul&gt;
  &lt;li&gt;Accuracy&lt;/li&gt;
  &lt;li&gt;Works well on smaller cleaner datasets&lt;/li&gt;
  &lt;li&gt;It can be more memory-efficient, because the decision function uses only a subset of the training points (the support vectors)&lt;/li&gt;
&lt;/ul&gt;

&lt;h6 id=&quot;cons&quot;&gt;Cons&lt;/h6&gt;

&lt;ul&gt;
  &lt;li&gt;Isn’t suited to larger datasets as the training time with SVMs can be high&lt;/li&gt;
  &lt;li&gt;Less effective on noisier datasets with overlapping classes&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;maximum-margin&quot;&gt;Maximum Margin&lt;/h3&gt;

&lt;p&gt;Refer to &lt;a href=&quot;https://www.youtube.com/watch?v=_PwhiWxHK8o&quot;&gt;this lecture&lt;/a&gt; by MIT OCW&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;    # Decision rule: classify an unknown sample u as positive if
    w . u + b &amp;gt;= 0

    # Constraints on the training samples
    w . x_pos + b &amp;gt;= +1
    w . x_neg + b &amp;lt;= -1
    # or, folding both into one with labels y in {+1, -1}:
    y * (w . x + b) &amp;gt;= 1
    y * (w . x + b) - 1 == 0   # for samples on the gutter (support vectors)

    # Width of the margin, using one support vector from each side
    width = (x_pos - x_neg) . (w / |w|)
          = ((1 - b) + (1 + b)) / |w|
          = 2 / |w|

    # Maximizing 2/|w|  ==  minimizing |w|  ==  minimizing (1/2) |w|^2

    # The resulting minimization depends only on dot products (xi . xj),
    # which is what makes the kernel trick possible&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;h3 id=&quot;kernel-trick&quot;&gt;Kernel Trick&lt;/h3&gt;
&lt;p&gt;In addition to performing linear classification, SVMs can efficiently perform a non-linear classification using what is called the kernel trick,
implicitly mapping their inputs into high-dimensional feature spaces.&lt;/p&gt;

&lt;h5 id=&quot;how-to-select-support-vector-machine-kernels&quot;&gt;How to Select Support Vector Machine Kernels&lt;/h5&gt;

&lt;p&gt;When to use linear:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;../img/svm/separable_linear.png&quot; alt=&quot;&quot; /&gt;
&lt;img src=&quot;../img/svm/separable_rbf.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;When to use rbf:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;../img/svm/circle_linear.png&quot; alt=&quot;&quot; /&gt;
&lt;img src=&quot;../img/svm/circle_rbf.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;How did RBF do this?&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;../img/svm/circle_rbf_dimension_explaination.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The RBF kernel SVM’s decision region is in fact also a linear decision region, just in a transformed space. What the RBF kernel SVM actually does is create non-linear combinations of your features to lift your samples into a higher-dimensional feature space, where a linear decision boundary can separate your classes.&lt;/p&gt;
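
&lt;p&gt;To see this in practice, here is a small self-contained sketch (my own example, not the code behind the images above) comparing a linear and an RBF kernel on concentric-circle data:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric circles: not linearly separable in the original 2D space
X, y = make_circles(n_samples=500, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in (&apos;linear&apos;, &apos;rbf&apos;):
    clf = SVC(kernel=kernel, C=1, gamma=&apos;scale&apos;).fit(X_train, y_train)
    # rbf separates the circles (score near 1.0); linear cannot (near 0.5)
    print(kernel, clf.score(X_test, y_test))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;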

&lt;h3 id=&quot;python&quot;&gt;Python&lt;/h3&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;    # Import library
    from sklearn import svm

    # Assumes you have X (predictors) and y (targets) for the training
    # set, and x_test (predictors) for the test set

    # Create an SVM classification object; the kernel, gamma and C
    # values can all be tuned (see the kernel section above)
    model = svm.SVC(kernel=&apos;linear&apos;, C=1, gamma=1)

    # Train the model on the training set and check the score
    model.fit(X, y)
    model.score(X, y)

    # Predict output
    predicted = model.predict(x_test)&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

</description>
        <pubDate>Sun, 03 Jun 2018 18:30:00 +0000</pubDate>
        <link>https://shubham.chaudhary.xyz/blog/model/svm</link>
        <guid isPermaLink="true">https://shubham.chaudhary.xyz/blog/model/svm</guid>
        
        <category>models</category>
        
        <category>svm</category>
        
        <category>support vector machine</category>
        
        <category>ml</category>
        
        
      </item>
    
      <item>
        <title>Paper Summary - End to End Interpretation of French Street Name Signs Dataset</title>
        <description>&lt;p&gt;This is a summary for the research paper &lt;a href=&quot;https://arxiv.org/abs/1702.03970&quot;&gt;End-to-End Interpretation of the French Street Name Signs Dataset&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This is a model that takes multiple Street View shots of street-name signs as input and outputs the name in the format that is shown directly in Google Maps -
a fully end-to-end model. This includes reading the image, parsing the text, converting the text to the Google Maps standard, and combining the text from multiple images into the
most accurate version. A pretty interesting problem and solution. This is one of the papers that inspired the Tesseract LSTM model.&lt;/p&gt;

&lt;p&gt;First of all, they broke street-sign transcription (image to text) into simpler problems for their human moderators.
They detected the street signs using a neural network that produced bounding boxes for the signs. Then they collected
multiple views of the same sign using the geo-coordinates of each capture. Each image was then transcribed using OCR,
reCAPTCHA and humans in turn: OCR seeded the data for reCAPTCHA, humans verified the reCAPTCHA input, and incorrect
transcriptions were forwarded to human moderators. They never transcribed the text exactly as it was shown in the image,
but the way they wanted it to be shown in Google Maps.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://i.imgur.com/U9gaBqG.png&quot; alt=&quot;img-fsns-tiles&quot; /&gt;&lt;/p&gt;


&lt;h2 id=&quot;recurrent-model---street&quot;&gt;Recurrent Model - STREET&lt;/h2&gt;
&lt;p&gt;Then, using this dataset, they trained the &lt;a href=&quot;https://github.com/tensorflow/models/tree/master/street&quot;&gt;STREET model&lt;/a&gt; (StreetView Tensorflow Recurrent End-to-End
Transcription) on the end-to-end problem: taking a set of 4 views of a street sign as input and transcribing the
street name, as it should appear in Maps, as output.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://i.imgur.com/gu4JEjs.png&quot; alt=&quot;image fsns network&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;cnn&quot;&gt;CNN&lt;/h3&gt;
&lt;p&gt;The input is detiled into 4 images of 150x150 each, and two rounds of convolution with max pooling reduce the
dimensions from 150x150 to 25x25.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://i.imgur.com/tpyIs9I.png&quot; alt=&quot;img-fsns-conv&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;text-finding--reading&quot;&gt;Text Finding &amp;amp; Reading&lt;/h3&gt;
&lt;p&gt;Vertically summarizing Long Short-Term Memory (LSTM) cells are used to find text lines.
A vertically summarizing LSTM is a summarizing LSTM that scans the input &lt;strong&gt;vertically&lt;/strong&gt;.
It is thus expected to compute a vertical summary of its input, which will be taken from the last vertical timestep.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://i.imgur.com/GWEZWhb.png&quot; alt=&quot;img-fsns-lstm&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Three different vertical summarizations are done and then combined later:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Upward to find the top textline.&lt;/li&gt;
  &lt;li&gt;Separate upward and downward LSTMs, with depth-concatenated outputs, to find the middle
textline.&lt;/li&gt;
  &lt;li&gt;Downward to find the bottom textline.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Although each vertically summarizing LSTM sees the same input, and could theoretically summarize
the entirety of what it sees, they are organized this way so that they only have to produce a summary
of the most recently seen information.&lt;/p&gt;
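
&lt;p&gt;As a rough illustration (my own Keras approximation, not the paper’s actual code), a vertically summarizing LSTM can be written as an LSTM over the height axis, applied independently at every horizontal position and keeping only the last timestep:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;import tensorflow as tf

# A 25x25 feature map with 64 channels, as produced by the CNN layers above
feature_map = tf.keras.Input(shape=(25, 25, 64))           # (H, W, C)

# Make width the outer axis so each column is scanned separately
columns = tf.keras.layers.Permute((2, 1, 3))(feature_map)  # (W, H, C)

# LSTM over the height axis per column; by default only the last
# vertical timestep is returned - the vertical summary
summary = tf.keras.layers.TimeDistributed(
    tf.keras.layers.LSTM(64))(columns)                     # (W, 64)

model = tf.keras.Model(feature_map, summary)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;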

&lt;p&gt;Since the &lt;em&gt;middle line is harder to find, it gets two LSTMs working in opposite directions&lt;/em&gt;.
Each output from the CNN layers is passed to a separate bidirectional horizontal LSTM to recognize the text.
Bidirectional LSTMs have been shown to be able to read text with high accuracy.
The outputs of the bidirectional LSTMs are concatenated in the &lt;strong&gt;x-dimension&lt;/strong&gt;, to string the text lines out in
reading order.&lt;/p&gt;

&lt;h3 id=&quot;character-position-normalization-and-combination-of-individual-outputs&quot;&gt;Character Position Normalization and Combination of individual outputs&lt;/h3&gt;
&lt;p&gt;Since all four input images may have text positioned differently, the network is given the ability to shuffle data in the x
dimension by adding two more LSTM layers - one scanning left to right and the other right to left.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://i.imgur.com/Y7JHNKd.png&quot; alt=&quot;img-fsns-cpn&quot; /&gt;&lt;/p&gt;

&lt;p&gt;After this, a unidirectional LSTM is used to combine the four views of each input image to produce the most accurate
text. This is also the layer that learns the Title Case normalization. A 50% dropout is added between the reshape steps for
regularization.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://i.imgur.com/FP4ebp7.png&quot; alt=&quot;img-fsns-comb&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;final-network&quot;&gt;Final Network&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;https://i.imgur.com/UBTlcBE.png&quot; alt=&quot;img-fsns-layers&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;paper&quot;&gt;Paper&lt;/h2&gt;
&lt;p&gt;Here’s the full research paper with the important parts highlighted by me.&lt;/p&gt;

&lt;iframe class=&quot;scribd_iframe_embed&quot; title=&quot;End-To-End Interpretation of the French Street Name Signs Dataset - 1702.03970&quot; src=&quot;https://www.scribd.com/embeds/460525114/content?start_page=1&amp;amp;view_mode=scroll&amp;amp;access_key=key-5KQPMHWBafa6zoyUUk3c&quot; data-auto-height=&quot;false&quot; data-aspect-ratio=&quot;0.7535505430242272&quot; scrolling=&quot;no&quot; width=&quot;100%&quot; height=&quot;850&quot; frameborder=&quot;0&quot;&gt;&lt;/iframe&gt;

&lt;p&gt;You can &lt;a href=&quot;https://www.scribd.com/document/460525114/End-To-End-Interpretation-of-the-French-Street-Name-Signs-Dataset-1702-03970#download&quot;&gt;download the pdf&lt;/a&gt; for free from Scribd.&lt;/p&gt;

</description>
        <pubDate>Wed, 21 Feb 2018 05:30:51 +0000</pubDate>
        <link>https://shubham.chaudhary.xyz/blog/papers/ml/fsns</link>
        <guid isPermaLink="true">https://shubham.chaudhary.xyz/blog/papers/ml/fsns</guid>
        
        <category>paper</category>
        
        <category>summary</category>
        
        <category>lstm</category>
        
        <category>cnn</category>
        
        <category>ml</category>
        
        
      </item>
    
      <item>
        <title>Spark Streaming</title>
        <description>&lt;h2 id=&quot;extract-transform-load-etl&quot;&gt;Extract Transform Load (ETL)&lt;/h2&gt;
&lt;p&gt;The &lt;a href=&quot;https://en.wikipedia.org/wiki/Extract,_transform,_load&quot;&gt;ETL&lt;/a&gt; process is to &lt;em&gt;fetch&lt;/em&gt; data from different types of systems, &lt;em&gt;structure&lt;/em&gt; it, and &lt;em&gt;save&lt;/em&gt; it into the destination database.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://i.imgur.com/xyD2KsE.jpg&quot; alt=&quot;ETL Pipeline&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;batch&quot;&gt;Batch&lt;/h2&gt;
&lt;p&gt;In the case of a batch job, the query will be run on the data saved at &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;source-path&lt;/code&gt; and the transformed data will be saved at the destination &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dest-path&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://i.imgur.com/I7uQvCT.png&quot; alt=&quot;Batch Job&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;streaming&quot;&gt;Streaming&lt;/h2&gt;
&lt;p&gt;In the case of a streaming job, the query runs continuously on the data from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;source-path&lt;/code&gt;, and transformed data is appended to the destination &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dest-path&lt;/code&gt; again and again as new data comes in.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://i.imgur.com/SYOgWWV.png&quot; alt=&quot;Batch Job converted to streaming&quot; /&gt;&lt;/p&gt;
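
&lt;p&gt;In Structured Streaming, the batch and streaming versions of the same query look almost identical. A rough sketch (the paths, schema and column names below are placeholders, not a real job):&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;from pyspark.sql import SparkSession

spark = SparkSession.builder.appName(&apos;etl&apos;).getOrCreate()

# Batch: read everything at source-path once, write once to dest-path
batch_df = spark.read.json(&apos;source-path&apos;)
batch_df.select(&apos;user_id&apos;, &apos;event&apos;).write.parquet(&apos;dest-path&apos;)

# Streaming: the same query, run continuously as new files arrive
stream_df = spark.readStream.schema(batch_df.schema).json(&apos;source-path&apos;)
(stream_df.select(&apos;user_id&apos;, &apos;event&apos;)
    .writeStream
    .format(&apos;parquet&apos;)
    .option(&apos;path&apos;, &apos;dest-path&apos;)
    .option(&apos;checkpointLocation&apos;, &apos;checkpoint-path&apos;)
    .start())
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;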

&lt;h3 id=&quot;merging-static-data-db-with-streaming-data&quot;&gt;Merging static data (DB) with streaming data&lt;/h3&gt;
&lt;p&gt;There might be use cases where you want to merge static data (e.g. MySQL) with the streaming data. You can do this as follows:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://i.imgur.com/8tyNqcT.png&quot; alt=&quot;Joining Streaming Data&quot; /&gt;&lt;/p&gt;
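
&lt;p&gt;Continuing the sketch above, a stream-static join is just a regular &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;join&lt;/code&gt; between the streaming DataFrame and a DataFrame read from the database (the JDBC details are placeholders):&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Static dimension table, e.g. from MySQL (placeholder connection details)
static_df = spark.read.jdbc(
    url=&apos;jdbc:mysql://db-host:3306/app&apos;,
    table=&apos;users&apos;,
    properties={&apos;user&apos;: &apos;reader&apos;, &apos;password&apos;: &apos;...&apos;})

# Spark re-plans the join against the static side for every micro-batch
joined = stream_df.join(static_df, on=&apos;user_id&apos;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;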

&lt;h2 id=&quot;executing-the-job&quot;&gt;Executing the Job&lt;/h2&gt;

&lt;h3 id=&quot;batch-execution&quot;&gt;Batch Execution&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;https://i.imgur.com/21afWnk.png&quot; alt=&quot;Batch Plan&quot; /&gt;
&lt;img src=&quot;https://i.imgur.com/6dXWnmn.png&quot; alt=&quot;Batch Plan Execution&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;stream-execution&quot;&gt;Stream Execution&lt;/h3&gt;
&lt;p&gt;From the planner’s logical plan, an incremental execution plan is generated on top of it:
&lt;img src=&quot;https://i.imgur.com/JV1wQcb.png&quot; alt=&quot;Incremental&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=rl8dIzTpxrI&quot;&gt;Apache Spark 2.0: A Deep Dive Into Structured Streaming - by Tathagata Das&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
        <pubDate>Tue, 25 Jul 2017 19:05:00 +0000</pubDate>
        <link>https://shubham.chaudhary.xyz/blog/spark-streaming</link>
        <guid isPermaLink="true">https://shubham.chaudhary.xyz/blog/spark-streaming</guid>
        
        <category>zomato</category>
        
        <category>ml</category>
        
        <category>machine learning</category>
        
        <category>spark</category>
        
        <category>streaming</category>
        
        <category>spark streaming</category>
        
        
      </item>
    
      <item>
        <title>Docker 101</title>
        <description>&lt;p&gt;When working in multi node environment like Spark/Hadoop clusters, docker diminishes the barrier to entry. By barrier to entry, I mean the need to have a constantly running EMR cluster, when you are still in development phase. With Docker, you can quickly setup a 4-5 node cluster on a single machine and start coding your spark job. You can understand &lt;a href=&quot;https://www.docker.com/what-docker&quot;&gt;what Docker is&lt;/a&gt; and &lt;a href=&quot;https://www.docker.com/use-cases&quot;&gt;why you would use Docker&lt;/a&gt; on these links.&lt;/p&gt;

&lt;h3 id=&quot;benefits&quot;&gt;Benefits&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;You can very easily version-control your environment&lt;/li&gt;
  &lt;li&gt;The barrier to entry for working with clusters (Spark/Hadoop etc.) drops considerably. You no longer need access to an EMR cluster, which has a cost associated with it.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;installing-docker&quot;&gt;Installing Docker&lt;/h2&gt;
&lt;p&gt;Follow &lt;a href=&quot;https://docs.docker.com/engine/installation/&quot;&gt;this official guide&lt;/a&gt;&lt;/p&gt;

&lt;h4 id=&quot;manual&quot;&gt;Manual&lt;/h4&gt;
&lt;p&gt;For &lt;a href=&quot;https://docs.docker.com/engine/installation/linux/docker-ce/ubuntu/#install-docker-ce&quot;&gt;Ubuntu&lt;/a&gt;, the quick steps are:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;sudo apt-get update
sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository \
   &quot;deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable&quot;
sudo apt-get update
sudo apt-get install docker-ce
sudo docker run hello-world
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 id=&quot;with-ansible&quot;&gt;With Ansible&lt;/h4&gt;
&lt;p&gt;You can use the following command to have Ansible install Docker for you:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;sudo python2.7 -m pip install ansible \
  &amp;amp;&amp;amp; sudo ansible-galaxy install --force angstwad.docker_ubuntu \
  &amp;amp;&amp;amp; echo &apos;- hosts: all
  roles:
      - angstwad.docker_ubuntu
  &apos; &amp;gt; /tmp/docker_ubuntu.yml \
  &amp;amp;&amp;amp; sudo ansible-playbook /tmp/docker_ubuntu.yml -c local -i &apos;localhost,&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;setting-up-a-cluster&quot;&gt;Setting up a cluster&lt;/h2&gt;
&lt;p&gt;Follow &lt;a href=&quot;https://bigdatagurus.wordpress.com/2017/03/01/how-to-start-spark-cluster-in-minutes/&quot;&gt;this post&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You will be able to run a local spark cluster with 4 commands.&lt;/p&gt;

&lt;h4 id=&quot;quick-overview&quot;&gt;Quick overview:&lt;/h4&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;mkdir spark_cluster; cd spark_cluster

echo &apos;version: &quot;2&quot;

services:
  master:
    image: singularities/spark
    command: start-spark master
    hostname: master
    ports:
      - &quot;6066:6066&quot;
      - &quot;7070:7070&quot;
      - &quot;8080:8080&quot;
      - &quot;50070:50070&quot;
  worker:
    image: singularities/spark
    command: start-spark worker master
    environment:
      SPARK_WORKER_CORES: 1
      SPARK_WORKER_MEMORY: 2g
    links:
      - master
&apos; &amp;gt; docker-compose.yml

sudo docker-compose up -d

# sudo docker-compose scale worker=2
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;extending-other-images&quot;&gt;Extending other images&lt;/h2&gt;
&lt;p&gt;With Docker you can build on top of someone else’s image. For example, here I will extend the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;singularities/spark&lt;/code&gt; image, make my custom Spark configuration changes, and push the final version to my own Docker Hub repo.&lt;/p&gt;

&lt;h3 id=&quot;pushing-your-changes-to-docker-hub&quot;&gt;Pushing your changes to Docker hub&lt;/h3&gt;
&lt;p&gt;To create a fork from a base repo (singularities/spark), these are the steps:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;docker run &lt;span class=&quot;nt&quot;&gt;-it&lt;/span&gt; singularities/spark  &lt;span class=&quot;c&quot;&gt;# Run base repo. This will open a shell&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# Make your changes to the image in this container&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;docker login &lt;span class=&quot;nt&quot;&gt;--username&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;chaudhary &lt;span class=&quot;nt&quot;&gt;--password&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;lol
&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;docker commit &amp;lt;container ID from docker ps&amp;gt; chaudhary/my-repo-name  &lt;span class=&quot;c&quot;&gt;# Commit changes&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;docker tag &amp;lt;image ID from docker images&amp;gt; chaudhary/my-repo-name  &lt;span class=&quot;c&quot;&gt;# Tag for pull to work properly&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;docker push chaudhary/my-repo-name
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now that you have pushed this image, you can start a new container from this image as shown below:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;docker run &lt;span class=&quot;nt&quot;&gt;-it&lt;/span&gt; chaudhary/my-repo-name
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;
&lt;p&gt;For more information read the official &lt;a href=&quot;https://docs.docker.com/get-started/&quot;&gt;getting started guide&lt;/a&gt;.&lt;/p&gt;

</description>
        <pubDate>Sat, 22 Jul 2017 20:39:00 +0000</pubDate>
        <link>https://shubham.chaudhary.xyz/blog/docker/101</link>
        <guid isPermaLink="true">https://shubham.chaudhary.xyz/blog/docker/101</guid>
        
        <category>zomato</category>
        
        <category>development environment</category>
        
        <category>docker</category>
        
        <category>spark cluster</category>
        
        
      </item>
    
  </channel>
</rss>
