Geshan's Blog

Getting Started with Google Agent Development Kit (ADK): Build and Run Your Simple Fact-Checker AI Agent

2026-05-30T11:20:47Z

Have you ever tried building an AI agent, only to get bogged down in massive, complex frameworks just to get a basic output? If you want a clean, code-first way to build and debug agents without the boilerplate, Google’s open-source Agent Development Kit (ADK) is what you need. In this post, you will learn how to set up the Python SDK, code your first Gemini-powered agent that checks facts, and test it locally using ADK’s built-in web playground. Let's get started!

Prerequisites #

Before you get your hands dirty with the code, here are the prerequisites:

As you will be using Python with Google ADK, you must have Python and uv installed and working. For the demo, we will use uv version 0.11.7 and Python version 3.12.
You will need basic knowledge of how Python, pip, and virtualenv work.
Knowledge of Google ADK will be beneficial.
You will need a working API key on Google AI Studio, for which you might need a working Google Cloud Platform (GCP) project and a valid GCP account with payment enabled or GCP credit.

In the next section, you will learn about the Google Agent Development Kit (ADK).

Google Agent Development Kit (ADK) #

Google ADK is an open-source agent development framework that enables you to build, debug, and deploy AI Agents at enterprise scale. It is available in multiple languages, including Python, TypeScript, Go, Java, and Kotlin. Based on the ADK samples and examples, Python is the most popular language for building an AI agent with ADK at this point. ADK version 2.0 has been recently released.

In my experience, it is a good framework to build AI Agents. With less than 100 lines of code, you can build something useful and meaningful. For the example used in the tutorial, you will build a simple conversational single-agent tool that can verify facts.

Fact checker agent #

As a demo for this blog post (a tutorial), you will build a fact-checker agent using the Google Agent Development Kit (ADK). The aim is simple: you see a news post, or someone makes a statement, but you are not sure whether it is a fact or a false opinion (even worse, fake news). You can pass that statement to this AI agent, and it will tell you whether it is a fact.

In the next section, get ready to roll up your sleeves and code a simple fact-checker AI agent, then run it locally on your machine.

Build the agent #

To build the fact-checker agent, you can start by running the following commands:

mkdir fact-checker-agent-adk
cd fact-checker-agent-adk

Then you can run,

uv init
# it used Python 3.12 for my example
uv add google-adk

It will initialize the project with uv, and add the Google ADK package to the project with its CLI.

Now you have Google ADK installed. At the time of writing, the version is 2.1. You can check it by running uv run adk --version.
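If you would rather confirm the installed version from Python than from the CLI, a minimal sketch using only the standard library could look like the one below. It assumes the distribution name is google-adk, the same name used with uv add above. The helper name installed_version is hypothetical.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str) -> str | None:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "google-adk" is the distribution name used with `uv add` in this tutorial.
    print(installed_version("google-adk"))
```

Run it inside the project (for example with uv run python) so that it looks at the same environment the adk CLI uses.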

To create the fact checker agent with the adk CLI, run the command below. It will ask which model to use and whether to authenticate through Vertex AI or with an API key from Google AI Studio:

uv run adk create fact_checker

Then answer the questions, which will look like:

For this tutorial, you will use Google AI Studio, and you will need to create an API key on Google AI Studio to continue building the AI Agent.
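Before running the agent, it can help to confirm that the API key is actually visible to Python. The sketch below is one way to do that, assuming the key is exposed as GOOGLE_API_KEY; that name is an assumption here, so check the .env file that adk create generated for the exact names in your version. The helper missing_settings is hypothetical.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def missing_settings(
    env: Mapping[str, str],
    required: tuple[str, ...] = ("GOOGLE_API_KEY",),  # assumed variable name
) -> list[str]:
    """Return the names of required settings that are unset or blank."""
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    # os.environ only sees the key if it was exported or loaded from .env first.
    print(missing_settings(os.environ))
```

An empty list means every required setting is present; anything else names what still needs to be filled in.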

It will create a ./fact_checker/agent.py file with the following contents:

from google.adk.agents.llm_agent import Agent

root_agent = Agent(
    model='gemini-2.5-flash',
    name='root_agent',
    description='A helpful assistant for user questions.',
    instruction='Answer user questions to the best of your knowledge',
)

Replace it with the following contents to create your fact checker agent:

from dotenv import load_dotenv
from google.adk.agents import Agent
from google.adk.apps.app import App
from google.adk.tools import google_search
from google.genai import types

# Load environment variables from .env file
load_dotenv(override=True)

root_agent = Agent(
    name="Facts",
    model="gemini-flash-latest", #gemini 3.5 at the time of writing
    instruction="""You are a fact checker. 
    You will be skeptical about anything that is said to you. 
    You will search the web and verify the given information 
    if it does not match you will respond with the latest 
    and factual information.""",
    description="An Agent to provide only facts about a given topic using Google Search.",
    generate_content_config=types.GenerateContentConfig(
        temperature=0.1 
    ),
    tools=[google_search],
)

app = App(name="fact_checker", root_agent=root_agent)

The code above does the following:

First, it imports the necessary modules: load_dotenv to load secrets, plus Agent and App, which are the core building blocks of Google ADK.
It also imports the built-in Google Search tool, which lets the AI agent search the web for live data, and types for advanced configuration settings for the AI model.
After that, it loads the environment variables, including the Google AI Studio API key you provided in the last step. If you had used Vertex AI (Google Enterprise Agent Platform now), it would load the GCP project details too.
Then, it defines the AI Agent as root_agent, which:
- has the name Facts
- uses the gemini-flash-latest model, which is Gemini 3.5 at the time of writing this blog post
- gets instructions that give the AI a strict persona: it is told to be inherently skeptical of user input and to verify claims using the internet
- has a relevant description
- uses a temperature of only 0.1, because LLM temperature controls randomness and a fact checker should reply to the point rather than be creative
- gets Google Search as a tool to do the fact-checking
Finally, the app is initialized with the name fact_checker, and the root agent is assigned to it.
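To see why a low temperature makes answers less random, here is a small worked sketch of temperature-scaled softmax in plain Python. It only illustrates the general idea and is not ADK or Gemini code. The logits are made-up numbers and the function name is hypothetical.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    made_up_logits = [2.0, 1.0, 0.5]  # illustrative scores for three candidate tokens
    print(softmax_with_temperature(made_up_logits, 1.0))
    print(softmax_with_temperature(made_up_logits, 0.1))
```

At a temperature of 0.1, nearly all of the probability mass moves to the top-scoring candidate, so sampling picks it almost every time. At 1.0 the other candidates keep a meaningful share, which is where the variety comes from.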

In the next section, you will run the agent in the CLI and the web, then verify if it is fact-checking the statements correctly.

Run the agent in the CLI #

To run the agent in the CLI, execute the following command:

uv run adk run fact_checker

Then you can verify if a given statement is a fact, like:

Scott Morrison is the prime minister of Australia.

And it would fact-check the given statement like:

Run the agent with web UI #

To run the ADK agent with the web UI, you can run the following command:

uv run adk web fact_checker

Which will result in:

After that, you can open the browser of your choice (like Google Chrome) and go to http://localhost:8000, as the ADK web UI runs on port 8000 by default. You will see something like the below, and you can give it any statement to fact-check, like below:

I asked it about the Capital Gains Tax discussion after the Australian Budget, and it said that it is proposed and has not been converted into law, after doing a couple of Google Searches.
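If the page does not load, a quick first check is whether anything is listening on the port at all. The sketch below does that with the standard library. The host and port match the default mentioned above, and the helper name is_port_open is hypothetical.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 8000 is the default port of the ADK web UI according to this tutorial.
    print(is_port_open("127.0.0.1", 8000))
```

This only tells you that something accepts connections on the port, not that it is the ADK web UI, so treat it as a first diagnostic step.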

You can see the request and response, and check the metadata of the tokens used, from the icons on the left sidebar of the ADK web UI. You can also run the AI Agent as an API server or Ambient Agent that can respond to asynchronous events without human intervention, such as reacting to cloud events when a file is uploaded to Google Cloud Storage, or running on a schedule.

You can also deploy the agent to a service such as Google Cloud Run and make it available with appropriate access controls.

There you have it, a small but useful AI agent that uses Google Search and verifies if a statement is fact or not. The full app is available for your reference in this open-source GitHub repository.

Conclusion #

The Google Agent Development Kit (ADK) provides a remarkably clean and efficient way to build powerful AI agents without excessive boilerplate. By leveraging the Gemini model alongside the integrated Google Search tool, you successfully created a functional fact-checker agent in just a few lines of code.

Whether you prefer using the straightforward CLI for quick tests or the built-in web UI for a more visual debugging experience, ADK offers a flexible environment for local development and enterprise-scale deployment alike. Keep building!

Using Spec Driven Development with AWS Kiro to add the last updated date on Eleventy blog

2026-05-03T11:54:47Z

After writing a couple of prompts to get a feature done with an LLM or a coding Agent like Claude Code (or Cursor), have you felt like there should be a more declarative way of doing this than taking turns with prompts? Instead of throwing prompts at a coding Agent, herding it/them to do the right job, and getting frustrated, wouldn’t it be better to have a plan of tasks to follow? This is where Spec Driven development comes into the picture, and how AWS Kiro IDE can be used to add a new feature without prompting too many times, so the LLM doesn't stray from the main task. In this post, you will see a practical example of how to use Spec-driven development with AWS Kiro to add the last updated date to this Eleventy blog. Let’s get started!

Prerequisites #

Before getting into the code, below are some of the things expected to already be in place:

You have Git working locally and can clone repositories from GitHub
AWS Kiro is installed on your machine and is working. You can download it and install it from their website
You can run Node.js version 22 or later on your machine
A basic understanding of how an Eleventy blog works (else Kiro can explain that to you too)

Given that, let’s get started by first learning about spec-driven development, the AWS Kiro IDE, and how they are related in the next section.

Spec-driven development #

Spec-Driven Development (SDD) is a methodology that formalizes and brings rigorous discipline to the software engineering process. Instead of jumping straight into writing code or letting an AI jump straight into writing code, SDD requires you to define exactly what needs to be built and how it will be built before any implementation begins. In this context, a spec is an organized, behavior-focused document, or a collection of connected documents, written in plain language that defines how software should operate and provides instructions for AI coding agents.

If you have worked in enterprise software development, backend engineering, or DevOps, you are likely already familiar with writing technical design documents, Request for Comments (RFCs), or Architectural Decision Records (ADRs).

Spec-driven development brings this disciplined, documentation-first approach directly into your daily coding workflow.

In a typical Spec-Driven Development workflow, the process is broken down into distinct, sequential phases:

Requirements Gathering: This phase involves defining the user stories, the business value, and the exact acceptance criteria. What exactly are you building, and why does it matter to the end-user or the business? This step ensures you are solving the right problem.
Technical Design: Once the requirements are locked in, you outline the technical architecture, data flow, edge cases, and interfaces. How will this new feature integrate with the existing codebase? What libraries will be used? What happens if a database connection fails or a file is missing?
Task Breakdown: This involves breaking the technical design down into small, sequential, and trackable implementation steps. This creates a clear roadmap for the actual coding phase.
Implementation: Finally, you write the actual code, guided strictly by the tasks and the technical design.
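The key property of these phases is that they are sequential: a later phase should not start until the earlier one is approved. A minimal sketch of that gating, using hypothetical names (PHASES, SpecWorkflow) that do not come from Kiro or any SDD tool, could look like this:

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Phase names follow the list above; the identifiers themselves are illustrative.
PHASES = ("requirements", "design", "tasks", "implementation")


@dataclass
class SpecWorkflow:
    """Tracks which phases have been approved, strictly in order."""

    approved: list[str] = field(default_factory=list)

    def next_phase(self) -> str | None:
        done = len(self.approved)
        return PHASES[done] if done < len(PHASES) else None

    def approve(self, phase: str) -> None:
        expected = self.next_phase()
        if phase != expected:
            raise ValueError(f"cannot approve {phase!r}; next phase is {expected!r}")
        self.approved.append(phase)
```

Trying to approve implementation before requirements, design, and tasks raises an error, which is the same discipline the workflow asks of you and of the AI.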

When using AI for coding, applying Spec-Driven Development gives you a much-needed advantage. It helps stop the AI from hallucinating random architectures or using deprecated libraries. By forcing the AI to write the specifications first, you get a crucial opportunity to review, correct, and approve the AI's thought process. Once the specs are locked in, the AI generates code that aligns far more closely with your expectations, reducing bugs and rework. There are other frameworks for spec-driven development, like OpenSpec and Spec Kit by GitHub.

Since the spec is already provided, it is much easier to write tests, such as unit tests or even property-based tests.

In the next part, you will learn about Kiro (or AWS Kiro).

AWS Kiro IDE #

Kiro is an agentic development workspace designed to help engineers efficiently deliver high-quality work alongside AI assistants. It represents AWS’ vision of a development environment built from the ground up to maximize the potential of artificial intelligence. As per their official website:

Kiro helps you do your best work by bringing structure to AI coding with spec-driven development.

Unlike traditional cloud consoles or basic AI coding assistants that just sit in a sidebar, the AWS Kiro IDE acts as a mature engineering partner. It has deep context about your entire workspace, your file structures, and your dependencies. It is also cloud-agnostic, meaning you do not strictly need to deploy your applications to AWS to use it.

The primary philosophy behind AWS Kiro is to elevate the developer from a mere "code typist" to a "system architect." It achieves this ambitious goal by making Spec-Driven Development a first-class citizen within the IDE itself. You are no longer just chatting with an AI; you are collaborating with an AI agent that thinks like a system architect and also writes code as an experienced engineer.

There are other features of the Kiro IDE that might interest you, like the .kiro folder, powers, steering, and hooks. It also has the skills and MCP expected of any agentic, AI-focused IDE. The focus of this blog post is on adding a feature to an existing codebase using Spec-driven development with Kiro.

Adding updated at to an Eleventy blog #

So the feature you want to build for this tutorial is to add updated at to an existing blog post because there have been some updates. For instance, a blog post was written some years ago when Node.js version 20 was the latest LTS, but Node.js 24 is now the latest LTS. So a blog post was created in mid 2023, and now it is updated in mid 2026, as seen below:

Open the blog in Kiro #

To build the feature, you can clone the blog (from GitHub - but please do not use it as a template) and open it in Kiro as follows (File -> Open Folder and select the cloned folder):

Prompt Kiro for the new feature #

Then add a prompt like the one below in the bottom right text box that says Ask a question or describe a task:

I want to add a last updated at to all the blog posts, it will be shown only if it is different than the created at. Let's plan and write a spec for that.
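The rule in this prompt is small enough to sketch in a few lines. The Python below is only an illustration of the logic, not the Eleventy code Kiro produced. It assumes dates are compared at day granularity, and the function name and label format are hypothetical.

```python
from __future__ import annotations

from datetime import date


def last_updated_label(created: date, updated: date | None) -> str | None:
    """Return a 'Last updated' label, or None when nothing should be shown."""
    if updated is None or updated == created:
        return None
    return f"Last updated: {updated.isoformat()}"


if __name__ == "__main__":
    # Illustrative dates only; they are not taken from a real post.
    print(last_updated_label(date(2023, 6, 15), date(2026, 5, 3)))
    print(last_updated_label(date(2023, 6, 15), date(2023, 6, 15)))
```

Spelling the rule out like this is also a useful check on the spec itself: it forces a decision on what happens when a post has no updated date at all.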

After that, push the purple up-arrow button to get started. Make sure Spec is selected for the Agent session. It will then ask you whether it is a feature or a bug.

Select Build a feature and then click Submit Answer. Then select Requirements and again click Submit Answer, as seen below:

Kiro creates the required docs and tasks #

After that, Kiro will search the workspace for more context and say so. Then it will create a .config.kiro file and write the requirements doc:

In my case, it was created in the .kiro/specs/post-last-updated folder. You can read the requirement doc and ask AWS Kiro to proceed to create the design doc as follows:

It will search the workspace again, write up the design doc (design.md), and show it. Read that one too and ask it to proceed to the task list:

Then it will create a task list too. Notice that it will add optional tasks to write tests, too. You can ask it to implement task 1 as seen below:

Ask Kiro to implement tasks #

It will ask you to run a command at this point, so it is better to have a terminal window open (Terminal -> New Terminal). You can Run the command if you think it makes sense:

In my case, it was installing fast-check to run some tests. Then it will ask you to run more commands. Depending on the command, you can click Run to get the task done. You can also click Follow to see what is happening:

When the task is done, it will tell you. Then you can ask it to implement other task(s). I asked it to implement all the tasks in my list, which was up to 6:

I had to click Run to see what it was doing, but after a few minutes it finished all the tasks. It added a test framework and ran tests to ensure the feature worked as intended. After doing task 6, it finished as follows:

Did the feature work after all these changes?

The feature worked #

I had to make a couple of minor changes, but the feature worked after that. Here is the post written in Jun 2023 with an updated date of May 2026:

If you want to have a look at the spec and all the code changes, it is in this commit.
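The property-based tests mentioned earlier check a rule against many generated inputs rather than a handful of hand-picked ones. The sketch below hand-rolls that idea with Python's random module as a stand-in for a library such as fast-check. It is not the test suite Kiro generated, and every name in it is hypothetical.

```python
import random
from datetime import date, timedelta


def should_show_updated(created: date, updated: date) -> bool:
    """The rule under test: show the updated date only when it differs."""
    return updated != created


def random_date(rng: random.Random) -> date:
    """Pick an arbitrary date within roughly eight years of 2020-01-01."""
    return date(2020, 1, 1) + timedelta(days=rng.randrange(0, 3000))


def check_properties(runs: int = 500, seed: int = 0) -> int:
    """Assert the rule's properties on many random inputs; return the run count."""
    rng = random.Random(seed)  # seeded so a failure can be reproduced
    for _ in range(runs):
        created = random_date(rng)
        offset = rng.randrange(0, 400)
        updated = created + timedelta(days=offset)
        # Property 1: a post updated on its creation day never shows the date.
        assert not should_show_updated(created, created)
        # Property 2: the date shows exactly when the offset is non-zero.
        assert should_show_updated(created, updated) == (offset != 0)
    return runs


if __name__ == "__main__":
    print(check_properties())
```

A dedicated library adds input shrinking and smarter generators on top of this, but the core idea is the same: state the property once and let the generator look for counterexamples.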

There you have it: you now understand how spec-driven development works with AWS Kiro in an existing project.

It is a very good practice to define your requirements in a spec first. If you have a clear spec, you will get better and more consistent results from any tool, especially AI-powered ones.

Conclusion #

In this guide, you explored how the AWS Kiro IDE enforces Spec-Driven Development, bringing rigorous engineering discipline back to AI-assisted coding. By defining clear requirements and technical designs up front, you reduce AI hallucinations and make it far more likely that the generated code aligns with your business needs and architectural standards.

You also walked through a highly practical AWS Kiro tutorial, successfully implementing a "last updated" date for an existing Eleventy blog. Embrace structured AI development. Stop writing code first, and start writing specifications first. Your future self, your team members, and your production environment will thank you. The spec is also committed with the code, leaving a needed trail of why things were built. Happy coding!

How to build a simple Google login and profile page on Google AI Studio with Firebase as a datastore [step-by-step]

2026-03-27T10:52:47Z

Google AI Studio has recently added an array of new features, calling it a new full-stack vibe-coding experience and vibe-coding-to-production. The new features include the ability to generate music, use Google Search data, use Google Maps data, add a database and auth, and add Gemini intelligence, to name a few. In this post, you will learn about adding a database and auth, which uses Google Firebase in the background to do so. Let’s get started!

Prerequisites #

Before you get started with vibe coding a simple Google login and profile page on Google AI Studio using Firebase as a datastore and login provider, it is better to check off the following requirements:

You will need an active Google account; a Gmail account will suffice
You may need a Google Cloud Project or a Firebase project. In my experiment, it was not required, but it is good to have
You will use the Gemini 3.1 Pro Preview, which can eat up your free limits.

With that covered, you can start building a login and user profile page on Google AI Studio by prompting (vibe coding) instead of writing the Firebase integration code yourself.

The aim of this tutorial is to build a simple but useful login page with Google login (1.8 billion people use Gmail each month). The app will also have a form to input address, phone number, LinkedIn profile, and GitHub profile that will be displayed on a plain profile page.

You could have used Google Stitch to vibe design it, but that is a topic for another blog post.

Google AI Studio - Build #

First, go to Google AI Studio at AI.dev or aistudio.google.com, which will look like:

Then, click on Build on the left navigation, which will load the following page:

On this page, click the settings gear ⚙️ on the top right:

Then select the Pro model, which at the time of writing is Gemini 3.1 Pro Preview, and make sure Next.js is selected as the framework. You are using the Pro model because it is more powerful, and in my experience, it follows instructions better and produces output with fewer or no errors. Then you can close the settings and proceed with the prompt.

The prompt to generate the app #

After that, you will paste the following prompt in the textbox below the text Build your ideas with Gemini:

Create a simple login page, then show the user's profile page with the user's name, email, and a photo from the 
user's Google profile. On that page, the email should be visible only when the eye icon is clicked. 
The user can add and update their address and phone number. The user can also add their LinkedIn and GitHub profiles; 
all of that data, along with the user profile data pulled from the Google profile, is saved in the database. 

After saving the information, it is presented in a profile page. Please keep the form and the profile page separate. 
Please add the necessary validation for the address, phone number, and profile URLs. The application should be built 
in the green theme, and the buttons should be dark green.

Also select Add database and auth. After that, it will look as follows:

You can use a different color theme, but for this tutorial, you will use the “green” color. Then you will click the Build button to start generating the app.

Google AI Studio will start generating the app as per the given prompt using the Gemini 3.1 Pro model, as seen below:

Then it will ask for permission to use Firebase and to accept its terms, as discussed next.

After generating the basic app, once the model decides to use Firebase following the given instructions in the prompt, Google AI Studio will ask for permission to use Firebase as well as select the region for the project:

Here, you can select the region as Oregon (us-west1) or any other region that makes sense for you, and click the Enable button to continue the app generation process. You can also click the Code button to see the generated code if that interests you.

After I clicked “Enable”, which automatically agreed to the terms and conditions for Firebase, it worked for almost 4 minutes (223 seconds) and generated the app with a Google login. In the background, it also created the necessary Firebase project and Firestore database:

Then you can test the app by clicking the Login to Google button. Notice that it adhered to the prompt and created the app with a green theme. It will ask you to continue with your Google account as follows:

After you click continue and log in, it will show you the name and the profile picture on a form:

Notice that it also adheres to the prompt and hides the email address, making it visible only after clicking the eye button. You can click the edit profile button to add the details:

Then you can save the details by clicking the Save Changes button. Keep in mind that your app may look different from the above screenshot as LLMs are non-deterministic. It should then load the profile page as follows:

You can try the app in full-screen mode and try changing the details. You can also log out and log in again.
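The prompt asked for validation of the profile URLs, and it is worth knowing what a reasonable check looks like when you review the generated code. The Python sketch below shows one possible rule set and is not the validation AI Studio generated. The allowed hosts, the https-only rule, and the function name are all assumptions.

```python
from __future__ import annotations

from urllib.parse import urlparse

# Assumed host allow-lists for the two profile fields.
GITHUB_HOSTS = frozenset({"github.com", "www.github.com"})
LINKEDIN_HOSTS = frozenset({"linkedin.com", "www.linkedin.com"})


def is_profile_url(url: str, allowed_hosts: frozenset[str]) -> bool:
    """Accept only https URLs on an allowed host that include a profile path."""
    try:
        parsed = urlparse(url.strip())
        host = (parsed.hostname or "").lower()
    except ValueError:
        # urlparse can reject malformed input such as a broken IPv6 host.
        return False
    has_path = parsed.path.strip("/") != ""
    return parsed.scheme == "https" and host in allowed_hosts and has_path
```

Checking the parsed host, rather than searching the string for "github.com", is what rejects look-alike URLs that merely contain the expected domain somewhere in the path.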

Data in Firestore #

When all of this is happening, it is no longer a frontend-only prototype app. The data is being saved in Firebase Firestore. To verify that, go to your Firebase console. Either a new project would have been created, or a Firestore database would have been added to your existing Firebase project.

The database name would usually start with ai-studio and end with the UUID of the AI Studio project, like ai-studio-0042ecab-af9a-4dfb-b5b2-5e39156b6258. When you peek into the Firestore database, you will see the record saved from your Green Profile app:

There you have it: a full-stack Next.js app generated with Google AI Studio – one-shot with a single prompt that supports Google login and saves data to Firebase Firestore. You can add more features in the chat section of Google AI Studio, like pulling in data from Google Search or Google Maps.

You can also deploy the app to Google Cloud Run using the publish feature (top-right button) and make it available to anyone in the world via an open URL. The possibilities are endless.
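If you end up with several Firestore databases, the naming pattern described earlier (an ai-studio prefix followed by a UUID) can be checked programmatically. The sketch below assumes that pattern holds exactly. The source only says names usually look like this, so treat a non-match as "unknown" rather than "wrong". The function name is hypothetical.

```python
from __future__ import annotations

import uuid

PREFIX = "ai-studio-"  # prefix observed in this tutorial's example


def project_uuid_from_database_name(name: str) -> uuid.UUID | None:
    """Extract the trailing UUID from an 'ai-studio-<uuid>' name, else None."""
    if not name.startswith(PREFIX):
        return None
    suffix = name[len(PREFIX):]
    try:
        parsed = uuid.UUID(suffix)
    except ValueError:
        return None
    # uuid.UUID is lenient about formatting, so insist on the canonical form.
    return parsed if str(parsed) == suffix.lower() else None


if __name__ == "__main__":
    print(project_uuid_from_database_name(
        "ai-studio-0042ecab-af9a-4dfb-b5b2-5e39156b6258"
    ))
```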

Conclusion #

In this guide, you started with a prompt to build a basic login-and-profile-page app using Google AI Studio. You generated the app and tested it. Finally, you could verify that the data was saved properly on Firestore in Firebase. Every app needs a login and profile page. You can use the chat section in Google AI Studio to explore other newly added features, such as retrieving data from Google Search and Google Maps, or even add intelligence with Gemini or image-generation capabilities to this app. Keep building!

Choosing the best git branching strategy for continuous delivery in your team

2026-03-22T10:51:32Z

With AI doing some or most of the code writing (ahem! generation, if I may), being strong in the basics becomes even more crucial. If you have the word "engineer" in your job title, then knowing tools like Git and Docker has become essential. In this post, you will learn about the three main Git branching strategies and which one your team should choose for continuous delivery. This post will also cover real-life experiences and recommend a Git branching strategy that has proven more effective for the teams I've been a part of. Let’s get started!

Continuous integration and delivery #

As stated in Atlassian’s blog post:

Continuous integration (CI) is the practice of automating the integration of code changes from multiple contributors into a single software project. It's a primary DevOps best practice, allowing developers to frequently merge code changes into a central repository where builds and tests then run.

So the focus is on integrating code changes into a single project or, to put it in Git terms, into a single branch.

Similarly, continuous delivery, as per Jez Humble, is:

Continuous Delivery is the ability to get changes of all types—including new features, configuration changes, bug fixes and experiments—into production, or into the hands of users, safely and quickly in a sustainable way.

In other words, Continuous delivery is the practice of ensuring that your software is always in a releasable state. It means that whenever a feature is finished, tested, and approved, it can be deployed to production with the push of a button.

But how do you organize your Git branches to support this seamless flow? If your branching model is too complex, continuous delivery becomes hard because the code gets stuck in a web of merges. If your branching model is too loose, you risk deploying broken code to your users. In the next section, you will learn about one of the older Git branching strategies, Gitflow.

Gitflow #

When discussing version control workflows, it is impossible not to start with Gitflow. Introduced around 2010 by Vincent Driessen in his blog post of the same name, it was one of the first formalized, moderately adopted branching models that gave teams a structured way to handle releases. For many years, this was considered a good standard for software development teams. It provided a strict, rule-based approach to managing features, releases, and hotfixes.

In the Gitflow model, the master branch is reserved exclusively for code currently running in production. You never commit directly to master. Alongside it lives the develop branch, which serves as the integration branch for all new features. When you want to build a new feature, you branch off from develop to create a feature/your-feature-name branch. Once your work is done, you merge it back into develop.

The image is taken from this blog post.

This sounds fine in theory, but the complexity multiplies when it is time to release. Multiple changes from two or more software engineers would have been merged to develop. Big changes = big risk, and all of it goes to production in one go when the develop branch is deployed. After deployment, the develop branch’s changes have to be merged into master.

If a critical bug is found in production, you have to create a hotfix branch off of master, fix the bug, and then merge that hotfix into both master and develop.
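The double merge is where much of the Gitflow overhead comes from, and it is easier to see when the steps are written out. The sketch below builds the command sequence for a hotfix without running anything. The hotfix/ prefix mirrors the feature/ naming above and the use of --no-ff is a common Gitflow convention; both are assumptions here, as is the function name.

```python
from __future__ import annotations


def gitflow_hotfix_commands(
    name: str,
    production: str = "master",
    integration: str = "develop",
) -> list[list[str]]:
    """Return the git commands for a Gitflow-style hotfix, in order."""
    branch = f"hotfix/{name}"  # assumed naming convention
    return [
        # Branch off production, then commit the fix on the hotfix branch.
        ["git", "switch", "-c", branch, production],
        # Merge the fix into production...
        ["git", "switch", production],
        ["git", "merge", "--no-ff", branch],
        # ...and again into the integration branch so the fix is not lost.
        ["git", "switch", integration],
        ["git", "merge", "--no-ff", branch],
        ["git", "branch", "-d", branch],
    ]


if __name__ == "__main__":
    for command in gitflow_hotfix_commands("login-crash"):
        print(" ".join(command))
```

Each list can be passed to subprocess.run if you want to execute it, though printing the commands first is a safer way to review them.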

Continuous delivery relies on small, frequent, and automated releases. Gitflow, by its very design, encourages batching features together into large, infrequent releases. The overhead of managing multiple long-lived branches, resolving complex merge conflicts between develop and master, and maintaining release branches creates massive friction.

While Gitflow might still make sense for software with scheduled, versioned releases (like desktop applications or mobile apps that require app store approval), it is generally considered not up to the mark for modern web applications and APIs that aim for multiple deployments per day.

GitHub Flow #

As the frustrations with Gitflow grew, a new, much simpler model emerged and quickly took the software engineering world by storm. This model gained massive popularity on GitHub with the introduction of the Pull Request (PR) mechanism. It is aptly named GitHub Flow or simplified Gitflow.

GitHub Flow is beautifully simple. Unlike Gitflow, it operates on the principle that there is only one long-lived branch: master (often referred to as main in modern repositories). The golden rule of GitHub Flow is that the master branch must always be deployable.

When you want to work on a new feature or fix a bug, you create a descriptively named branch directly off of master. You write your code, commit your changes, and push the branch to the remote repository. Then, you open a Pull Request. The Pull Request is the heart of GitHub Flow. It is where code reviews happen, where automated tests are run by your Continuous Integration (CI) server, and where discussions about the architecture and logic take place. Once the code is reviewed and approved, it is merged into master.

The image is taken from this blog post.

Because of its simplicity, GitHub Flow is widely considered an excellent git branching strategy for continuous delivery. However, there are two distinct ways teams handle deployment in this flow, and you must choose the one that aligns with your team's risk appetite.

Deploying before merging #

In this variation of GitHub Flow, the deployment happens before the code is merged into the master branch. When you open a Pull Request, your CI/CD pipeline automatically provisions a temporary environment or deploys the feature branch to a staging server or a preview URL (also known as a deployment preview).

You and your team test the feature branch thoroughly in this environment. Once the CI tests pass, the product manager approves the feature, and the QA checks pass (if any), you deploy that exact feature branch directly to the production environment. Only after the code runs successfully in production and has been verified as stable, do you merge the Pull Request into the master branch. This could be done within minutes of the production deployment or may take some time, depending on the type of change or the feature deployed.

This approach guarantees that the master branch is always 100% stable. If a deployment fails, master is untouched, and you can simply deploy master to roll back. Then, fix the issue on your feature branch and try again. This is a highly resilient way to handle continuous delivery, though it requires sophisticated deployment tooling to deploy from arbitrary branches or tags.

Deploying after merging #

The second, and perhaps more common, variation is deploying after merging to master. In this scenario, once your Pull Request is approved and the automated tests pass, you merge the feature branch into master.

Merging into master triggers your continuous delivery pipeline. The pipeline takes the latest commit on master, builds the application, and deploys it to production. You might choose to automatically create a Git tag for every deployment to keep track of versions.
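Tagging each deployment can be as simple as deriving a tag name from the deployment time and attaching it to the deployed commit. The sketch below shows one way to do it. The deploy- prefix, the timestamp format, and the origin remote name are illustrative assumptions, and the commands are only built, not executed.

```python
from __future__ import annotations

from datetime import datetime, timezone


def deployment_tag(deployed_at: datetime) -> str:
    """Build a sortable tag name from a timezone-aware deployment time (UTC)."""
    stamp = deployed_at.astimezone(timezone.utc).strftime("%Y%m%d-%H%M%S")
    return f"deploy-{stamp}"


def tag_commands(tag: str, commit_sha: str) -> list[list[str]]:
    """Return git commands that create an annotated tag and push it."""
    return [
        ["git", "tag", "-a", tag, "-m", f"Deployed {commit_sha}", commit_sha],
        ["git", "push", "origin", tag],  # "origin" is an assumed remote name
    ]


if __name__ == "__main__":
    tag = deployment_tag(datetime.now(timezone.utc))
    for command in tag_commands(tag, "HEAD"):
        print(command)
```

Because the timestamp sorts lexicographically, listing tags gives you the deployment history in order, which also makes it easy to find the previous deployment when you need to roll back.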

While this is easier to set up in most CI/CD tools, it comes with a significant caveat: master will not always be stable. If a bug slips through the code review and automated tests, the pipeline will deploy broken code to production. Because the broken code has already been merged into master, the master branch is now broken.

To recover, you must either quickly run the git revert command to undo the merge commit and push the reversion through the pipeline, or roll back the deployment using your infrastructure tools (such as Kubernetes or other traffic-control mechanisms) while you fix the master branch. This requires your team to be highly responsive and have excellent application-level logging and monitoring in place to detect issues immediately.

Rollback would typically involve opening a pull request and merging it to remove the problematic change from master, so other software engineers can merge their work and deploy to production as normal.

Regardless of which deployment variation you choose, GitHub Flow's reliance on a single long-lived branch drastically reduces merge conflicts and cognitive load, making it a useful git branching strategy for continuous delivery.

Trunk-based development #

If GitHub Flow simplifies things by having only one long-lived branch, then Trunk-based development takes the concepts of simplicity and speed to their absolute extremes. In recent years, Trunk-based development has been promoted by DevOps experts and the DORA (DevOps Research and Assessment) reports as the ultimate practice for high-performing engineering teams.

However, there is often a subtle confusion surrounding Trunk-based development. The confusion usually stems from whether it means having literally only one branch (master or "trunk") where everyone commits directly, or doing extremely small, short-lived feature branches that are merged multiple times a day.

The image is taken from the Trunk Based Development website.

In practice, modern Trunk-based development usually means the latter. Software engineers create very short-lived branches off the trunk, write a small batch of code, and merge it back into the trunk as quickly as possible, often several times a day. The goal is to completely eliminate the "merge hell" that occurs when long-running feature branches diverge too far from the main codebase. By integrating code continuously, you ensure that everyone is always working on the most up-to-date version of the software. This is where synchronous code review through pair programming comes into play, although in this age of AI-written and AI-generated code, pair programming might be a dying art form.

When evaluating a git branching strategy for continuous delivery, Trunk-based development is often seen as the holy grail. Because code is integrated so frequently, the continuous delivery pipeline is constantly firing, deploying small, incremental changes to production. This drastically reduces the blast radius of any given deployment. If a deployment breaks something, it is incredibly easy to pinpoint the cause because the changeset is so small.

You cannot simply adopt Trunk-based development by telling your team to merge faster. Because you are constantly merging code into the trunk—even code for features that are only half-finished—using feature flags becomes absolutely essential for Trunk-based development.

Feature flags (or feature toggles) allow you to wrap your new, half-done code in a conditional statement. The code is deployed to production, but the feature flag is turned off, so end users cannot see or interact with the new feature.

For example, if you are building a new checkout flow, you will merge the database migrations on Monday, the backend API routes on Tuesday, and the frontend UI components on Wednesday. All of this code goes to production immediately via your continuous delivery pipeline. However, because the feature flag is disabled, customers continue to use the old checkout flow. Once the entire feature is complete and tested in production by internal staff, you simply flip the switch on your feature flag dashboard, and the new checkout flow is instantly available to your users.
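The checkout example above can be sketched in a few lines of JavaScript. This is a minimal illustration, not a production setup: the `featureFlags` object, the `newCheckoutFlow` flag name, and the checkout functions are all hypothetical, and a real team would usually read flags from a feature flag service or dashboard instead of an in-memory object.

```javascript
// Hypothetical in-memory flag store; real projects typically load flags
// from a feature flag service or configuration at runtime.
const featureFlags = { newCheckoutFlow: false };

// A flag counts as enabled only when it is explicitly set to true.
function isEnabled(flagName, flags = featureFlags) {
  return flags[flagName] === true;
}

function sumCart(cart) {
  return cart.reduce((sum, item) => sum + item.price, 0);
}

// Existing behavior that customers keep using while the flag is off.
function oldCheckout(cart) {
  return { flow: 'old', total: sumCart(cart) };
}

// New, possibly half-finished code that is deployed but hidden.
function newCheckout(cart) {
  return { flow: 'new', total: sumCart(cart) };
}

// The conditional that wraps the new code path.
function checkout(cart, flags = featureFlags) {
  return isEnabled('newCheckoutFlow', flags) ? newCheckout(cart) : oldCheckout(cart);
}

const cart = [{ name: 'book', price: 20 }, { name: 'pen', price: 5 }];
console.log(checkout(cart)); // uses the old flow while the flag is off
```

Because the flag defaults to off, the merged but unfinished `newCheckout` code ships to production without changing what customers see. Flipping the flag is the only step needed to release it.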

Trunk-based development requires a highly mature engineering culture. You must have an extensive suite of automated tests (unit, integration, and end-to-end) that run within a reasonable time. If your test suite takes an hour to run, you cannot merge multiple times a day. You also need excellent monitoring, fast build times (utilizing tools like Docker multi-stage builds), and a team that deeply understands how to break large tasks into tiny, independent, release-ready parts.

If your team lacks this level of maturity, attempting Trunk-based development will likely result in broken builds, frustrated engineers, and a chaotic production environment.

GitHub flow strikes the right balance #

You have now explored the three main contenders for your team's workflow. Gitflow is heavy and bureaucratic, hindering rapid deployment. Trunk-based development is incredibly fast and efficient, but it requires a level of engineering maturity, testing infrastructure, and feature flag management that many teams simply do not possess yet.

When choosing a Git branching strategy for continuous delivery, you need a process that provides structure without bottlenecks and speed without chaos. This is exactly why GitHub Flow strikes the right balance.

You can think of GitHub Flow as existing in the "Goldilocks zone" of version control strategies. It is not too hot or too cold; it is just right.

GitHub Flow provides enough structure to ensure code quality. By utilizing Pull Requests, you enforce a mandatory code review process. This ensures that at least one other software engineer looks at the code, checks for architectural issues, and verifies that the automated tests have passed before anything is merged. This serves as a crucial safety net, preventing bad code from reaching your users.

At the same time, GitHub Flow is lightweight enough to support true continuous delivery. Because there is only one long-lived branch, developers do not have to navigate a maze of develop, release, and hotfix branches. When a feature is ready, it is reviewed, merged, and deployed. The cycle time from writing code to delivering business value is kept short.

For most teams—especially those building web applications, SaaS products, or microservices—GitHub Flow is the best fit.

It allows junior engineers to work safely within feature branches without the fear of breaking the trunk. It allows senior engineers to enforce quality standards through PR reviews. And it allows the operations and DevOps personnel to build straightforward, predictable CI/CD pipelines that trigger deployments based on actions taken on the master branch.

Furthermore, GitHub Flow does not strictly prohibit the use of feature flags. In fact, as your team matures, you can easily incorporate feature flags into your GitHub Flow process. You can start by building smaller feature branches and merging them more quickly, gradually moving your team's culture closer to Trunk-based development without completely overhauling your branching strategy overnight.

By choosing GitHub Flow, you are adopting a pragmatic, battle-tested methodology that aligns perfectly with the goals of agile software development and continuous delivery. It removes the unnecessary overhead of legacy models while providing the safety mechanisms needed to protect your production environment.

Conclusion #

Software engineering is a complex discipline, and delivering reliable software to your customers quickly is one of the hardest challenges a team faces. The tools and processes you choose should empower your developers, not hinder them.

In this guide, you have explored the landscape of version control workflows. You learned that while Gitflow was a pioneer, its multiple long-lived branches make it too cumbersome for modern, rapid release cycles. You discovered that Trunk-based development offers incredible speed and integration, but demands a highly mature testing culture and strict reliance on feature flags.

Ultimately, finding the right git branching strategy for continuous delivery comes down to evaluating your team's current capabilities and your business needs. For the vast majority of software engineering teams, GitHub Flow provides the perfect equilibrium. It leverages the power of Pull Requests for code quality while maintaining a single, deployable master branch to enable fast, frictionless releases.

Remember, your branching strategy is a means to an end. The ultimate goal is to deliver value to your users consistently and safely. Choose the strategy that keeps your team productive, your codebase clean, and your deployments boring and predictable. Happy coding and deploying!

How to use an open model with your application using Docker Model Runner and Docker Compose [Part 2]

2026-01-25T10:53:32Z

You can run open models with other apps, such as Ollama. Docker Model Runner shines when you want to connect your application’s Docker container with an open model. It feels more native to Docker to define both the application and the model in a single Docker Compose file. You will learn to do so in this tutorial with a demo app built with Node.js that talks to Smollm2, defined in a Docker Compose file. Let’s get going!

Table of contents #

Prerequisites #

In this part, similar to part 1 about Docker Model Runner, you will need Docker Desktop installed with Docker Model Runner available. You will also need a decent hardware configuration to run the models, especially the ones with billions of parameters. On top of that, the following things will also be needed:

Docker Compose installed on your machine. Compose is bundled with Docker Desktop; if it is not installed, please install it.
A general idea of how Docker Compose works and how services communicate in Docker Compose. You can get a refresher with this Docker Compose tutorial.

Given that, you can jump to the next section to configure the AI settings for Docker Desktop.

Settings for the API on Docker Desktop #

Any open model you pull and run can expose the APIs for chat completion and other functionalities. For this API to be accessible from your local machine or other containers (inside or outside a Docker Compose setup), you will need to Enable host-side TCP support when Docker Model Runner (DMR) is enabled.

To do this, you can follow the steps below:

Open Docker Desktop
Click on the gear icon (⚙️) on the top right to show the settings of Docker Desktop
Then, click AI on the left sidebar
After that, make sure Enable Docker Model Runner is checked
Also confirm that the Enable host-side TCP support checkbox is checked
Then click Apply as seen below:

By default, it will use port 12434; adjust the CORS settings if you need to. In the next section, you will learn about the demo Node.js application you will use to chat with the Smollm2 model.

The demo Node application #

For this tutorial, rather than writing a completely new application, you will reuse the Node.js version of the hello-genai open-source code from Docker.

You can use any open model like Gemma 3 or Mistral 3 or even Gemma function if you want to build AI Agents. For this guide, you will use Smollm2's default variant with 360 M parameters, as it is general purpose and small enough to run on most machines. Next, you will see a snippet of the Node.js Express app, which calls the Smollm2 model.

Code for the Node app #

The demo application connects a Node.js Express app to Smollm2, defined as a model in the Docker Compose file. The app is reused from Docker's open-source hello-genai repository. I have taken the node-genai app and modified it a bit. It is a simple chat interface used to send prompts to the model and display the response on a webpage. There is a screenshot of this in action in a later section.

The main part of the code for the Express.js app in the node-genai app is as follows:

app.post('/api/chat', async (req, res) => {
    const { message } = req.body;
    
    // Special command for model info
    if (message === '!modelinfo') {
        return res.json({ model: getModelName() });
    }
    
    try {
        const response = await callLLMAPI(message);
        return res.json({ response });
    } catch (error) {
        console.error('Error calling LLM API:', error.message);
        return res.status(500).json({ error: 'Failed to get response from LLM' });
    }
});

// Call the LLM API
async function callLLMAPI(userMessage) {
    const chatRequest = {
        model: getModelName(),
        messages: [
            {
                role: "system",
                content: "You are a helpful assistant."
            },
            {
                role: "user",
                content: userMessage
            }
        ]
    };
    
    try {
        const response = await axios.post(
            getLLMEndpoint(),
            chatRequest,
            {
                headers: { 'Content-Type': 'application/json' },
                timeout: 30000 // 30 seconds
            }
        );
        
        if (response.data && response.data.choices && response.data.choices.length > 0) {
            return response.data.choices[0].message.content.trim();
        }
        
        throw new Error('No response choices returned from API');
    } catch (error) {
        if (error.response) {
            throw new Error(`API returned status code ${error.response.status}: ${JSON.stringify(error.response.data)}`);
        }
        throw error;
    }
}

The above code defines an Express.js API endpoint that sends user messages to a Large Language Model (LLM), which will be Smollm2 in the example, and returns the model’s reply.

The POST /api/chat route reads a message from the request body. If the message is the special command !modelinfo, it immediately returns the current model name via getModelName(). Otherwise, it calls callLLMAPI(message) to get a response from the LLM. If anything fails, it logs the error and returns an HTTP 500 with a friendly error message.

The callLLMAPI function builds a request payload in a chat-style format: a system message that sets behavior (“You are a helpful assistant.”) and the user’s message. It sends this payload to the LLM endpoint (from getLLMEndpoint()) using Axios with a JSON header and a 30-second timeout.

If the API returns a valid response with choices, it extracts and trims the assistant’s reply. If no choices are returned or the API returns an error status, it throws a descriptive error so the caller can handle it properly. You can check the whole app.js and the view file too. In the next section, you will see the Dockerfile for this chat app.
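To make the response handling concrete, here is a small standalone sketch of the extraction step. The `extractReply` helper and the `sampleResponse` object are illustrative and not part of the demo app; the sample only mimics the chat-completion shape that the code above reads (`choices[0].message.content`).

```javascript
// Illustrative helper: pull the assistant's reply out of a chat-completion
// style response body, mirroring the checks done in callLLMAPI.
function extractReply(data) {
  const choices = data && data.choices;
  if (!Array.isArray(choices) || choices.length === 0) {
    throw new Error('No response choices returned from API');
  }
  // Trim surrounding whitespace before returning the text.
  return choices[0].message.content.trim();
}

// Made-up response body used only to exercise the helper.
const sampleResponse = {
  choices: [
    { message: { role: 'assistant', content: '  Hello from the model.  ' } }
  ]
};

console.log(extractReply(sampleResponse)); // prints "Hello from the model."

// An empty choices array is treated as an error the caller can handle.
try {
  extractReply({ choices: [] });
} catch (error) {
  console.log(error.message); // prints "No response choices returned from API"
}
```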

Dockerfile for Node app #

The Dockerfile below is used to run the Node.js chat app built with Express.js.

FROM node:24-alpine

WORKDIR /app

# Copy package files
COPY package*.json ./

# Install dependencies
RUN npm install

# Copy application code
COPY . .

# Create directories if they don't exist
RUN mkdir -p views

# Expose port 8080
EXPOSE 8080

# Run the application
CMD ["node", "app.js"]

The above Dockerfile builds a lightweight container for a Node.js application. It would have been better to use a multistage Docker build, but it is fine for a demo app.

It starts from the node:24-alpine base image, which provides Node.js 24 on a small Alpine Linux distribution. WORKDIR /app sets /app as the working directory inside the container.

COPY package*.json ./ copies package.json and package-lock.json (if present) first, allowing Docker to cache dependency installation. RUN npm install installs all Node.js npm dependencies.

COPY . . then copies the rest of the application source code into the container. RUN mkdir -p views ensures a views directory exists, preventing runtime errors if the app expects it.

EXPOSE 8080 documents that the app listens on port 8080. Finally, CMD ["node", "app.js"] defines the default command to start the application when the container runs.

It is a good enough Dockerfile for a small chat app like this. The next section details the Docker Compose file that links this app and container to the Smollm2 model.

Docker compose file with Node app and Smollm2 model #

You can link up the Node.js Chat app with the Smollm2 model easily using a Docker Compose file that looks like:

services:
  node-genai:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "8082:8080"
    environment:
      - PORT=8080
    models:
      - smollm2
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s

models:
  smollm2:
    model: ai/smollm2
    context_size: 8048

The above Docker Compose file defines a single service, node-genai, and its related AI model configuration.

The node-genai service is built from the local directory using the specified Dockerfile as seen in the above section. It maps port 8082 on the host to port 8080 inside the container, allowing external access to the Node.js app. The PORT=8080 environment variable configures the app’s listening port. The models section links this service to a model named smollm2.

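The health check verifies that the service is running correctly. It uses curl to call http://localhost:8080/health every 30 seconds, with a 10-second timeout, three retries, and a 10-second startup grace period. Note that the check needs curl inside the container; Alpine-based images such as node:24-alpine typically do not ship it, so you may have to install it in the Dockerfile.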

The models section declares the Smollm2 model, referencing the image ai/smollm2 from DockerHub and setting a context size of 8048 tokens, which controls how much text the model considers at once.
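One way the app could discover the linked model is through environment variables. As a sketch, this assumes that Compose's short models syntax injects `SMOLLM2_URL` and `SMOLLM2_MODEL` into the node-genai container (names derived from the model key). Treat that as an assumption to verify against your Compose version's documentation, since the demo app's real `getLLMEndpoint()` and `getModelName()` helpers are not shown here and may work differently. The `resolveModelConfig` function and the example URL value are hypothetical.

```javascript
// Hypothetical helper: resolve the chat endpoint and model name from env vars.
// Assumes SMOLLM2_URL and SMOLLM2_MODEL are injected by Docker Compose.
function resolveModelConfig(env = process.env) {
  const baseUrl = env.SMOLLM2_URL;
  const model = env.SMOLLM2_MODEL;
  if (!baseUrl || !model) {
    throw new Error('SMOLLM2_URL and SMOLLM2_MODEL must be set');
  }
  // Drop trailing slashes so the joined path has exactly one separator.
  const endpoint = baseUrl.replace(/\/+$/, '') + '/chat/completions';
  return { endpoint, model };
}

// Made-up values standing in for whatever Compose would inject.
const exampleEnv = {
  SMOLLM2_URL: 'http://model-runner.docker.internal/engines/v1/',
  SMOLLM2_MODEL: 'ai/smollm2'
};

console.log(resolveModelConfig(exampleEnv));
```

Failing fast when a variable is missing makes a misconfigured Compose file show up at startup instead of on the first chat request.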

In the next section, you will run the chat app and the model together with Docker Compose.

Running the app with Docker Compose #

To run the Node.js Chat app built with Express.js and Smollm2 attached to it, you can run the following command, which will first build the app and then run the app and the model in the background:

docker compose build && docker compose up -d

It will give an output like the following:

You don’t need to build the container every time; you can do docker compose up -d the next time. After that, to confirm that the containers are running, you can execute:

docker compose logs -f

This will show something like:

As the app is running on localhost port 8082, you can open your browser of choice and hit http://localhost:8082, which will render something similar to:

Docker includes an internal URL for the Model’s APIs at http://model-runner.docker.internal/, which can be called by other containers. If you want to call it from localhost, it is running on port 12434. You can then chat with Smollm2 via the simple Node.js chat interface.
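If you want to try the host-side port from your own machine, a request could look like the sketch below. It assumes host-side TCP support is enabled on port 12434 and that the OpenAI-compatible chat route lives at `/engines/v1/chat/completions`. That path and the `buildChatRequest` and `askModel` helpers are assumptions for illustration, so verify the route against the Docker Model Runner documentation. It relies on the `fetch` built into Node.js 18 and later.

```javascript
// Assumed base URL for the host-side Docker Model Runner API.
const HOST_BASE_URL = 'http://localhost:12434/engines/v1';

// Build the URL and fetch options for a chat request without sending it.
function buildChatRequest(prompt, model = 'ai/smollm2') {
  return {
    url: `${HOST_BASE_URL}/chat/completions`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model,
        messages: [
          { role: 'system', content: 'You are a helpful assistant.' },
          { role: 'user', content: prompt }
        ]
      })
    }
  };
}

// Send the request and return the trimmed reply text.
// Only works while the model runner is reachable on the assumed port.
async function askModel(prompt) {
  const { url, options } = buildChatRequest(prompt);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`API returned status code ${response.status}`);
  }
  const data = await response.json();
  return data.choices[0].message.content.trim();
}
```

With the model running, calling `askModel('Why is the sky blue?').then(console.log)` would print the model's reply. Keeping `buildChatRequest` separate makes the payload easy to inspect without a running model.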

To check the running containers, you can run docker compose ps. To stop the containers, run docker compose stop. To check the running models, you can run docker model ps. To unload Smollm2, you can run docker model unload smollm2; if you run docker model ps, it will not be running Smollm2 anymore.

In my experience, when running AI models locally, they can start to eat up CPU resources. So it is best to unload the model when you are done using it.

Conclusion #

In this post, you built on part 1, the introduction to Docker Model Runner, by connecting a simple Node.js (Express.js) chat app to the Smollm2 model using Docker Compose. You saw the code for the chat app, then a simple Dockerfile to run a Node.js app, and finally the Docker Compose file that connects the app to the open model (Smollm2 in this case). Then you ran it and saw it in action. Keep Learning!

Docker Model Runner: A beginner’s guide to running open models on your own machine [Part 1]

2026-01-23T11:47:32Z

Docker has been the de facto containerization ecosystem for more than a decade now. It recently added a model runner that runs many open models locally via a docker model command. In this post, you will learn how to use Docker Model Runner to run Smollm2 locally and interact with it. Let’s get started!

Table of contents #

Prerequisites #

Below are some prerequisites you will need on the machine where you want to run Docker Model Runner (DMR):

Docker Model Runner (DMR) is available from Docker Desktop version 4.40 or later, so you will need Docker Desktop version 4.40 or later installed
Your hardware will also need to support the functionality, which has GPU backends. I am running the commands shown in this tutorial on a Mac with an Apple Silicon chip
If you are running Docker Engine on a Linux distro without Docker Desktop, you will need to install the docker-model-plugin separately with a command like sudo apt-get install docker-model-plugin

The easiest way to check if Docker Model Runner (DMR) is available on your machine with Docker Desktop or Docker engine installed is by running:

docker model version

If you see a version for your Docker Model Runner with the Docker Engine Kind, DMR is available on your machine. In my case, I am running Docker Model Runner version 1.0.6 on Docker Desktop 4.57.0.

Now that the prerequisites are covered, in the next section, you will run Smollm2 with Docker Model Runner (DMR).

Running Smollm2 with Docker Model Runner #

Now that you have confirmed that Docker and Docker Model Runner are installed on your system of choice, you will run the Smollm2 model’s default/latest variant, which is the 360-million-parameter one. It is 256.38 MB in size. You might ask why Smollm2. In my opinion, it is small enough to download quickly and does a good job of answering basic questions.

If you are not very confident with Docker commands, you can read the Docker for beginners tutorial for a refresher on Docker. You can also read the post on Docker commands like docker pull, docker images, docker run, and others.

Pull a model with Docker model runner #

You can run the following command to pull Smollm2 from DockerHub:

docker model pull ai/smollm2:latest

The output will look as follows after the Smollm2 open model (by Hugging Face) is downloaded to your machine:

You can also pull the model from Hugging Face.

You can use the Docker Desktop interface to pull the same model after searching in the DockerHub tab, as seen below:

But a single command is much easier than following 4 steps on the GUI. Next, to see if the model is pulled (downloaded) correctly, you can run the following command to list all models:

docker model list

It will show the following output.

After that, you can run the Smollm2 360-million-parameter model as discussed next.

Run a model #

To run the pulled Smollm2 model, you will need to run the following command:

docker model run ai/smollm2 "Why is the sky blue? Answer in a single sentence."

It will result in something like:

The sky is blue because it scatters sunlight in all directions and our eyes are more sensitive to shorter wavelengths of light, like blue and violet.

To run the model in an interactive question-answer mode, you can execute the following command:

docker model run ai/smollm2

After that, you can chat with the model as follows:

To exit the chat, you can type /bye on the command prompt, and it will take you back to your shell/CLI. If you type /?, it will give you more help options as seen below:

You can look at all the prompts given to the model on Docker Desktop by clicking the model in the Models screen, which is Smollm2 in this case:

Then click the Requests tab:

The logs don’t stay for long, though. You can see that the model is responding very fast -- under 6 ms.

You can also chat with the model from the Chat tab, as seen below:

You can also inspect the model’s architecture, parameters, and other information in the Inspect tab:

The above information is similar to running the docker model inspect smollm2 command. You can find the list of commands supported by docker model in the official Docker documentation. For instance, you can see the running models with docker model ps and try out other commands similar to the main Docker CLI.

Smollm2 is an example; at the time of writing, there are 57 models available on Docker Hub. You can pull in Llama, Gemma, Qwen, Kimi, or any other open model of your choice and run it on your machine.

The best part is that it is local and fast, and you don’t even need internet to run a model once it is downloaded to your local machine.

Remove a model #

If you want to remove the Smollm2 model, you can run docker model rm smollm2, which will delete the model, giving an output like:

Untagged: index.docker.io/ai/smollm2:latest 
Deleted: sha256:354bf30d0aa3af413d2aa5ae4f23c66d78980072d1e07a5b0d776e9606a2f0b9

There you go, you pulled a model with Docker Model Runner and were able to run it. You had a quick chat with Smollm2. In the next part, you will learn how to connect a model with your own app using Docker Model Runner and Docker Compose.

Conclusion #

In this quick and useful tutorial, you learned how to pull an open model like Smollm2 from DockerHub and run it on your local machine. This is just scratching the surface. With Docker Model Runner, you can run many open models on your machine, from Gemma to Llama and from Qwen to Deepseek, depending on your hardware. Keep learning!

Recap 2025: Blogging, public speaking, tech community work, and other things

2025-12-25T10:47:52Z

This will be the seventh year I have written a recap or “wrap” of the year that has passed. I started writing these in 2019, and I have continued writing them since then. In this one, I will reflect on some of the crucial professional accomplishments I achieved in 2025. Fasten your seatbelts!

Table of contents #

Highlights #

Below are the highlights of 2025 in a nutshell:

I did 10 in-person public talks/workshops this year. I did a couple more at work. In addition to Sydney, I gave talks in Canberra, Perth, and Shanghai, China.
I wrote 16+1 (this post) blog posts this year, far fewer than in previous years, as I had flagged in last year’s recap. And I still beat my goal.
I was a guest on one podcast this year, Everyday Karma, where Saroj asked me many meaningful questions. The Podcast is in Nepali.
I helped organize nine meetups and one conference for GDG Sydney, and I have transferred the GDG Sydney organizing part to the next generation.
I have listened to 9 days, 21 hours of podcasts this year, from the Pragmatic Engineer to Practical AI and from Tech Lead Journal to Kubernetes Podcast.

Below are some of the details of the review of this year.

Public speaking in 2025 #

In 2025, I did 10 public talks (or workshops), most of them in Sydney. In November, I gave two talks at two conferences and participated in one panel discussion. In August, I gave a talk to mostly Google Developer Experts (GDEs) in Shanghai, China, for the GDE Summit APAC.

This year, I gave 10 talks, the most I have done in a single year. The previous record was eight talks in 2019, all in Sydney. You can view all of my talks since 2016 listed on this GitHub repository. I wrote talks/workshops on six new topics this year and gave five talks by June. Some of the topics were Feature flags, Image to geolocation using Gemini, and Building a resume reviewer app with Gemini. One workshop has already been planned for April 2026 in Darwin.

Blog posts in 2025 #

This year, I wrote 16 blog posts (the target was 14), and with this recap, it will be 17. One thing I did differently this year was to refresh a blog post about free Node.js hosting from 2021, with help from Ashis, Yash, and Bibek. Thanks for updating the post. And after the refresh, it has climbed back up in the Google Search Results.

Unsurprisingly, 13 out of the 16 posts have been about AI, and that is what sells in the current times. Speaking of which, the most popular blog posts from 2025 are as follows.

Most viewed blog posts of 2025 #

The five most popular blog posts written in 2025 are:

The top three are from the Ollama Series I wrote in Feb 2025. There were almost 500K users on my blog in 2025, browsing through nearly a million pages. I will keep the same goal of 14+1 (recap) posts for 2026.

My blog landed among the top 290K websites in the world for November 2025, as per Similarweb:

It also ranked between the top 220K and 240K websites in the world as per the Tranco Rankings. Next, you can read about a podcast on which I appeared as a guest.

Being a guest on a podcast #

This year, I was a guest on only one podcast: the Everyday Karma podcast, hosted by Saroj Dahal. Below is the audio of the podcast, and it is in the Nepali language:

The podcast was released towards the end of January 2025.

Community activities #

As the organizer of GDG Sydney, I helped organize 15+ evening meetups and the full-day, three-track conference: DevFest Sydney 2025 in mid-October this year. In mid-March, I attended the Google For Developers Summit, which brought together Google Developer Group (GDG) organizers and Google Developer Experts (GDEs) from Australia and New Zealand. It was amazing, and the best part was meeting all the participants in person (in Sydney). I also attended the Google Cloud Summit 2025 at the end of July. A quick summary of the events is below:

I could not attend the AWS Summit, but I did attend some other meetups in Sydney. As usual, I helped around eight people with their first (or second) talk to kickstart their public speaking journey.

Listening to podcasts #

In 2025, I listened to 9 days and 21 hours of audio content. That is a total of 237 hours in a year, averaging around 39 minutes a day. Below are the podcasts I listened to this year:

Mostly, the podcasts were technical, but I also listened to some non-technical ones like Investopoly. You can also see the full breakdown of each podcast I have heard in 2025.

Misc #

Some of the other things that came to fruition this year:

I reached more than 10K followers on LinkedIn in March this year.
Made 975+ public contributions on GitHub (33% less than last year) and also completed Hacktoberfest
The side project this year was to help some people get into the habit of listening to podcasts. We gamified it in the Xplorers group, and we got it done in style.
Similar to previous years, I did many 1:1 chats with people about career, finding a tech job in Australia, and similar topics. I hope it has helped many individuals.
I have written three blog posts this year around how AI can help you with job search. It would be good for you to read them.

Looking back, I achieved some of the goals I set in 2025, and I have set some goals for 2026. Hopefully, I will achieve them too.

A couple of new things will be coming in January or February of 2026. Stay tuned to my LinkedIn, the only social media I use.

Conclusion #

Looking back, 2025 was a good year. I attended many tech events and was fortunate to be a guest on a podcast. I also wrote 15+ blog posts (far fewer than the 25 blog posts of previous years), which still exceeded my target.

I look forward to 2026, but I will scale back on things I have been doing, like blogging and community activities, to focus on something else. Merry Christmas and Happy New Year 2026.

How to use Gemini Live audio as an interviewer for a software engineer’s job (with video)

2025-12-16T10:48:25Z

Wouldn’t it be great if you had an on-demand experienced software engineer interviewer who could take a technical interview whenever and wherever you wanted? Yes, it is possible with Gemini Live and the native audio feature on Google AI Studio, along with a well-crafted prompt. The best part is that it's free. Let’s get started!

Table of contents #

Gemini Multimodal capabilities #

The Google Gemini LLM has multimodal capabilities. It can take text, images, audio (including speech and music), video, and large codebases (with a 1-million-token context window) as input. With that input, it can output text, image, audio, and code. For this tutorial, you will input audio/speech, and the output will also be mainly speech (audio), even though it will send the output in text as well for your ease.

To use Gemini with the live native audio feature as an interviewer for a software engineering job (backend engineer), you will use the Google AI Studio live feature as discussed next.

Gemini Live Native Audio on Google AI Studio #

To use the Gemini Live Native Audio feature to behave like an experienced software engineer who will take an interview for a backend engineer role, you will need to go to the live feature in Google AI Studio. It is one of the easiest and most accessible ways to use this feature. At the time of writing, the model available for the Live feature is Gemini 2.5 Flash Native Audio Preview 12-2025.

Steps to use Gemini Live audio as an interviewer #

First, go to Google AI Studio and click the Playground link available in the left sidebar as seen below:

After that, click the Live button available under the Google AI Studio logo then click the Gemini 2.5 Flash Native Audio Preview 12-2025 option available:

Then, on the right sidebar, change the voice from Zephyr to Puck. In my opinion, it is a better voice, which matters as you will chat with that voice for minutes:

Subsequently, scroll down to turn on Thinking mode, then Affective dialog, and lastly Grounding with Google Search as follows:

When you hover over each, it will tell you what the setting does, for instance, Affective dialog on will: Let Gemini adapt its response style to the input expression and tone.

After that, copy the prompt below and paste it into the Start typing a prompt text box:

You are a seasoned backend software engineer with over 20 years of experience and 
you have taken more than 100 backend-focused technical interviews.
For this task, you will act as an experienced interviewer, too.

You operate in two modes, the interviewer mode and the reviewer mode.
When the interviewee asks you to "switch to reviewer mode", you will change to a 
softer voice tone and provide feedback on the last answered question. 
When the interviewee asks you to "switch to interviewer mode",  you go back 
to the interviewer mode with a stronger voice tone, continue asking questions, 
and record the answers for later analysis.

Before the interview starts, you will ask for a job posting. You will ask 
general questions and other specific questions based on the job description. 

You can start with the question "Introduce yourself," then move to the 
technical section.

You will ask relevant backend-related questions covering, but not confined to:

* REST endpoints: how REST works, things like the difference between 
PUT and PATCH, HTTP Response codes, idempotency, Rate Limits, authentication, 
and authorization
* Databases: like relational database vs non-relational ones, you have a long 
SQL select query that takes 1-2 minutes to run, how can you optimize it, 
ACID, eventually consistent, DB normalization
* Code: like testing, SOLID principles, TDD, Design patterns, Security OWASP
* System architecture: microservices, caching, message queues, 
horizontal scaling, software resilience
* Operations: Application performance monitoring (APM), logs, 
observability, SLA/SLO, infrastructure as code (IAC) - Terraform

You will wait and record all the interviewee's answers, then provide 
feedback when asked to do so.

At the end, you will let the interviewee ask you some questions about 
the company and the role, then answer them. You will let the interviewee 
know when the interview is complete and ask whether they would like feedback.

Your role is to provide actionable, easy-to-follow, and high-quality 
advice to improve the answers from a technical point of view, as well as
how the interviewee delivered the answers in relation to 
confidence and clarity. 

Adhere to the following principles and structure when providing feedback 
and advice:

General Instructions:
User Context Sensitivity: Tailor recommendations to the person's 
specific needs, considering the target audience, mainly software 
engineering managers and senior or lead engineers, goals, and finding 
a good balance of technical correctness, clarity, and 
confidence in answer delivery.

Clarity: Ensure all advice is straightforward, free of unnecessary 
jargon, and includes step-by-step guidance where relevant.

Actionability: Provide actionable advice with a clear path to 
implementation, including prioritization and how to maximize 
outcomes to ace the interview.

The prompt is self-explanatory, and when pasted on the text area, it will look as follows:

After that, click Run.

Then it will ask for a job description. It is best to give it a backend engineer role. You can try this opening at Atlassian: https://www.atlassian.com/company/careers/details/23645 (here is an internet web archive link for future reference). Paste it and click Run again:

After that, it will ask for your introduction. You can give your “backend engineer” introduction by clicking the Record mic button and speaking:

Then it will move on to other technical questions, and you can follow along and provide the answers.

Switching modes #

At any point in the interview, you can say "switch to reviewer mode" and then ask for feedback about your answers.

Then again, tell it to switch to interviewer mode, and it will continue the questions. You can also ask for feedback at the end, but for that, you will need to increase the session context size setting in the right sidebar. You can see this in action in the video below.
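To make the two-mode behaviour easier to picture, here is a minimal sketch of it as a tiny state machine. Nothing like this runs inside Google AI Studio; Gemini follows the prompt on its own. The names `MODES` and `nextMode` are hypothetical and only mirror the trigger phrases from the prompt.

```javascript
// Hypothetical illustration only: Gemini follows the prompt itself,
// no code like this runs inside AI Studio.
const MODES = { INTERVIEWER: 'interviewer', REVIEWER: 'reviewer' };

// Returns the mode the session should be in after the candidate speaks.
function nextMode(currentMode, utterance) {
  const text = utterance.toLowerCase();
  if (text.includes('switch to reviewer mode')) return MODES.REVIEWER;
  if (text.includes('switch to interviewer mode')) return MODES.INTERVIEWER;
  return currentMode; // any other sentence keeps the current mode
}

let mode = MODES.INTERVIEWER;
for (const said of [
  'I would use PATCH for partial updates.',
  'Please switch to reviewer mode.',
  'How was my last answer?',
  'Switch to interviewer mode',
]) {
  mode = nextMode(mode, said);
  console.log(`${mode}: ${said}`);
}
```

The takeaway is that only the two exact phrases change the mode, so it helps to say them clearly during the session.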

Demo video #

Below is a 7-minute demo video of the whole process for your reference:

Other interview ideas #

You can also use Gemini Live with the screen share feature to do a system design interview. You will need to tweak the prompt to make Gemini a system design interviewer, then do a system design interview for a URL shortener. Be ready to draw some boxes and arrows :).

Conclusion #

In this post, you saw how the powerful combination of Gemini Live, native audio, and a detailed prompt offers a groundbreaking way to practice technical interviews. By simulating a real-world interview environment with an AI that provides immediate, actionable feedback, you can significantly enhance your preparation.

This approach not only sharpens your technical knowledge but also builds confidence and clarity in your delivery for the real interview scenario. Give this free tool a try and take your software engineering interview skills to the next level. Keep learning and experimenting!

How to create a hair style changer app using Gemini 3 on Google AI Studio

2025-11-27T10:48:25Z

You can make practical and fun apps using Gemini 3 on Google AI Studio for free. In this post, you will create a fun app fast. The app lets you change a person's hairstyle on any human picture you upload. It will work for both male and female photos. Let’s get started!

Table of contents #

Prerequisites #

Before you generate the hair style changer app on Google AI Studio using the latest Gemini model, Gemini 3, you will need the following:

A Google Account to access Google AI Studio
Understanding of how prompting for LLMs like Gemini works would be helpful. Reading this guide on prompt engineering would help you get some critical pointers
Knowing how frontend apps are built with React.js would be beneficial, but not needed

Given that, we can get our hands dirty with some code generated by Gemini 3 on Google AI Studio's Build feature next.

Steps to generate a hair style changer app #

To build a virtual hair style try-on app using Gemini 3 on Google AI Studio, you will first need to go to the Google AI Studio app:

Navigate to Google AI Studio #

You can go to the Google AI Studio app by visiting https://aistudio.google.com/ on your favourite browser. I am using Chrome and signed into my Google account. You should see something like the below:

Go to Build #

Then click on Build to build/generate your AI virtual hair style try-on app. You will land on the build page – Start section as follows:

To build your app, you will paste the prompt as shown in the next section.

Paste the prompt on AI Studio #

After that, you can paste the following prompt in the Describe your idea text box:

Prompt:

Create an app that lets users either upload their own photo or take a 
picture with the camera, then select a hairstyle. After that, a new 
image of the user will be generated with the chosen hairstyle. 
This app should work for both male and female users.

Please ensure there are at least 40 hairstyles, categorized into 
male, female, and creative styles. Figure out the photo is of a 
male or female and select the right category automatically.
Also, please provide the output that includes both the uploaded 
photo and the generated photo with the chosen hairstyle. The user 
should be able to upload an image up to 5 MB. You can show some 
hair grooming and styling tips while the chosen hairstyle 
photo is being generated.

Which will look like the below:

After that, click the Build button and it will start building the app.

At the time of writing (Nov 2025), it selects the Gemini 3 Pro preview model by default, which is the latest version of the Gemini model.

Wait for the generation to finish #

The generation of the Hair Style change app will take some time to complete. You can have a look at the Code section to see the code Gemini is generating (or vibing):

In my case, it took 123 seconds to generate the whole app and then I clicked the Preview button to see how the app works.
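While browsing the generated code, one thing to look for is how the 5 MB upload limit from the prompt is enforced. The generated code is not reproduced in this post, so the following is only a sketch of what such a check could look like in frontend JavaScript. The names `MAX_UPLOAD_BYTES` and `validateUpload` are hypothetical, and it assumes 5 MB means 5 * 1024 * 1024 bytes.

```javascript
// Assumption: 5 MB means 5 * 1024 * 1024 bytes; the generated app may count differently.
const MAX_UPLOAD_BYTES = 5 * 1024 * 1024;

// Works with a browser File object or any { type, size } shaped value.
function validateUpload(file) {
  if (!file.type.startsWith('image/')) {
    return { ok: false, reason: 'Please upload an image file.' };
  }
  if (file.size > MAX_UPLOAD_BYTES) {
    return { ok: false, reason: 'Image must be 5 MB or smaller.' };
  }
  return { ok: true, reason: null };
}

console.log(validateUpload({ type: 'image/jpeg', size: 2 * 1024 * 1024 }));
console.log(validateUpload({ type: 'image/png', size: 6 * 1024 * 1024 }));
```

In a real page, you would pass the `File` taken from the file input's `files` list to a function like this before sending the image to the model.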

Use the hair style changer app in preview mode #

I had to click Allow for the app to use the camera, but I uploaded AI-generated photos to test the app, which looked like:

I made the app go FullScreen – the option is beside the Code button. After uploading the photo of a male (AI-generated photo with Google ImageFX for free), this is what the app showed me:

Notice that it auto-selected Male as instructed in the prompt. You can select any style. I selected Man Bun, and then it showed me the loading screen for some seconds:

Adhering to the prompt, it showed the hair care tips.

Gemini 3 is much better at doing what it is instructed to do than older versions. It has very good adherence to the provided prompt, as seen above.

Once the image generation was done, this is what I saw on the screen:

You can see the original image with a spiked hair style and the generated photo with the selected Man Bun hair style. Wasn’t it fun to build and test this app? :) Next, you will try a female photo and see how the app works.

Try a female hairstyle upgrade #

To try out a female subject, I generated a photo using Google ImageFX. To try a new photo, I clicked the logo of the app in the top left corner to reset the app. Then I uploaded this image and it took me to the Choose a Style screen:

Again, adhering to the prompt, it auto-selected Female styles. You can select a style like Pixie Cut, then it will show the Loading.. screen with the hair care tips again, and when it is done, it will show something like the below:

You can also try some creative hairstyles with funky colors. I tried Rainbow Hair and it was funny.

There you have it: your own hair style changer app, generated by AI and powered by AI too. You can deploy it on Google Cloud Run and share it with your friends too. You can read on how to do that in my previous blog post.

Add another feature #

You can also add other features like a side-by-side slider. Google AI Studio also recommends some other features you can add to your app:

You can click the Suggestions and Gemini 3.0 will build it for you on Google AI Studio. Try it!

Thanks to Google: Google Cloud credits are provided for this project. It is part of the #AISprintH2 sprint.

Conclusion #

Using Gemini 3 on Google AI Studio's Build feature, you rapidly created a functional hair style changer app with a simple prompt in less than 3 minutes. The model successfully handled style categorization, uploads, and accurate image generation for both genders.

This showcases how multimodal LLMs like Gemini 3 accelerate app development, making complex image tasks straightforward. From prompt to tested application, a sophisticated AI tool was quickly realized.

Your deployment-ready app can now be enhanced with features like a side-by-side slider or new styles. Google AI Studio and Gemini 3 usher in the era of prompt-driven development, offering boundless creative potential. Keep Learning and reaching new goals with Google’s AI products!

How to use NotebookLM: A practical guide with examples

2025-11-23T10:38:25Z

Google’s NotebookLM is a hidden gem in the world of information processing. It is not just another chatbot; it is a personalized AI research assistant grounded in your own documents. If you have ever felt overwhelmed by the sheer amount of documentation, PDFs, or websites you need to read to get a job done, this tool is for you at no cost. In this post, you will learn how to use NotebookLM, explore every important NotebookLM feature, and walk through a real-life example of using it to land a Senior Software Engineer job. Let’s get started!

Table of Contents #

Need for NotebookLM #

We live in an era of information overload. As technologists or even just avid learners, we are constantly bombarded with documentation, white papers, long YouTube tutorials, and complex job descriptions. The problem isn't finding information; it's synthesizing it into something useful.

You might be used to pasting text into ChatGPT or Claude to get a summary. That works for small things. But what if you have a 50-page PDF, a link to a website, and a YouTube video, and you need answers based only on those sources without the AI hallucinating facts?

This is where NotebookLM shines. It uses a concept called Retrieval-Augmented Generation (RAG) to "ground" the AI in your specific sources. It’s like hiring a super-smart intern, handing them a stack of books, and saying, "Only answer questions based on these books."
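To make the idea of grounding more concrete, here is a toy sketch of the retrieval step in RAG. It is not how NotebookLM is implemented: real systems typically use embeddings, while this sketch scores chunks by simple word overlap. The function names and sample chunks are made up for illustration.

```javascript
// Toy retrieval step: score each source chunk by how many question words it contains.
// Real systems use embeddings; word overlap is a simplification for illustration.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

function retrieve(question, chunks, topK = 2) {
  const queryWords = new Set(tokenize(question));
  return chunks
    .map((chunk, index) => ({
      citation: index + 1,
      text: chunk,
      score: tokenize(chunk).filter((word) => queryWords.has(word)).length,
    }))
    .filter((hit) => hit.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// Build the prompt that would be sent to the model, restricted to the retrieved text.
function buildGroundedPrompt(question, hits) {
  const context = hits.map((hit) => `[${hit.citation}] ${hit.text}`).join('\n');
  return `Answer using only the sources below and cite them by number.\n${context}\n\nQuestion: ${question}`;
}

// Made-up sample chunks standing in for uploaded sources.
const chunks = [
  'The role requires five years of backend experience with Node.js.',
  'Our office is in Sydney and we offer hybrid work.',
  'We store data in PostgreSQL and Redis.',
];
const question = 'What backend experience does the role require?';
console.log(buildGroundedPrompt(question, retrieve(question, chunks)));
```

Only the chunks that match the question reach the model, and each one carries a number that can be shown back to you as a citation.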

In this guide, you will go deep into how to use NotebookLM to boost your productivity. We will look at its features and then apply them to an efficient scenario: preparing for a high-stakes job interview.

What is NotebookLM? #

NotebookLM is an AI-first notebook offered by Google. Unlike standard LLMs (Large Language Models) that rely on their vast, pre-trained knowledge of the internet (which can be outdated or inaccurate), NotebookLM relies on "Sources" that you upload or links you provide.

When you create a "Notebook", you upload documents (PDFs, text files, Google Docs, Slides), paste website URLs, or link YouTube videos. The AI then becomes an expert on those specific sources – not on the parts of the internet with a cut-off date. NotebookLM is built with the latest Gemini models.

The main benefit here is trust. When NotebookLM gives you an answer, it includes citations. You can click a citation to jump directly to the paragraph in your uploaded PDF where the information is located. For software engineers reading technical specs or API documentation, this accuracy is non-negotiable.

Prerequisites #

Before we dive into the features and examples, below are some prerequisites you should have:

A Google Account: NotebookLM is a Google product, so you will need a standard Gmail account or a Workspace account to access it.
Access to NotebookLM: You can access it at notebooklm.google.com. It currently has a free version (at the time of writing).
Source Material: To get the most out of this guide, have a PDF, a website URL, or a YouTube video link ready to experiment with.
Basic understanding of GenAI: Knowing how to write a basic prompt (asking the AI to do something) will be helpful.

Given that is mentioned, you can proceed to explore the features.

NotebookLM features #

NotebookLM is packed with features to help you quickly understand complex information. It is not just about text summaries; it is about multimodal understanding.

Let’s analyze every important NotebookLM feature you should be using. All the features mentioned below take a couple of minutes or more to be generated. It shows a loading icon on NotebookLM when the artifact is being generated.

Audio Overview - podcasts #

This is arguably the most viral and impressive NotebookLM feature. The Audio Overview allows you to turn your static documents, text files, and slides into an engaging, two-way audio conversation. These are also called NotebookLM podcasts.

Imagine you have uploaded a dry, 30-page technical document on Kubernetes architecture. Reading it might put you to sleep. However, with one click, NotebookLM generates a "Deep Dive" audio discussion.

Two AI hosts (a male and a female voice) will discuss your content. They don't just read it out loud like a text-to-speech engine. They banter. They use analogies. They express surprise at interesting facts. They summarize the key points in a format that sounds like a “high-quality” tech podcast. Take the “high-quality” with a grain (spoon) of salt :).

Someone apparently listened to 200 of these (in November 2024 - a year back) and wrote a blog post about it with a YouTube video. She concluded they lack the genuine connection and authenticity required to compete with human conversations. I would leave the analysis up to you. Here is an audio overview for a blog post about Good software engineering is about finding a solution at the correct layer with boring technology.

To use this, simply load your sources, go to the "Studio" section, and click "Audio Overview". It takes a few minutes, but the result is pretty realistic as seen below:

Next, you will learn about video overview.

Video Overview #

NotebookLM has recently added the Video Overview feature. Similar to an audio overview, it will create a video from the documents or links you provided as sources. Since last month, it has been using Nano Banana to choose styles and create a brief video overview.

To create a default Video Overview after you add your source(s), you can click the Video Overview option in the Studio panel on the right side of the screen. Below is a 7-minute video I created based on my blog post Unblocking Software Engineers: Overcoming Non-technical and Technical Roadblocks:

The video is usually around 8 minutes, and the brief is around 2.5 minutes. It is an excellent feature if you want to digest information faster and you learn things better in a visual format. If you click the “pen” beside the Video Overview, you can generate an “Explainer” or a “Brief,” choose the Language (50+ available, including Nepali), and select the visual style for the video. Give it a go.

Mind Map #

Visualizing data is crucial. While NotebookLM doesn't have a "Draw Mind Map" feature, you can use it to generate a mind map. To create a mind map from the uploaded sources, you can click “Mind Map” on the Studio panel on the right side of the screen. It will generate a mind map like the one below:

This converts your static text or PDFs into visual architecture diagram code, a massive time-saver. It really helps you understand complex concepts in a hierarchical mind map, as shown above.

Reports #

Reports are distilled information, taking a particular angle or type of writing, generated by NotebookLM from the given set of sources. They can be in various formats like a Briefing Doc (with key insights and quotes from the source), Study Guide (short-answer quiz and glossary of key terms), or Blog Post (insightful takeaways distilled into a highly readable format). Depending on the source, NotebookLM now has suggested formats for reports like Historical timeline, Concept Explainer, and others.

You can generate a “report” by clicking the Reports option in the Studio right side panel. It will show an overlay like the one below:

From there, you can click the given format or click the pen beside the format and edit the format to be generated. If you click the pen / edit button, you can choose the language and type in what you want the report to be about.

Flashcards #

Flashcards are short questions and answers that are generated from the sources you upload to NotebookLM. You can generate them by clicking the Flashcards option on the Studio panel on the right side of the screen. You can customize the Number of cards and Level of Difficulty of questions, and click the pen icon to change the generation. Below is an example of a generated flashcard:

Flashcards can help you quickly learn the main points from the added sources.

Quiz #

Similar to Flashcards, Quiz is another information representation format in NotebookLM. This feature will be handy as it will generate a quiz with 10 questions (usually) and answers from the given sources. You can generate a Quiz by clicking the Quiz option in the Studio panel on the right side of the NotebookLM user interface. Similar to the other features already mentioned, you can edit the type of the quiz by clicking the pen button. Below is a quiz generated for my blog post How Software Deployment tools have changed in the past 20 years using the default settings.

You can play the quiz too and refresh your knowledge on the software deployment tools of the past 20 years.

Infographic #

From the given sources of files, links, Google Docs, and YouTube videos, you can also generate an infographic on NotebookLM. To create an infographic, you should click the Infographic option in the Studio section found on the right side of the page. Below is an infographic generated by NotebookLM for the 20 years of software deployment tools blog post:

It can distill the information well. This feature was recently added to the free version of NotebookLM.

Slide Deck #

Similar to an infographic, you can also generate an “image” slide deck from the uploaded sources. You can also download the slide deck as a PDF and upload it to a slide rendering service like SlideShare or SpeakerDeck. In my experience, it usually generates 12-15 slides with images and explanations from the sources provided. You can generate the slides by clicking the Slide Deck option in the Studio panel on the right side of the NotebookLM user interface. The generated slides look as follows on the NotebookLM interface:

Below is an example of a generated slide deck for the 20 years of software deployment blog post:

In the next section, we will put all of this into practice with a concrete, real-life example.

How to use NotebookLM - Real-life example of job search #

Let’s say you are a backend software engineer looking for a new role. You have found a job opening at Relevance AI, a real company doing incredible things with AI agents, for a Senior Software Engineer role.

The job market is tough. To stand out, you cannot just send a generic resume. You need to understand the company deeply, understand the role, and prep for the specific interview questions they might ask.

Here is how you use NotebookLM to crush this process.

Add sources to NotebookLM #

To start a new Notebook, first go to NotebookLM and click Create new notebook:

After creating an empty notebook, you can add sources. You will add the following sources:

the job vacancy
some pages from the Relevance AI website
Relevance AI reviews from Glassdoor

To add the job vacancy page, you can save the job as a PDF file by going to https://jobs.ashbyhq.com/relevanceai/d5d5fc13-7415-454b-92d7-487cf187ba0a on Chrome and hitting Ctrl+P to print it and save the file as PDF. You can also access an archived version of the job vacancy. Sometimes NotebookLM doesn't extract the contents of a URL, especially if the page is loaded with JavaScript, so saving the page as a PDF file is better.

After you have the file, click on the choose file option below the Upload sources section in the newly created NotebookLM notebook as follows:

After the file is uploaded and processed, you will see a screen like the one below:

It has the vacancy as a PDF uploaded as a source, and in the Chat section in the middle, it also includes a summary of the Senior Software Engineer vacancy. Before you use any of the things in the right panel – Studio, upload more sources.

Next, you can go to the company’s website, which is relevanceai.com for this case, and run the following script to get all the links on that page:

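// Run in the Chrome DevTools console; $$ is the console shortcut for document.querySelectorAll
const links = [];
for (const anchor of $$('a')) {
  const href = anchor.href;
  // Keep unique same-site links and skip in-page anchors
  if (href.startsWith('https://relevanceai.com') && !href.includes('#') && !links.includes(href)) {
    links.push(href);
  }
}
console.log(links);
// Space-separated list, ready to paste into NotebookLM
console.log(links.join(" "));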

Open the console on Google Chrome and run the above JavaScript snippet as follows:

It will log all the links on that page (61 in my case). Since NotebookLM (free version) only has a maximum of 50 sources, I had to select the first 40. You can choose the ones that make more sense to you. It would be good to keep the careers page, though. These were the 40 links I selected and pasted on Paste URLs after clicking the Add sources -> Website under the Link section:

Then you can click Insert to add the 40 links to the Notebook. It will take a couple of seconds to pull in the contents of the 40 web pages, then it will look as follows:

Now you can also add the Glassdoor review for Relevance AI, saved as a PDF. For that, go to the Glassdoor review page of Relevance AI, expand all the reviews by clicking the Show more link after every review, and only then print the page as a PDF and upload it. If there are multiple pages, you can use a free PDF joiner to join multiple PDFs into one.

To upload the PDF, click on + Add sources, then click choose file and select the PDF file you created from the Glassdoor page for Relevance AI. By now, you would have 42 sources, and if you want, you can add your resume too to ask questions to make it better, but we will not do that for this tutorial.
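If you would rather not trim the list of website links by hand, the selection step from earlier can be wrapped in a small helper. This is a minimal sketch: `selectSourceLinks` is a hypothetical name, the sample URLs are illustrative rather than the actual 40 links I used, and the limit of 40 simply leaves room under the 50-source cap for the PDFs.

```javascript
// Hypothetical helper: keep unique same-site links, drop in-page anchors, and cap the count.
function selectSourceLinks(hrefs, origin, limit) {
  const unique = new Set();
  for (const href of hrefs) {
    if (typeof href === 'string' && href.startsWith(origin) && !href.includes('#')) {
      unique.add(href);
    }
  }
  return [...unique].slice(0, limit);
}

// Illustrative sample input, not the real list of links from the site.
const sample = [
  'https://relevanceai.com/',
  'https://relevanceai.com/careers',
  'https://relevanceai.com/careers',       // duplicate
  'https://relevanceai.com/pricing#plans', // in-page anchor
  'https://example.com/elsewhere',         // other site
];
console.log(selectSourceLinks(sample, 'https://relevanceai.com', 40).join(' '));
```

In the Chrome console, you could pass `$$('a').map((a) => a.href)` as the first argument instead of the sample array. Taking the first 40 is a blunt rule, so it is still worth scanning the result to make sure pages like the careers page made the cut.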

Ask questions about the company #

You can start asking questions, and NotebookLM will give you answers based on the 42 sources added (including the two PDF files). You can ask a question like:

What tech stack is used, and what is mentioned, in the senior software engineer job vacancy?

Which will be answered by NotebookLM as follows:

If you click on citation numbers like 2, it will take you to the source in the list of sources added. You can also hover over it to see the source’s text.

You can also ask other questions, like:

Is Relevance AI profitable?
How big is the tech team at Relevance AI?
What are the values of the company? Could you list all of them?

You can get an idea of what you can ask the sources you added to the Notebook now. You can also add more sources, such as YouTube videos, to enrich your research, but be aware that the free version of NotebookLM has more limitations than the pro version. Read the plans page to get the exact details.

Explore more of the NotebookLM features #

In addition to asking questions about the source, you should explore the NotebookLM features discussed above. Let’s start with Audio Overview. You can tell NotebookLM to generate a Brief audio overview (podcast) and focus on the senior software engineer job vacancy:

After you click Generate, it will take a couple of minutes, and you will get a 2-3 minute podcast focusing on the job vacancy. You can generate the audio overview in more than 50 languages. When it is done, it will be listed below the eight options/features in the Studio panel on the bottom right side of the NotebookLM interface:

You can listen to it or even download it as a .mp3 file. You can use this online audio to video converter, put an image, and upload it to YouTube if you wish.

Similarly, you can generate a Quiz, play it, or even share it with your interviewer after the interview. A Mind Map will help you understand the company's domain in a hierarchical format, which is much easier to comprehend. You can also create a Video Overview to get a quick 5-8 minute video to understand what the company does and how it makes money, depending on the sources you have uploaded.

Try all eight tiles and see what they generate. I have set my Relevance AI Notebook to be publicly accessible if you want to see what it looks like.

You can watch the video overview and listen to the audio overview too. You can also add your own notes and play around with NotebookLM. If you would like to use NotebookLM as a native mobile app you can download the Android app or the iOS app.

Conclusion #

NotebookLM is more than just a summarizer; it is a tool for synthesis and deep understanding with a generous free plan. By grounding the AI in your specific documents—whether that is a job description, technical documentation, or meeting notes—you eliminate the noise of the general internet and focus purely on what matters to you.

In this guide, you learned how to use NotebookLM features like Audio Overviews to learn while commuting, video overview, Mind Map, Reports, Flashcards, Quiz, Infographic, and Slide Deck. You also looked at a practical example of using NotebookLM as a research and thinking companion in a job search context.

Don't just read about it. Go to notebooklm.google.com, upload a PDF you have been meaning to read, and hit that "Audio overview" button or Video Overview. You will be surprised at how much more you learn. Keep learning!

How to build your own resume reviewer with Google AI Studio in minutes

2025-11-14T10:48:25Z

Using Google AI Studio’s build feature, you can build any frontend application by giving a prompt. The Build feature in AI Studio generates applications that use the Gemini SDK without any server-side components. The apps run in a sandboxed frame. For this post, you will create a tech resume reviewer that will score your resume out of 100 against a given job description. Let’s get started!

Table of Contents #

Google AI Studio #

As Google defines it:

Google AI Studio is the fastest way to start building with Gemini, our next generation family of multimodal generative AI models.

In short, it is an application that turns a prompt into an application prototype in minutes. It is a playground where you can test out your ideas, build apps, and even use the latest Gemini feature, like the live feature to share your screen or camera video, and ask questions about it. You can also build multimodal apps that can take an image or a video as input and generate images as output. There are lots of possibilities.

For the scope of this blog post, you will focus on the “Build” feature of Google AI Studio and generate a text input to text output application that takes in your tech resume and a job description, analyzes both, and gives your resume a score out of 100 against the job description. It also provides specific guidance on writing your position description using the XYZ formula to achieve the best results. You will build your own tech resume reviewer next.

Steps to build a resume reviewer with Google AI Studio #

To build a tech resume reviewer using Google AI Studio’s Build feature, you will first need to go to the Google AI Studio app:

Go to Google AI Studio #

You can go to the Google AI Studio app by visiting https://aistudio.google.com/ on your favourite browser. For this blog post, I am using Chrome with my Google account signed in. It will look like the following when you open Google AI Studio on your browser:

Next, you will navigate to the build section of Google AI Studio.

Navigate to Build #

To go to the Build feature of Google AI Studio, you can click the Build menu item on the left navigation, as seen below:

From there, you can give a prompt for the type of app you want Gemini to build, which is in the next section.

Input the intelligent resume reviewer prompt #

As you want to build an intelligent tech resume reviewer, you will use the prompt given below:

Build me an intelligent resume reviewer that analyses a resume against a job
description, providing actionable feedback and suggestions based on the 
proven XYZ formula to help you stand out. Both the resume and the job 
description are also uploaded as text.

Please take note of the following things when reviewing the resume:
1. The primary purpose of the resume is to get the initial call or 
email from the tech recruiter, who is a non-technical person 
2. Please make the resume appeal equally to the engineering manager 
and other technical leaders who will take the interviews in later rounds. 
3. Keep yourself in the position of the resume receiver and frame the 
bullet points in a way to accentuate how the candidate can add value 
to the organisation


Technically, please do the following: 
1. While uploading, show messages like "Parsing objective", 
"Analysing job descriptions", "Creating a personalised review", 
and similar messages. Please show the same message exactly once
2. Show a percentage score of what the resume currently scores 
against 100, and things that can be improved in categories 
like objective, job description, side projects, formatting, and 
use of language, etc
3. Show all the suggestions for each category nicely presented 
in a foldable bullet point list per category, highlighting the 
things to change per sentence.

As a baseline, always use Australian English spellings for 
all the suggestions.

Let’s quickly analyze the prompt:

The prompt is divided into three parts. The first part tells the model what kind of app it needs to build
The second part adds some more information about how to review the resume from both a technical and a non-technical point of view
The third part provides some technical guidelines, like scoring, providing suggestions, and using Australian English

You can edit the prompt if you like, or just paste the prompt on Google AI Studio Build feature as seen below:

After that, click the Build button on the form. Then you will need to wait for the app to be built. It will take 1-2 minutes, and it will show something like the following while it is creating the app:

You can follow the progress on the left-hand side, the Code assistant panel or even click the “Code” button on the right side to see the generated code as follows:

Once it is done, it will load the Preview tab and may look something like the below:

Next, let's test the app in preview mode.

Use the built app in preview mode #

To test the generated app, you will need a resume. For that, we will use a sample frontend engineer resume in text format. As the sample Frontend engineer role, we will use this mid-level Frontend Engineer Role at Lorikeet AI in text format.

When you paste both the resume and the job vacancy text in the app, it looks like the following:

After that, you can click the Review My Resume button, which will take some time to review the resume against the job description:

After a few seconds, it will give its analysis, something like the following:

Hurray! Your AI-powered tech resume reviewer is working. Now, if you want to share it with your friends, you can deploy it on Google Cloud Run. How to do it is discussed next.

Deploy the app on Google Cloud Run #

To deploy your app, you will need a functioning Google Cloud Platform account. You can create a new project on a Google Cloud Platform account:

Put in the name as resume-reviewer and then click the Create button. It will take a few seconds, and the project will be created associated with your selected billing account:

Then head back to Google AI Studio and click the Deploy App button on the top right, which looks like a rocket 🚀:

In the drop-down, select Import Project:

In the right sidebar, search for resume reviewer and choose the project you have just created, and click Import:

After that, select the resume-reviewer project in the drop-down of the overlay to deploy your app to Google Cloud Run in that project:

After the project (and billing) is verified, you can click the Deploy app button to deploy your intelligent tech resume reviewer app to Cloud Run:

It will take some time (up to a couple of minutes) for Google AI Studio to deploy the app, and then it will give you a URL to view your app running on Google Cloud Run:

Then you can click View app to see the app running on Google Cloud Run. You can share the URL with anyone, and they will be able to view and test the tech resume reviewer you generated:

You have successfully built and deployed your own intelligent tech resume reviewer using Google AI Studio in minutes without writing any server code! You can test it out on your favourite browser:

You can close the overlay and edit the app with another prompt if you like. If you want to use the app as a chat, you can use the Resume reviewer Gemini Gem as discussed next.

Video #

I have recorded a video of generating the whole app and testing it in preview (not the deployment part). You can watch the 6-minute video below:

Resume reviewer Gemini Gem #

If you don’t want to build an app but still want to use the resume reviewer, I have created a Gemini Gem for Resume reviewer. Just click the link and use it:

That is another way to use the prompt without creating a custom application.

Conclusion #

This post demonstrated using Google AI Studio's Build feature to quickly prototype and deploy an intelligent tech resume reviewer without complex server code. You covered exploring AI Studio and its Build feature, crafting a detailed prompt for functionality (job description analysis, 100-point scoring, XYZ feedback), generating and testing the application, and finally deploying it to Google Cloud Run for a shareable URL.

You also noted the pre-built Gemini Gem as an alternative. This process showcases the rapid, custom tool development possible with Google AI Studio and the Gemini SDK.

Keep learning and keep exploring!

How to use the remote GitHub MCP server with Copilot on VS Code: a step-by-step guide

2025-08-30T10:59:24Z

Imagine being able to open a pull request or close an issue without leaving your IDE/Editor. This is made possible by the GitHub Model Context Protocol (MCP) server. In this post, you will learn how to use the remote GitHub MCP server with GitHub Copilot on VS Code. Let's get started!

Table of contents #

What is MCP #

It is hard to believe that the Model Context Protocol (MCP) was announced less than a year ago, in November 2024, and has already achieved so much popularity. But then what is MCP? MCP stands for Model Context Protocol. Think of it as a universal translator or a “USB-C port for AI applications”. Just as a USB-C cable allows you to connect various devices to your computer seamlessly, MCP provides a standardized way for Large Language Models (LLMs), such as GitHub Copilot, to communicate with external data sources and tools.

In essence, as per the specification, MCP standardizes how applications share contextual information with LLMs and how AI systems can expose and utilize external tools and capabilities. This means that instead of an AI model operating in a vacuum, it can tap into the rich, real-world context of your development environment. This protocol is open, meaning anyone can implement and use it, fostering a vibrant ecosystem of integrations.

The goal of MCP is to make AI assistants more informed and capable, turning them into true teammates that can handle complex, multi-step projects with real-time access to external resources using an MCP server, which can be local or remote. It's about making your AI tools understand your world better.

Official GitHub MCP server #

Now that you understand the power of MCP, let's talk about the official GitHub MCP server. This server is a specific implementation of the Model Context Protocol, provided and maintained by GitHub itself. Its primary purpose is to connect your AI tools directly to GitHub's platform, enabling them to interact with your repositories, issues, pull requests, and more through a structured and secure interface.

The GitHub MCP server acts as a bridge, enabling AI assistants to manage repositories, automate issues and pull requests, gain CI/CD insights with GitHub Actions, and perform other tasks. It is open source and also used for the remote GitHub MCP server.

There are two main ways you can utilize the GitHub MCP server: running it locally or connecting to a remote, hosted version. Let's explore both.

Using GitHub MCP server locally #

For those who prefer a hands-on approach or need a highly controlled environment, you can run the GitHub MCP server directly on your local machine. This typically involves using Docker, which encapsulates the server and its dependencies, ensuring a consistent environment.

You will need Docker and a personal access token from GitHub to run the image of the open-source, official GitHub MCP server. Although this is not difficult to do, some effort and maintenance are surely involved. You can follow the official docs on how to run it locally. You can see one way of running the GitHub MCP server below:

docker run -i --rm -e GITHUB_PERSONAL_ACCESS_TOKEN=<your-token> -e GITHUB_TOOLSETS="repos,issues,pull_requests,actions,code_security,experiments" ghcr.io/github/github-mcp-server

While running the GitHub MCP server locally gives you control, it often comes with the overhead of infrastructure management. This is where the remote GitHub MCP server shines.

Remote GitHub MCP server #

For most teams and individual developers, the remote GitHub MCP server offers a significantly more streamlined and hassle-free experience. This version is hosted and managed directly by GitHub, eliminating the need for you to worry about Docker containers, manual updates, or infrastructure maintenance. On top of that, as you are using OAuth authentication and not a personal access token, it is a better alternative from a security point of view as well.

There are multiple advantages to using the remote GitHub MCP server, which GitHub manages. There is no infrastructure overhead for you; getting started is much easier (just add a URL, authenticate, and you are good to go). As you are using a remote server, it will work on any device, including a remote machine/Cloud Shell running on a VM on a remote server.

The remote GitHub MCP server is currently in Public Preview. While access may be gated depending on the authentication type, it's the recommended path for most users due to its ease of use and reduced management burden. It allows you to focus on what you do best: writing code and building amazing things, not running an extra Docker container on your machine.

Remote GitHub MCP server on VS Code with Copilot #

This is where the magic truly happens. Integrating the remote GitHub MCP server with VS Code and GitHub Copilot elevates your AI-assisted development to a whole new level. Copilot, already a powerful coding assistant, becomes an even more intelligent and context-aware partner when it can access your GitHub data through MCP.

When you use Copilot in VS Code with the GitHub MCP server configured, Copilot Chat's "Agent Mode" can perform complex tasks by invoking specialized tools exposed by the MCP server. This means you can interact with your GitHub repositories using natural language prompts, and Copilot will translate those into actions via the MCP server.

For example, instead of manually navigating GitHub to create an issue, you could simply prompt Copilot: "Create a new issue on this repository for fixing a bug with the submit button style.". The agent would then use the MCP server to interact with GitHub's APIs, creating the issue on your behalf. Similarly, you can ask it to create a pull request, list the last run GitHub action, and do many other things.
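To make the "prompt to action" step a little more concrete, here is a minimal sketch of the kind of message an MCP client sends when it invokes a tool. The `tools/call` method and the `name`/`arguments` fields come from the MCP specification, and `list_pull_requests` is the tool name used later in this post. The argument names (`owner`, `repo`, `state`) and their values are assumptions for illustration, not a confirmed schema of the GitHub MCP server.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# The argument names and values below are assumptions for illustration only.
message = build_tool_call(
    1,
    "list_pull_requests",
    {"owner": "your-user", "repo": "your-repo", "state": "open"},
)
print(json.dumps(message, indent=2))
```

In practice you never write this message yourself. Copilot's agent mode picks the tool and fills in the arguments from your prompt, and VS Code handles the transport and authentication.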

Step-by-step guide to connect VS Code to the remote GitHub MCP server #

To connect the remote GitHub MCP server with your VS Code and Copilot, you will need VS Code (V1.92 or later) with the GitHub Copilot extension installed and a valid GitHub account. Also, the current project should have a connected GitHub repository. With that in mind, let’s start:

Install the remote MCP Server #

To install a new MCP server on VS Code:

Open the command palette (Cmd+Shift+p on a Mac) and run > MCP: Add Server…
In the next step, select HTTP (HTTP or Server Sent Events) Connect to a remote…
For the Server URL, put https://api.githubcopilot.com/mcp/ and hit Enter to confirm
Then, name the server github-remote-mcp-server or something you feel is appropriate and hit Enter
After that, select Global or Workspace. In my case, I selected Global so that it runs on all projects
Then it will ask you to allow the extension, click on “Allow”
After that, it will ask for authentication. For that, select your GitHub account, which was geshan in my case
This will take you to the GitHub auth page to give the correct permissions.
To test if it is connected, run this curl command: curl -I https://api.githubcopilot.com/mcp/_ping. You will see an output like below (and your VS Code will look similar):

Great! You have successfully connected the remote GitHub MCP server with your VS Code. If you want to grant read-only access only, add the following below the URL in the mcp.json file:

"headers": {
        "X-MCP-Readonly": "true"
      }
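To see where that header sits in the whole file, here is a small sketch that builds an mcp.json-shaped structure and prints it. The URL, the header, and the server name come from the steps above. The surrounding layout (a top-level `servers` key and `"type": "http"`) is my assumption about the file VS Code generates, so compare it with your own mcp.json before copying anything.

```python
import json


def build_mcp_config(server_name, url, read_only=False):
    """Return a dict shaped like a VS Code mcp.json with one HTTP server."""
    server = {"type": "http", "url": url}
    if read_only:
        # Header name and value are taken from the snippet above.
        server["headers"] = {"X-MCP-Readonly": "true"}
    # The "servers" wrapper is an assumed layout; check your own file.
    return {"servers": {server_name: server}}


config = build_mcp_config(
    "github-remote-mcp-server",
    "https://api.githubcopilot.com/mcp/",
    read_only=True,
)
print(json.dumps(config, indent=2))
```

Calling it with `read_only=False` leaves the `headers` entry out, which matches the default setup from the steps above.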

Next, you will list pull requests for a repo.

List the pull requests for a repo #

To list pull requests, you can type #list_pull_requests in GitHub Copilot Agent mode. It will figure out the GitHub repo and try to list the open pull requests, if any. If there are none, type list last three merged pull requests in a new GitHub Copilot chat, and it should list them as below:

Other things to do with remote GitHub MCP Server #

You can ask Copilot: What operations can you do on GitHub with the mcp server? It will list the things it can do, as follows:

From here, you can chat and know or do things you want with GitHub, like listing gists, creating gists, interacting with issues, or even opening a pull request directly from VS Code.

MCP server settings #

You can also go to the settings and turn on or off any of the remote GitHub MCP server tools as per your need, as seen below:

There you have it, a way to interact and do things with GitHub without leaving your VS Code editor.

Conclusion #

In this post, you learned about what MCP is and how to use the GitHub MCP Server. You can use the GitHub MCP server locally with Docker or without installing anything with the remote option. After that, you learned how to use the remote GitHub MCP server on VS Code with GitHub Copilot. First, connect to the MCP server and authenticate, then perform a couple of tasks, such as listing the pull requests for the current repository. You also took a quick look at the tools of the GitHub MCP server and learned how to turn these tools on or off as needed.

I hope you have gained some new knowledge about MCP in general and the official GitHub remote MCP server. Carry on learning!

How to run Gemma 3 on Google Cloud Run, the easiest way with AI Studio

2025-06-08T10:59:24Z

Gemma is a collection of lightweight, modern open models built by Google. They are designed to run fast on devices like phones and on machines in the cloud, to help developers create AI applications. In this post, you will learn the easiest and fastest way to run the latest version of Gemma, Gemma 3 (4B), on Google Cloud Run, deployed from Google AI Studio. Let’s get started!

Table of Contents #

Gemma 3 on Cloud Run #

In this post, you will deploy Gemma 3 (the 4-billion-parameter model) on Google Cloud Run from Google AI Studio. Gemma 3 model sizes range from 815 MB for the 1B-parameter model to 17 GB for the 27B-parameter model. The model you will deploy is the 4B-parameter one, which is 3.3 GB.

You will first create a new Google Cloud Project. You can deploy Gemma 3 from AI Studio with just a few clicks. When deployed from AI Studio, it uses Ollama under the hood to run Gemma 3 inside a container. In that container, there is a slim Go server/proxy to verify the API key.

You will also learn about a command you can run on Google Cloud Shell that will have a similar effect later in this tutorial. You will create a new Google Cloud Project in the next section to get going.

Create Google Cloud Project #

To create a new Google Cloud Project, make sure you are logged into your Google Account. Then you can go to the Create new project page on GCP and fill in the following details:

After that, click the Create blue button to create the project. It may take some time, and your project will be created, and you will be notified about that:

Note the project name you just created. Then, go to Google AI Studio and follow the steps shown in the next section.

Deploy Gemma 3 from AI Studio #

On Google AI Studio, go to the Chat section, which should open by default:

Under the Run setting on the right side bar, click on the dropdown that has the model name selected, like Gemini 2.5 Flash…, then select Gemma, and after that click Gemma 3 4B, as seen below:

On the selected Gemma 3 4B model, bring the temperature down to 0.3 and the Top P setting to 0.4 , as follows:

The above settings don't matter when you deploy. If you want, you can chat with Gemma 3, for example by asking it why is the sky blue?, give the shortest possible answer. It should give back a one-sentence answer.

After that, click on the black and white Rocket icon (🚀) beside the Run settings and click Deploy to Cloud Run as seen below:

Then, you can select the project created in the previous step, which was gemma3-on-cr in my case:

After that, click the Deploy to Google Cloud blue button. It will take some time to say Deploying to Cloud Run. If all goes well after a couple of minutes, you will get a URL where Gemma 3 (4B) is running on Google Cloud Run with an API key as follows:

You can click the Get code blue button to try a curl command to verify that Gemma 3 on Cloud Run works as expected, select REST on the left select box, and copy the code:

Before you run the copied code in the command line, add give your answer in 1 sentence to the prompt text, and only then run it. The final code I ran was:

curl "https://gemma-3-4b-it-some-long-number-region.run.app/v1beta/models/gemma-3-4b-it:streamGenerateContent?key=the-api-key" \
   -H 'Content-Type: application/json' \
   -X POST \
   -d '{
     "contents": [{
       "parts":[{"text": "How does AI work? give your answer in 1 sentence"}]
       }]
      }'

Which resulted in:

Hurray! You have Gemma 3 running on Google Cloud Run. Now you can use it in your applications. You can add Open WebUI if you like.
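If you would rather call the service from Python than from curl, the sketch below builds the same POST request using only the standard library. The path, the `key` query parameter, and the JSON body mirror the curl command above. The base URL and the API key are placeholders, and the `build_generate_request` helper is a name I made up for this example.

```python
import json
import urllib.request


def build_generate_request(base_url, api_key, prompt, model="gemma-3-4b-it"):
    """Build (but do not send) the POST request that the curl command makes."""
    url = f"{base_url}/v1beta/models/{model}:streamGenerateContent?key={api_key}"
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values; replace them with your own service URL and API key.
request = build_generate_request(
    "https://your-service-url.run.app",
    "the-api-key",
    "How does AI work? give your answer in 1 sentence",
)
print(request.get_method(), request.full_url)

# To actually send it (needs network access and a deployed service):
# with urllib.request.urlopen(request, timeout=60) as response:
#     print(response.read().decode("utf-8"))
```

The request is built separately from sending it so that you can inspect the URL and body first. Keep the real API key out of source control, for example by reading it from an environment variable.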

The Docker image #

On the surface, it looks partially like magic, but most of the heavy lifting is being done by a prebuilt Docker image available on the Google Cloud Artifact Registry (pkg.dev), built with Google Cloud Build.

When you read the official docs about Gemma on Cloud Run, you can find out that there are pre-built Docker images like Gemma 3:4B on the package registry. As GPUs on Cloud Run have become available on demand with no reservations needed, deploying any model with Ollama on Cloud Run has become much easier.

The Dockerfile also has a proxy server to add the API key validation on top of a regular Ollama instance. It would be good to go through the readme to know more about this and other features.

In the next section, you will learn about the single command for deploying Gemma 3 on Cloud Run.

How to run it with one command #

To run Gemma 3:4B on Cloud Run, you can run the following command on Google Cloud Shell of your respective Google Cloud Project:

gcloud run deploy gemma3-4b-dwc \
 --image us-docker.pkg.dev/cloudrun/container/gemma/gemma3-4b \
 --concurrency 4 \
 --cpu 8 \
 --set-env-vars OLLAMA_NUM_PARALLEL=4 \
 --set-env-vars=API_KEY=gf2lv74w79ubm5lr \
 --gpu 1 \
 --gpu-type nvidia-l4 \
 --max-instances 1 \
 --memory 32Gi \
 --allow-unauthenticated \
 --no-cpu-throttling \
 --timeout=600 \
 --region us-central1

You can run it on Google Cloud Shell as:

It might ask you to deploy with no zonal redundancy. Type y for yes, and it should deploy the service and give you back a URL as follows:

If you hit the URL with the API key, you can see the Ollama is running message as seen below:

There you have it. Now you have decoded the secret of the magic AI Studio does in the background to enable the API key. Gemma 3 is running on Google Cloud Run with Ollama and a proxy server written in Go.

Conclusion #

In this post, you learned about deploying Gemma 3 on Google Cloud Run through Google AI Studio's intuitive interface, which simplifies a complex process into a few clicks. You also learned that this deployment leverages Ollama and pre-built Docker images from the Google Cloud Artifact Registry, enhanced with a Go-based proxy server for API key validation.

Additionally, you could decode the magic using a single gcloud command within the Google Cloud Shell, offering flexibility and control over deployment parameters like concurrency, CPU, GPU, and memory. Both methods result in a functional Gemma 3 instance on Cloud Run, ready to be integrated into applications, providing developers like you with powerful AI model capabilities with minimal effort. Keep experimenting!

Google AI Studio: How to go from a prompt to a geo-location guessing app in minutes

2025-05-22T10:58:57Z

Can you code and deploy a basic but functional app with minimal coding experience? With the latest Google AI Studio feature, you can build and deploy apps by instructing an agent in minutes. You can also deploy the app on Google Cloud Run and make changes easily. This post will show you how. Let’s get started!

Table of contents #

The goal #

Your goal is to build a Gen AI (LLM)-powered application that will take an image (of a popular place) and then give you back the country, city, state, place, and the geolocation coordinates of that place. A creative use of this can be stalking and knowing where your friends went by using their Instagram photos. Other usages are up to you. This is more of a proof-of-concept fun project to demonstrate the power of LLM and Gen AI with Gemini 2.5 models.

Build an app with a prompt on Google AI Studio #

Go to Google AI Studio, then click on the Build link in the left menu as shown below:

Then paste the following in the prompt text area:

Build an app that can guess Geo location from a given image:
You are an OSINT investigator. Your job is to geolocate where the photos are taken.
Provide the country, region, and city name of the location. 
Please pinpoint the exact location with latitude and longitude where the photo was taken.  

Could you always explain your methodology and how you concluded? 
Provide steps to verify your work. 
Also, mention the percentage of how sure you are of the place you have identified it to be 
and add a Google Maps link to the exact location

It will look something like the below:

After that, click the blue Build button. It will use a code agent/assistant like Google Jules AI and start “thinking” and writing the whole application as seen below:

It will take a couple of minutes (or a bit longer), and then it will generate the full app in React and TypeScript. It uses Tailwind CSS for styling. When the app generation is complete, it will look as follows:

The application preview is available on the right panel, so you can upload and test a picture like this one (Statue of Liberty). It will analyze the photo and guess the geo location as follows:

You can even click the provided Google Maps link and see where the place is. You can scroll down to see the methodology and the verification steps as seen below:

In the next section, you will deploy the generated (vibe-coded) app to Google Cloud Run.

Deploy to Google Cloud Run #

Google Cloud Run is a fully managed compute platform that automatically scales your stateless containers. It is a great way to deploy your applications without worrying about the underlying infrastructure.

You will need an existing project and some credits in your Google Cloud Account to deploy to Google Cloud Run. If you do not have an existing project on Google Cloud, you can create a new one as seen below:

It will take some time, and the project will be created; you will be notified of that. In my case, the project name is geo-location-app, and it will look like the following:

To deploy the generated (vibe-coded) application that you have tested, go back to the Build page of the generated app. In the Preview section on the right, click the Rocket (🚀) black and white icon on the top right corner above Preview:

Then, search and select the project you just created on Google Cloud Console, in my case it is geo-location-app, then click the Deploy app blue button as follows:

It will take some time to deploy the React App on Google Cloud Run, starting with verifying the project:

Then it will show Deploying to Cloud Run and finally give you a link to try out your Geo location guessing app, as seen below:

You can click the App URL link or the View app button to see the app working on a publicly accessible URL, the same as the one you tested in the preview. When you click the View app blue button, you will see the app in action:

You can upload the same Statue of Liberty photo or any other famous landmark and see how it works:

You can also upload a picture of the Eiffel Tower, and it will guess the location as Paris, France. Give it a shot. You can also switch from light to dark theme and back.

Your app is auto-saved and you can find it in the Your Apps tab in the Google AI Studio’s Build page:

Next, you will learn how to make a change and redeploy the app to Cloud Run.

Making a change and redeploy #

You can make a change to the app and redeploy it. In my case, I want to remove the mention of OSINT from the interface, and I will instruct/vibe code the Code assistant section of the app with:

Remove all the mentions of "OSINT" from the user interface, but keep it in the prompt

You can input the new instruction as follows and hit the blue up button as seen below:

It will take some time to do the given task. After some thinking (in my case, 10 seconds), it started editing the files. The app looked like the following after the change was made:

As the change has been made and the app has been tested in preview, you can save the new change by hitting the save icon above the preview. You can then redeploy it to Google Cloud Run. To do this, click the Rocket (🚀) black and white icon on the top right corner above Preview and select the same project (it was geo-location-app in my case):

It verifies the project and, given that the app is already deployed in that project, it gives you a Redeploy app link, which you can click to redeploy the change to Google Cloud Run:

It will take some time to redeploy the app, and you will see the Deploying to Cloud Run message.

It will redeploy the app and give you the same link to the app, which you can test to verify that the change has been deployed and released. After you click View app, you can see that the update has been done:

After the change, I tested the app to ensure it worked as expected. There is no mention of OSINT anymore, which is what I wanted. Google Cloud credits are provided for this project. Thanks to Google.

There you have it—you vibe coded a fully functioning app that started with a simple idea and a single prompt. You also deployed it on a publicly accessible URL, which you can share with friends. As it is running on Google Cloud, it will cost you money, so be sure to check your costs on Google Cloud Billing and stop the app if you need to. After testing, you can delete the project with the app running on Google Cloud Run.

In my case, the primary prompt and calling the API were in the services/geminiService.ts file, which was using gemini-2.5-flash-preview-04-17 with a pretty well-crafted prompt. The whole app is available in this open-source GitHub repository. From the code I read, it looks like a frontend-only app, so how it handles the API key and its security is your responsibility.

You can read the FAQs on Google AI Studio Build/Apps page, which tell you that it is a frontend only app and other details about how it runs in an Iframe on Google Cloud Run.

The goal was to have a working app, and you have it, even with a dark and light theme switcher. It is a good starting point, but I would not consider it production-ready; use it cautiously and carefully. If you think something is wrong, delete the app from Cloud Run and/or delete the whole project. I don’t want you bleeding money on Google Cloud Platform.

Conclusion #

In this tutorial, you started with the goal of having a working Gen AI-powered and generated app that can guess the geolocation from a picture. You then generated the full app using Google AI Studio Apps/build feature. After that, you created and deployed a Google Cloud project to Google Cloud Run. Finally, you made changes to remove the OSINT mention from the user interface and redeployed the app to Google Cloud Run in the same Google Cloud Project.

It is pretty easy to go from an idea and prompt to a fully functional app accessible over the Internet and deployed on Google Cloud Run. To manage costs effectively, remember to monitor your Google Cloud billing and stop the app if needed. Keep learning and using Gemini and Google AI Studio!

Cloud Run Jobs: A Beginner's Guide to Running Tasks to Completion on a schedule

2025-04-29T11:38:57Z

With Cloud Run, just bring your code! Google handles the complex server stuff and scaling, so you don't have to. Typically, you could run only web services with a URL on Google Cloud Run as services. For some time now, you can also run Cloud Run Jobs to execute a task to completion, which might take longer than minutes or even hours. In this beginner-friendly post, you will learn how to run jobs on Cloud Run Jobs on a schedule. Let’s get going!

Table of Contents #

Cloud Run Jobs #

Unlike Google Cloud Run services, which are executed when a web server receives a web request (such as a GET or POST), Cloud Run Jobs are specifically designed for tasks that need to run, perform some work, and then terminate once that work is complete. They don't listen for incoming requests like services do.

Cloud Run Jobs are task-oriented, executing a specific piece of work, such as scraping a website, converting a PDF invoice to database records, taking a screenshot, fine-tuning an LLM, or resizing images. They do not require a web server and can be triggered one-off, at a specific time (such as 2 am each morning) or as a part of a GCP Workflow. These jobs run to completion and can run up to 10,000 tasks in parallel. Tasks have an index number and a count of runs; tasks can also be configured to retry.

In essence, if you have a containerized task that needs to run periodically, on demand, or part of a workflow and complete its work without serving requests, and then shut down, Cloud Run Jobs are a suitable serverless option on Google Cloud.

How Cloud Run Jobs handles parallel tasks #

One of the powerful features of Cloud Run Jobs is its ability to handle parallel task execution. This is particularly useful for batch processing workloads where you can divide the work into independent chunks that can be processed concurrently. You can execute up to 1,000 jobs per project per region, as outlined in the Cloud Run quotas.

When you define a Cloud Run Job, you can specify the parallelism setting. This setting determines the maximum number of Task Instances that can run concurrently within a single Job Execution.

Imagine you have a job that requires processing 100 items, and your container is designed to process one item at a time. If you set parallelism to 10, when you execute the job, Cloud Run Jobs will attempt to run up to 10 Task Instances simultaneously. Each Task Instance will receive information about which specific item(s) it should process.

How does a Task Instance know which part of the work to do? Cloud Run Jobs provides environment variables to each Task Instance:

CLOUD_RUN_TASK_INDEX: This variable provides a unique index for each Task Instance within a Job Execution, starting from 0 and going up to the number of tasks minus 1.
CLOUD_RUN_TASK_COUNT: This variable provides the total number of tasks in this Job Execution (the value set with the --tasks flag). The parallelism setting only limits how many of those tasks run at the same time.

Your container's code can read these environment variables to determine its specific slice of the work. For example, if you have 100 items to process and the job is configured with 10 tasks, Task Instance 0 might process items 1-10, Task Instance 1 processes items 11-20, and so on, up to Task Instance 9 processing items 91-100. Your code would use CLOUD_RUN_TASK_INDEX and CLOUD_RUN_TASK_COUNT to calculate the range of items it's responsible for.
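The range calculation described above can be sketched in a few lines of Node.js. This is a minimal sketch, not part of the official quickstart: the sliceForTask helper and the total of 100 items are assumptions made for illustration, and the range it returns is zero-based and half-open.

```javascript
// Hypothetical helper: split `totalItems` items across `taskCount` tasks.
// Returns a zero-based, half-open range [start, end) for the given task index.
function sliceForTask(taskIndex, taskCount, totalItems) {
  const chunkSize = Math.ceil(totalItems / taskCount);
  const start = Math.min(taskIndex * chunkSize, totalItems);
  const end = Math.min(start + chunkSize, totalItems);
  return {start, end};
}

// Cloud Run sets these for each task; the defaults keep the sketch runnable locally.
const taskIndex = parseInt(process.env.CLOUD_RUN_TASK_INDEX || '0', 10);
const taskCount = parseInt(process.env.CLOUD_RUN_TASK_COUNT || '1', 10);

const {start, end} = sliceForTask(taskIndex, taskCount, 100);
console.log(`Task ${taskIndex} of ${taskCount} handles items ${start + 1}-${end}`);
```

Using Math.ceil for the chunk size means the last task simply gets a shorter range when the items do not divide evenly.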

Cloud Run Jobs manages the lifecycle of these Task Instances. If a Task Instance fails (e.g., due to an error in your code or a temporary infrastructure issue), Cloud Run Jobs can be configured to retry that specific Task Instance up to a specified number of times. This ensures that even if some parts of your batch fail, the overall job execution can still complete successfully by retrying the failed tasks.

The ability to run tasks in parallel significantly speeds up the execution of batch workloads. Instead of processing 100 items sequentially in one container, you can process them concurrently across multiple containers, which drastically reduces the total time required for job execution. This is similar to having several workers tackle different parts of a large project simultaneously, rather than having one worker complete everything step by step.

By default, each task runs for a maximum of 10 minutes. You can modify the default value by changing the task timeout setting, up to a maximum of 168 hours (7 days). Support for timeouts greater than 24 hours is available in Preview.

Simple Cloud Run job example #

This tutorial focuses on learning how to create and run Google Cloud Run Jobs on Google Cloud Platform (GCP). The code example you will use is the official Node.js quickstart for Cloud Run Jobs.

You will use the gcloud CLI in Google Cloud Shell to do most of the work, and you will not need to install anything on your local machine. Let’s get started!

The main index.js file has the following contents:

// Retrieve Job-defined env vars
const {CLOUD_RUN_TASK_INDEX = 0, CLOUD_RUN_TASK_ATTEMPT = 0} = process.env;
// Retrieve User-defined env vars
const {SLEEP_MS, FAIL_RATE} = process.env;

// Define main script
const main = async () => {
  console.log(
    `Starting Task #${CLOUD_RUN_TASK_INDEX}, Attempt #${CLOUD_RUN_TASK_ATTEMPT}...`
  );
  // Simulate work
  if (SLEEP_MS) {
    await sleep(SLEEP_MS);
  }
  // Simulate errors
  if (FAIL_RATE) {
    try {
      randomFailure(FAIL_RATE);
    } catch (err) {
      err.message = `Task #${CLOUD_RUN_TASK_INDEX}, Attempt #${CLOUD_RUN_TASK_ATTEMPT} failed.\n\n${err.message}`;
      throw err;
    }
  }
  console.log(`Completed Task #${CLOUD_RUN_TASK_INDEX}.`);
};

// Wait for a specific amount of time
const sleep = ms => {
  return new Promise(resolve => setTimeout(resolve, ms));
};

// Throw an error based on fail rate
const randomFailure = rate => {
  rate = parseFloat(rate);
  if (!rate || rate < 0 || rate > 1) {
    console.warn(
      `Invalid FAIL_RATE env var value: ${rate}. Must be a float between 0 and 1 inclusive.`
    );
    return;
  }

  const randomFailure = Math.random();
  if (randomFailure < rate) {
    throw new Error('Task failed.');
  }
};

// Start script
main().catch(err => {
  console.error(err);
  process.exit(1); // Retry Job Task by exiting the process
});

The main things you need to understand here are:

The script executes with the main call on line no. 49, which has a catch attached to it. It would have been easier with a JavaScript try catch block, but this is the official example. So, let’s just roll with it.
Before the main function on line 7, four environment variables are read from process.env with object destructuring. The first two are CLOUD_RUN_TASK_INDEX and CLOUD_RUN_TASK_ATTEMPT, which identify the task's position within the job and the retry attempt (if the task has failed before).
The other two are SLEEP_MS and FAIL_RATE, which set the number of milliseconds to sleep and the failure rate as a fraction between 0 and 1, respectively. For example, 0.1 represents 10% and 1.0 represents 100%, which means every attempt will fail.
The main task in the job is the main function, which first logs the task number and the attempt.
Then it waits for the specified milliseconds and, next, randomizes the failure according to the failure rate.
Lastly, it logs the index of the completed task.
Below that, there is a sleep function that pauses main for the specified number of milliseconds by awaiting a timer, without blocking the process.
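For comparison, here is a rough sketch of what the try/catch style mentioned in the first point could look like. It is not the official example: the run wrapper and the stand-in main body are hypothetical, and the only behavior carried over from the quickstart is that a failure ends in a non-zero exit code.

```javascript
// Sketch: run an async entry point and translate failure into an exit code.
async function run(mainFn) {
  try {
    await mainFn();
    return 0;
  } catch (err) {
    console.error(err);
    return 1; // a non-zero exit code marks this task attempt as failed
  }
}

// Stand-in for the quickstart's main function (hypothetical body).
const main = async () => {
  console.log('Doing some work...');
};

run(main).then(code => {
  process.exitCode = code;
});
```

Setting process.exitCode instead of calling process.exit lets any pending log output flush before the process ends.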

Next, you will see how to add it as a Cloud Run job using the Google Cloud Shell.

Create a Cloud Run Job using Google Cloud Shell #

To get started, you will need some knowledge of Git, GitHub, and Google Cloud Shell.

Assuming you are logged into your Google account, first create a new Google Cloud Project on this page. You can name it cloud-run-jobs as seen below:

Ensure that you copy the Project ID, as you will need it later, and then click the Create button at the end of the form. It will take a couple of minutes, and you will be taken to the dashboard of the new project as follows:

Ensure that the recently created project is selected; alternatively, you can click Select Project from the notification message.

After that, click on the Cloud Shell icon beside the bell icon on the top right (or press g and then s on your keyboard). The Cloud Shell will appear at the bottom of the screen and ask for permission. Click Authorize:

In the shell, type in (or copy and paste) the following command:

mkdir projects && cd projects && git clone https://github.com/geshan/cloud-run-jobs.git && cd cloud-run-jobs

The command will create a directory called projects, go into that folder and clone the demo repository with the above code, and go into the cloud-run-jobs repository folder as follows:

After that, set the project ID variable to the value you copied while creating the project (or copy it from your Google Cloud Console). It should look like cloud-run-jobs-<some-numbers-here>, which in my case was cloud-run-jobs-458310. The command is below:

export PROJECT_ID=cloud-run-jobs-<some-numbers-here-replace-this> && echo $PROJECT_ID

Now, to create the Cloud Run job by building from source with a buildpack (so you do not need to write a Dockerfile), run the following command on your Google Cloud Shell:

gcloud run jobs deploy job-quickstart \
    --source . \
    --tasks 5 \
    --set-env-vars SLEEP_MS=5000 \
    --set-env-vars FAIL_RATE=0.1 \
    --max-retries 3 \
    --region us-central1 \
    --project=$PROJECT_ID

Here you are asking the gcloud CLI to create a Cloud Run job called job-quickstart, where the code is taken from the current directory (.). You are specifying that the job has five tasks, each of which sleeps for 5 seconds (5000 milliseconds) and has a 10% chance of failing (0.1), and that a failed task can be retried up to 3 times.
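To get a feel for how the failure rate and the retries interact, here is a small back-of-the-envelope sketch. It assumes every attempt fails independently with the same probability, which matches the Math.random() check in the script; the function names are made up for this example.

```javascript
// Probability that a single task fails every attempt (first run plus retries),
// assuming each attempt fails independently with probability `failRate`.
function taskExhaustsRetries(failRate, maxRetries) {
  return Math.pow(failRate, maxRetries + 1);
}

// Probability that all tasks in the execution eventually succeed.
function executionSucceeds(failRate, maxRetries, tasks) {
  return Math.pow(1 - taskExhaustsRetries(failRate, maxRetries), tasks);
}

const perTask = taskExhaustsRetries(0.1, 3);
const overall = executionSucceeds(0.1, 3, 5);
console.log(`Task fails all attempts: ${(perTask * 100).toFixed(2)}%`);
console.log(`All 5 tasks succeed: ${(overall * 100).toFixed(2)}%`);
```

With the values from the command, a task would need to fail four times in a row (0.1 to the power of 4, or about 0.01%) to exhaust its retries, so the whole execution should succeed roughly 99.95% of the time.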

The region is us-central1, and the project ID is the one you set in the previous command. It looks like the below when it executes:

It will ask you to enable APIs, hit Y to enable all related APIs, as seen below:

If you encounter an error, it may be because the APIs require some time to become enabled. So, try again after a couple of minutes. When it works, it will look something like the following:

It will take some minutes for the code to be copied to a bucket and then built with buildpacks on Google Cloud Build. If you want to follow the build process, you can open the visible URL in a new browser tab; it will look something like this:

When it is successful, it will show you something like the following:

At this point, search for Cloud Run and click the first result:

After that, click the Jobs tab, and you will see your Cloud Run job has been created:

This job has never run, though. To run it on demand, execute gcloud run jobs execute job-quickstart --region=us-central1 in Google Cloud Shell as seen below:

You will see the job run. If you click Running and wait for the task to execute, you will see that all five tasks in the job have run successfully:

You can also view logs of each task if you want:

Create a job schedule with Google Cloud Scheduler #

To run jobs on a schedule, please follow the guide, which outlines the process of scheduling jobs using Google Cloud Scheduler.

It can be done by running the following command on Google Cloud Shell to run the job every 15 minutes:

gcloud scheduler jobs create http quickstart-schedule \
  --location us-central1 \
  --schedule="*/15 * * * *" \
  --uri="https://us-central1-run.googleapis.com/apis/run.googleapis.com/v1/namespaces/cloud-run-jobs-458310/jobs/job-quickstart:run" \
  --http-method POST \
  --oauth-service-account-email 944865726665-compute@developer.gserviceaccount.com

You can find the project number on the project dashboard page. You will need to replace the project ID in the URI and the project number in the service account email with your own. It looks like the below when run:
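If you want to avoid hand-editing those two values, they can be assembled from your own project details. The sketch below is only an illustration: schedulerTarget is a hypothetical helper, and it assumes the job runs as the default Compute Engine service account, which follows the <project-number>-compute@developer.gserviceaccount.com pattern seen in the command above.

```javascript
// Hypothetical helper that assembles the two project-specific values
// used in the `gcloud scheduler jobs create http` command.
function schedulerTarget({projectId, projectNumber, region, jobName}) {
  return {
    uri: `https://${region}-run.googleapis.com/apis/run.googleapis.com/v1/namespaces/${projectId}/jobs/${jobName}:run`,
    serviceAccount: `${projectNumber}-compute@developer.gserviceaccount.com`,
  };
}

// Values taken from the command above; swap in your own project ID and number.
const target = schedulerTarget({
  projectId: 'cloud-run-jobs-458310',
  projectNumber: '944865726665',
  region: 'us-central1',
  jobName: 'job-quickstart',
});
console.log(target.uri);
console.log(target.serviceAccount);
```

The two printed lines are the values for the --uri and --oauth-service-account-email flags, respectively.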

If you go to your cloud scheduler page for the project, you will find the schedule you just created:

You can do a Force Run by selecting the option from the ... on the right side of the schedule. I tried it and it worked for me. I also waited for a few minutes to confirm that it would be triggered again on time.

That's it; it is best to delete the schedule and the job if you're testing it out.

Conclusion #

In this post, you learned about Google Cloud Run Jobs and their usage. You also learned how to create and run Cloud Run jobs on a schedule with Google Cloud Scheduler using a simple Node.js example. I hope you can utilize Cloud Run Jobs to execute tasks efficiently. Keep learning!

How to deploy a container image to Amazon Elastic Container Service (ECS) with Fargate: a beginner’s tutorial [Part 2]

2025-03-26T11:48:57Z

Amazon Elastic Container Service (ECS) is a fully managed container orchestration service that simplifies the deployment, management, and scaling of containerized applications on AWS. It manages containers without the need to learn Kubernetes. With Fargate, resource management can also be serverless. In this post, you will learn how to deploy a built container image from Amazon Elastic Container Registry (ECR) to Amazon Elastic Container Service (ECS) provisioned with Fargate. The goal is to do the bare minimum to get a URL/IP from a container image on ECR (image built and pushed in part 1 of this series), let’s get going!

Table of contents #

What is Amazon Elastic Container Service (ECS) #

Amazon ECS is a fully managed container orchestration service that helps you to more efficiently deploy, manage, and scale containerized applications. You can provision the underlying resources with Fargate or Elastic Compute Cloud (EC2) instances. With Fargate, you can use Amazon ECS to run containers without having to manage servers or clusters of Amazon EC2 instances.

Below is a diagram of how Amazon ECS with Fargate fits in the pipeline with AWS CodePipeline, ECR, and other services including Docker in the mix:

You can also use the AWS CLI to create the cluster, service, and task definition. For this tutorial, however, you will use the AWS console UI to keep things simple.

Create an Elastic Container Service cluster #

First, you must create an Amazon Elastic Container Service (ECS) cluster to deploy your Docker Image. For this, after logging in to your AWS console with a user having the correct IAM permissions, search for ecs on the search bar and click on the Clusters link under Top Features as seen below:

After the clusters listing page loads, click on the Create Cluster orange button on the top right of the page:

Then, in the form that loads, type in the cluster's name. I am using dev-cluster as an example. In the Infrastructure section, make sure AWS Fargate (serverless) is checked, as follows:

Don’t change any of the other optional settings and scroll to the bottom of the form, where you will see the Create button, click that:

It might take a couple of minutes for the cluster to be created, and it will show up on the Clusters listing page as seen below:

Hurray! Your ECS Cluster with Fargate has been created. In the next section, you will create a task definition.

Create a task definition for ECS #

A task definition is your application's blueprint. It can be created from a JSON file or the AWS Web UI. There are many parameters for a task definition, but you will only focus on the important ones for the tutorial's scope.

To create a task definition, click on Task definitions as seen in the previous screenshot, it will take you to the Task definitions listing page. Here, click on the Create new task definition orange button and select Create new task definition as seen below:

On the task definition form, put in the Task definition family as nodejs-apps and make sure you have AWS Fargate selected in the Launch type of Infrastructure requirements :

In the task size section, select CPU as .5 vCPU and Memory as 1 GB. As we are running a simple Hello World Node.js application, these resources would be more than enough. Then, select the Task role and Task execution role as ecsTaskExecutionRole.

Now scroll down to the Container-1 section and name the app hello-world. For the Image URI part, copy the image URI you pushed in part 1 of this series from the container registry page as shown below:

Then paste the URI in the Image URI field. As it is a single container task, this container will be Essential Container - Yes. In the port mapping section, expose port 3000 in the Container port field with the Protocol being TCP, name the port nodejs-3000, and keep App Protocol set to HTTP. It will look like the following:

Scroll down to the Environment variables and add an environment variable called PORT with the value 3000:

After that, scroll to the bottom of the form and click the Create button:

You will see the task definition has been created:

Until now, you have only created a task definition, not a service, so no containers are running. In the next section, you will create a service with a task that will bring up the container.
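Since a task definition can also be created from a JSON file, the console choices in this section map roughly to the structure below. Treat it as a sketch rather than an export from the console: the task role and task execution role are left out because they need account-specific ARNs, and the image URI keeps the placeholder from part 1.

```javascript
// Sketch of a Fargate task definition mirroring the console choices above.
// The image URI keeps the placeholder from part 1; replace it with your own.
const taskDefinition = {
  family: 'nodejs-apps',
  requiresCompatibilities: ['FARGATE'],
  networkMode: 'awsvpc', // required for Fargate tasks
  cpu: '512', // .5 vCPU, expressed in CPU units
  memory: '1024', // 1 GB, expressed in MiB
  containerDefinitions: [
    {
      name: 'hello-world',
      image: '<long-id-here>.dkr.ecr.us-east-1.amazonaws.com/nodejs/hello-world:latest',
      essential: true,
      portMappings: [
        {name: 'nodejs-3000', containerPort: 3000, protocol: 'tcp', appProtocol: 'http'},
      ],
      environment: [{name: 'PORT', value: '3000'}],
    },
  ],
};

// Print the JSON so it can be saved to a file and reviewed.
console.log(JSON.stringify(taskDefinition, null, 2));
```

Keeping the definition in a file like this makes it easier to review changes than clicking through the form again.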

Create an ECS Service #

To create a service, click Clusters on the previous screenshot and then click on the cluster name, which should be dev-cluster. In the cluster detail page, on the Services tab, click the Create button on the bottom right of the page to create a service:

On the create service form, select the Compute options as Launch type with FARGATE as Launch type and the Platform version as LATEST:

In the Deployment Configuration section, Application type would be pre-selected as Service. In the Family field, select nodejs-apps and set the Revision to LATEST. Then name the service hello-world-service and leave the other settings as-is, such as Replica with Desired tasks of 1:

Then scroll down to the Networking section; this is the important part. Expand it and make sure the VPC is selected as is. In the Subnets section, click Clear current selection and, from the drop-down, choose only one subnet that has us-east-1a.

In the Security group section, choose Create new security group. In the Security group name field, type in port-3000-open-from-anywhere. Similarly, type in Open port 3000 from anywhere in the Security group description field.

After that, in the Inbound rule for security groups part, choose Custom TCP as Type; in the Port Range field, type in 3000, and select Anywhere for the source field. Also, make sure that the Public IP is turned on:

After that, scroll to the bottom of the form and click Create to create the service.

It will take some time for the service to come up, you can click on the Service name hello-world-service:

Then, on the Tasks tab on the service page, click on the task ID:

On the task page, click on the Networking Tab and click the open address beside the Public IP as shown below:

The IP will open in a new browser tab (if it is Chrome, allow the tab to load it insecurely without HTTPS). Append :3000 to the IP, as the Hello World Node.js app is set to run on port 3000 with the PORT environment variable. You should see the app print Hello World! in the browser:

Congratulations, your Node.js Hello World container is now running on ECS with Fargate. You should read about the difference between an ECS Task and a service.

Important note #

This is a simple example: in a real-life, production-ready application, you would have added a Load Balancer and some DNS records. You would also configure the Security groups, Subnets, VPCs, and IAM settings much more precisely.

You would have written some form of CI/CD pipeline to deploy the new changes automatically. You would have also added some monitoring and logging to the application.

Delete the cluster, task definition, and container registry if you don’t need them anymore.

Conclusion #

As a recap, you learned about Amazon Elastic Container Service (ECS) and how to provision it with Fargate to have serverless resources. Then, this tutorial taught you how to create an ECS task definition. After that, you deployed a service with the correct parameters to expose the service via a public IP without using a Load Balancer. Keep updating your AWS knowledge!

How to create an Amazon Elastic Container Registry (ECR) and push a docker image to it [Part 1]

2025-03-25T11:41:57Z

Amazon Elastic Container Registry (ECR) is a fully managed container registry that can store (Docker) container images, making it easy to pull, share, and deploy container images. This post will teach you how to create a private Amazon ECR and push Docker container images of a simple Node.js Hello World app with Express. Let’s get started!

Table of contents #

What is Amazon ECR #

Amazon Elastic Container Registry (ECR) is an Amazon Web Services (AWS) service that stores and manages Docker images, making it easy to pull and deploy them to services like Elastic Container Service (ECS), Elastic Kubernetes Service (EKS), and/or AWS Lambda functions. Below is a quick overview of how it works from the official website.

Amazon ECR also has commands in the AWS CLI to do everything you can from the user interface. For this post, you will use the user interface to create the registry and AWS CloudShell to build and push a Docker image for a simple Node.js Express Hello World app.

The Node.js Hello World app #

It is assumed that you have Node.js 20+ installed on your local machine. If not, you can clone this repository in your AWS CloudShell. AWS CloudShell has Node 20 installed by default.

The Node.js Hello World app used for this tutorial is an Express 5.x app. You can start a Node.js app with npm init, and to install Express 5.x, you can run npm install express@next. Then, in the folder that has the package.json and package-lock.json, you can create an index.js file with the following code:

const express = require('express');
const app = express();
const port = process.env.PORT || 80;

app.get('/', (req, res) => {
  res.send('Hello World!');
});

app.listen(port, () => {
  console.log(`Example app listening on port ${port}`);
});

It is a simple Node.js app with only one route / that prints Hello World using Express. If you run node index.js, you will see the following:

> node index.js
Example app listening on port 80

After that, if you hit http://localhost on your favorite browser, you will see the text Hello World on the browser tab.

The next important file in the repository is the Dockerfile with the following content:

# Build stage
FROM public.ecr.aws/docker/library/node:22 AS build

WORKDIR /srv
COPY package*.json ./

# Install dependencies based on the `package.json` and `package-lock.json`
# files in the host folder
RUN npm ci --omit=dev


# Production stage, only includes what is needed for production
FROM public.ecr.aws/docker/library/node:22-alpine

ENV NODE_ENV=production
USER node

COPY --from=build /srv .
ADD . .

# Specify the command to run when launching the container
EXPOSE 80
CMD ["node", "index.js"]

It is a Dockerfile using Docker multi-stage build to create the build and the production stage.

The easiest way to get started is to clone this repository in your AWS Cloud Shell and build the image there. Before that, you will first create the private Amazon ECR.

Create private Amazon ECR #

Amazon Elastic Container Registry (ECR) can be public or private. For example, this Node.js docker image is being served from the public ECR. No authentication is required to pull the node image. Conversely, the apps you develop will usually not be accessible to the public, so they are kept in a private ECR, which has some form of access control to ensure only the right users have access to it.

To create a private Amazon Elastic Container Registry (ECR), you must be logged into your AWS account and have the correct IAM permissions. This post does not cover the IAM permissions. In your AWS console, search for ecr as shown below:

Then click on Elastic Container Registry which will take you to the following page:

On this page, click on the Create yellow button, which will take you to the form below:

On the form, fill in the repository name as nodejs/hello-world, where nodejs is the namespace and hello-world is the app name.

It is better to keep the tags immutable so they are not overwritten. To do this, click the Immutable radio button in the Image tag mutability section. After that, keep the Encryption settings as is and click the Create button at the bottom of this page.

It might take some seconds for the ECR repository to be created, and you will be taken to the Private repositories listing page as seen below:

In the next part, you will learn how to build and push a Docker container image using AWS CloudShell. As you are using AWS CloudShell, there is nothing to install on your machine. You can also do the same using your machine.

Build and push Node.js Docker image #

To build and push the Node.js Hello World (with Express.js) Docker image, you must first go to the repository by clicking its name, as seen in the previous screenshot.

You will land on the Images page of the repository, which will look like the below:

Before doing anything else, please click the CloudShell link at the bottom left side of the page. It will provision and run a shell for you:

Type in the following command in the shell:

mkdir projects && cd projects && git clone https://github.com/geshan/nodejs-aws-ecs-fargate.git && cd nodejs-aws-ecs-fargate

The above command creates a folder called projects and then goes into it. After that, it clones the above Node.js (Express) Hello World App, which also has the abovementioned Dockerfile. Subsequently, the last command takes you into the nodejs-aws-ecs-fargate folder.

It will result in the following state:

After that, click on the View push commands button on the left side of the page. It will show a pop-up as seen below:

Copy all four visible commands and paste them into Notepad or your IDE. The commands I got were:

aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <long-id-here>.dkr.ecr.us-east-1.amazonaws.com

docker build -t nodejs/hello-world .

docker tag nodejs/hello-world:latest <long-id-here>.dkr.ecr.us-east-1.amazonaws.com/nodejs/hello-world:latest


docker push <long-id-here>.dkr.ecr.us-east-1.amazonaws.com/nodejs/hello-world:latest

Then, run them individually on your console, maintaining the order. After the first command, you will see Login Succeeded. That command is used to log in to the Elastic Container Registry.
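All four commands revolve around the same registry host and image URI, so it helps to see how that string is put together. The sketch below uses a made-up ecrImageUri helper and a placeholder account ID; the <long-id-here> part of the commands above is your 12-digit AWS account ID.

```javascript
// Hypothetical helper: build the registry host and full image URI
// that the login, tag, and push commands use.
function ecrImageUri({accountId, region, repository, tag = 'latest'}) {
  const registry = `${accountId}.dkr.ecr.${region}.amazonaws.com`;
  return {registry, imageUri: `${registry}/${repository}:${tag}`};
}

const {registry, imageUri} = ecrImageUri({
  accountId: '123456789012', // placeholder account ID, not a real one
  region: 'us-east-1',
  repository: 'nodejs/hello-world',
});
console.log(registry); // host passed to docker login
console.log(imageUri); // target of docker tag and docker push
```

If you later switch regions or repositories, only the arguments change while the commands stay the same.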

The docker build command will take up to 5 minutes, wait for it. After the build is done, it will look something like the below:

The third command tags the built image with the full repository URI, <long-id-here>.dkr.ecr.us-east-1.amazonaws.com/nodejs/hello-world:latest, and prints no output. The fourth command pushes the image to the private Amazon ECR. If everything goes well, when it finishes, it will look like the following:

After that, close the cloud shell and refresh the Images page of the private Amazon ECR. You will see the pushed Docker image in the registry as follows:

You can see the image’s details by clicking the latest link, which will show you:

You can use the image's URI to deploy it as a container in ECS, EKS, or even a Lambda function. In the next part of this blog post series, you will deploy the built and pushed container to Amazon Elastic Container Service (ECS).

Conclusion #

In this post, you learned about Amazon Elastic Container Registry (ECR), a service in Amazon Web Services (AWS). Then, you looked at a simple Hello World Node.js app with Express 5.x. After that, using the AWS interface, you created a private Amazon ECR Docker image registry. Finally, following the push commands provided, you used AWS CloudShell to build and push the Hello World Node.js app image after cloning the repository from GitHub.

I hope you learned the basics of Amazon ECR. In the next part, you will deploy the container on Amazon Elastic Container Service (ECS) using Fargate for serverless resource provisioning. Keep learning!

How to use Ollama and Open WebUI with Docker Compose [Part 4]

2025-02-11T11:43:57Z

Ollama gives you one of the easiest ways to run most open LLMs on your machine. It is open-source and easy to use. In addition to using it with a command line or its APIs, you can use it with a web user interface using Open WebUI. This post will teach you how to run Ollama and Open WebUI to run any open LLM with a web-based chat interface like ChatGPT. Let’s get started!

Table of contents #

Recap of the Ollama series #

This is part 4 of the Ollama blog series. In the first part, you learned what Ollama is, its features, and how to run it on your local machine.

The second part delved into the Ollama commands you can execute on the CLI. Part 3 of the series shed light on some of the important Ollama APIs focusing on the generate and chat endpoints.

This part involves running Ollama’s Docker image and adding a web UI, the Open WebUI, to provide a chat interface for any model Ollama can run. Like Ollama, Open WebUI is also open-source, with the code primarily in JavaScript, Python, and TypeScript. It also has a Docker image pushed to the GitHub Container Registry, created from its Dockerfile. You will use Docker Compose to run these two images together for a working application.

Prerequisites #

Before you start running some Docker Compose commands, be informed of some of the software that needs to be running on your machine:

You will need Docker running on your machine, for this example, I am using Docker 27.4.0 on Mac
Make sure you have Docker Compose available as well. In v1 it was a separate install named docker-compose; from v2 it is bundled with the Docker Desktop installation. I am using Docker Compose version v2.31.0-desktop.2 on a Mac.
It would be good to know about Docker volumes, docker ports, and basic docker commands

You can read the Docker for beginners tutorial for a refresher on Docker. Please read this docker compose tutorial to learn more about Docker Compose.

Open WebUI #

Open WebUI is a user interface for interacting with large language models. It offers a streamlined and intuitive way to communicate with and manage these models, making them more accessible and user-friendly.

Open WebUI aims to simplify working with large language models. It allows users to harness their power for various applications, including content creation, research, and software development.

Ollama Docker Compose #

The Docker images for both Ollama and Open WebUI are not small. Ollama’s latest (version 0.5.7 at the time of writing) is 4.76 GB uncompressed, and Open WebUI’s main tag is 3.77 GB uncompressed. Below is the docker-compose.yaml file that has both Ollama and Open Web UI:

services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - 11434:11434
    volumes:
      - ollama:/root/.ollama
    container_name: ollama
    tty: true
    restart: unless-stopped

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    volumes:
      - open-webui:/app/backend/data
    depends_on:
      - ollama
    ports:
      - 3000:8080
    environment:
      - 'OLLAMA_BASE_URL=http://ollama:11434'
      - 'WEBUI_SECRET_KEY='
    extra_hosts:
      - host.docker.internal:host-gateway
    restart: unless-stopped

volumes:
  ollama: {}
  open-webui: {}

This file sets up your local environment to run any AI model (such as a large language model or LLM) and interact with it through a user-friendly web interface. It's like setting up a mini-cloud service on your machine.

This docker-compose.yaml file sets up a two-part application:

Ollama runs large language models (LLMs) locally on your computer. Think of it like the "engine" that powers the AI. It's like having your mini-ChatGPT running.
Open WebUI is a user-friendly web interface that allows you to interact with Ollama. It's like a dashboard that allows you to talk to the AI engine. It provides a nice visual way to send prompts and see responses.

Ollama Open WebUI Docker Services #

Let’s look at the services section of the above docker-compose.yaml file:

Services is the main section where you define your application's different parts (containers). Each "service" is a separate program running in its isolated environment.

ollama: This defines the first service, named "ollama".
image: ollama/ollama:latest: This tells Docker which pre-built "image" to use. An image is like a template for a container. ollama/ollama:latest means you are using the official Ollama image, and latest means you want the most recent version.
ports: - 11434:11434: This maps port 11434 on your host machine (your computer) to port 11434 inside the Ollama container. Ollama listens for requests on port 11434. This allows applications on your host (such as your own scripts) to talk to Ollama; Open WebUI reaches it over the internal Docker Compose network instead.
volumes: - ollama:/root/.ollama creates a persistent storage area. /root/.ollama is where Ollama stores its data (like downloaded models). ollama: (defined at the bottom of the file) is a named volume. This means the data will persist even if you stop and restart the container. Without this, you'd lose all your downloaded models every time you stopped Ollama. It's like giving Ollama a dedicated hard drive that doesn't get erased.
container_name: ollama: This gives the container a specific name, "ollama," making it easier to refer to.
tty: true allocates a pseudo-TTY, which can be helpful for interactive sessions. It helps the container handle input and output, making it behave more like a regular terminal. Programs that expect to interact with a user often need this.
restart: unless-stopped: This tells Docker to automatically restart the Ollama container if it crashes or stops for any reason unless you explicitly stop it yourself (e.g., using docker compose down). It's like setting an auto-restart feature.
open-webui: This defines the second service, named "open-webui".
image: ghcr.io/open-webui/open-webui:main: This uses the Open WebUI image from the GitHub Container Registry (ghcr.io). main specifies a particular version (the main branch).
container_name: open-webui: Gives the container a specific name.
volumes: - open-webui:/app/backend/data: Similar to Ollama, this creates persistent storage for Open WebUI's data. /app/backend/data is where Open WebUI stores its data. open-webui: is another named volume. This keeps your Open WebUI settings and data safe.
depends_on: - ollama: This is crucial. It tells Docker Compose that the Open WebUI service depends on the Ollama service. Docker Compose will start Ollama before starting Open WebUI. This is essential because Open WebUI needs Ollama to run and function. It's like saying, "Don't start the dashboard until the engine runs". Note that, by default, it only waits for the Ollama container to start, not for Ollama to be ready to serve requests. Read more about Docker compose depends on.
ports: - 3000:8080: This maps port 3000 on your host machine to port 8080 inside the Open WebUI container. Open WebUI runs on port 8080. This means you'll access the Open WebUI interface by going to http://localhost:3000 in your web browser.
environment: This sets environment variables inside the Open WebUI container. These are configuration settings.
OLLAMA_BASE_URL=http://ollama:11434: This tells Open WebUI where to find Ollama. Notice it's using the service name ollama (not localhost). Docker Compose sets up internal networking so services can communicate using their service names. This is how Open WebUI knows how to connect to the Ollama "engine."
WEBUI_SECRET_KEY=: This is a security setting for Open WebUI. You should set it to a strong, random value for production use. It's like a password for the web interface. Leaving it blank is fine for local testing but not for a public-facing server.
extra_hosts: - host.docker.internal:host-gateway: This is a bit more advanced. It allows the container to access services running on your host machine. host.docker.internal is a special hostname that resolves to your host's internal IP address. This is useful if, for example, you have another service running directly on your computer (not inside a container) that Open WebUI needs to access.
restart: unless-stopped: As with Ollama, this ensures that Open WebUI restarts automatically unless you manually stop it.
volumes: This section defines the named volumes used above. Volumes persist data even if the containers are restarted.
ollama: {} defines the ollama volume. The empty {} means we're using the default Docker volume driver.
open-webui: {} defines the open-webui volume using the default driver where Docker manages where and how to save it.

This Docker Compose file sets up a system with Ollama (a server for running large language models) and Open WebUI (a web interface to interact with Ollama). It ensures that Ollama starts first, that both services have persistent storage, and that Open WebUI knows how to connect to Ollama. You'll be able to access Open WebUI on your computer at port 3000. Remember to set a WEBUI_SECRET_KEY!
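If you want a quick way to produce a strong, random value for WEBUI_SECRET_KEY, a few lines of Python can do it. The following is a minimal sketch using the standard library's secrets module; the helper names make_webui_secret_key and as_env_line and the 32-byte length are assumptions made for this example, not something Open WebUI prescribes:

```python
import secrets


def make_webui_secret_key(num_bytes: int = 32) -> str:
    """Return a URL-safe random string to use as a secret value.

    The function name and the 32-byte default are illustrative choices;
    the only real requirement is that the value is hard to guess.
    """
    return secrets.token_urlsafe(num_bytes)


def as_env_line(key: str) -> str:
    """Format the key as an entry for the compose file's environment list."""
    return f"WEBUI_SECRET_KEY={key}"


# Print a line you can paste into the environment section.
print(as_env_line(make_webui_secret_key()))
```

Each run prints a different value, so generate it once and keep it somewhere safe rather than regenerating it on every deploy.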

Running Ollama and Open WebUI with Docker compose #

To run the above Docker Compose file, please execute:

docker compose up

Or you could run docker-compose up depending on the version of Docker Compose installed on your machine. Running this command for the first time will take some time, depending on your internet speed, because it will download around 4 GB of data in total (2.5 GB for Ollama and 1.5 GB or a bit more for Open WebUI). So you can make your coffee now and come back with it when the download finishes:

After it downloads both the docker images and runs them, you will see something like the below on the CLI:

Now you can go to http://localhost:3000 on the browser of your choice (probably Google Chrome), and you will see the following welcome screen of Open WebUI:

Click the Get Started link, and then you will need to fill out the form as shown below:

After you fill out the form, you will reach the Open WebUI Dashboard with an announcement:

Click on Ok let’s go to see the Open WebUI main screen. As no models are downloaded, you will download the smollm2:135m model using the UI. This can also be done from the CLI with docker compose exec ollama ollama pull smollm2:135m, but you will use the UI for now.

To pull/download the model onto your local Ollama instance, click the Select a model drop down and type in smollm2:135m then click on Pull smollm2:135m from Ollama.com to download the model as shown below:

It is a relatively small model at 271 MB, so depending on your internet speed, it will finish in seconds or a couple of minutes as follows:

After the model is downloaded locally on your machine and in the Ollama instance, you can start chatting or prompting the model. You can ask questions like who are you? or why is the sky blue? give the shortest possible answer in under 20 words as seen below:

The model will reply. You can also configure the models by clicking the settings icon at the top right of the screen. Parameters like temperature, top K, Top P, and others can be changed on the Open WebUI configs as follows:

You now have your mini ChatGPT running locally. Since it is Ollama and the model has been downloaded, it can run even without the internet, for example on a plane. Depending on the resources available, such as disk space, CPU/GPU, and memory, you can download other models, such as Llama, Microsoft Phi, Gemma 2, or DeepSeek, from Ollama’s model registry.

The docker images are huge #

The uncompressed Docker image for Ollama is 4.5 GB, which will grow bigger when you download a model. Similarly, the uncompressed image for Open WebUI is 3.77 GB. Both of them are huge, as you can see below:

Make sure to have at least 9-10 GB of free space on your hard disk before downloading these large Docker images.
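You can check the free space before pulling the images with a short script. This is a rough sketch, assuming the path you check sits on the same disk where Docker stores its images (that location varies by operating system and Docker setup, so the "." default is only a placeholder); the function names are made up for this example:

```python
import shutil

REQUIRED_GB = 10  # upper end of the 9-10 GB suggested above


def free_space_gb(path: str = ".") -> float:
    """Return the free space of the filesystem that holds `path`, in GB."""
    return shutil.disk_usage(path).free / 1e9


def has_room_for_images(path: str = ".", required_gb: float = REQUIRED_GB) -> bool:
    """True if the filesystem has at least `required_gb` GB free."""
    return free_space_gb(path) >= required_gb


print(f"Free: {free_space_gb():.1f} GB, enough for the images: {has_room_for_images()}")
```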

Hosting Ollama on the cloud #

You can follow this step-by-step tutorial to run Ollama on Google Cloud Run. If you are looking for a more production-ready Ollama docker image with a model (Gemma 2:9b) already pulled, have a look at this Dockerfile. You can easily change the version of Ollama and also download another model of your choice to host it on Google Cloud Run in serverless containers. You can follow this codelab to create a multi-container Cloud Run service with Ollama and Open WebUI together on Google Cloud Run where Open WebUI is the main pod (ingress frontend) and Ollama is a sidecar.

As it is a Docker container, you can also run it on your Kubernetes cluster on GKE.

Conclusion #

This blog post explains how to run Ollama and Open WebUI with Docker Compose. Ollama is an open-source tool for running large language models (LLMs) on your machine, and Open WebUI provides a web-based chat interface for interacting with the models.

The blog post first recaps the Ollama blog series and lists the prerequisites for running Ollama and Open WebUI with Docker Compose. It then explains Open WebUI and provides the Ollama Docker Compose file. Next, it explains the Ollama Open WebUI Docker services and how to run them with Docker Compose. It also notes that the Docker images are large, and the post provides guidance for hosting Ollama on the cloud. Keep exploring!

Using Ollama APIs to generate responses and much more [Part 3]

2025-02-09T11:47:32Z

Ollama is open-source software that makes it seamless to run most open LLMs on your own machine (or even on the cloud). Written in Go, Ollama is user-friendly and easy to start. In this post, part 3 of the Ollama blog posts series, you will learn about using Ollama’s APIs for generating responses (LLM inference) and much more; let’s get going!

Quick review of the Ollama series #

This blog post is part 3 of the Ollama series. In the first part, you learned about what Ollama is, including its features and also how to run Ollama on your local machine with a couple of models.

In part two, you explored some useful Ollama commands, like ollama serve to start Ollama and serve available models, ollama run to pull (download) and run a model.

In this part, you will learn about the Ollama APIs. In addition to the inference API endpoints /api/generate and /api/chat, you will also learn about other useful API endpoints. The Ollama commands call these APIs behind the scenes to provide the outputs.

In the next part, part 4, you will learn how to run Ollama in Docker with Docker Compose. You will also add Open WebUI with Docker Compose to have a WebUI to interact with LLMs running on Ollama.

Curl and Jq #

For this guide, you will use cURL to call the APIs and jq to format the JSON output. To install jq, follow the official download guide. For instance, on a Mac, you can run brew install jq; on an Ubuntu machine, you can execute sudo apt-get install jq; and on a Windows machine, you can use Chocolatey. You can also get the binary and make it executable.

You can use other programming languages, such as Python and JavaScript, with the official libraries for Python and JavaScript, and frameworks like LiteLLM or LangChain to call the Ollama APIs.

Ollama API Endpoints #

There are more than 10 Ollama API endpoints. This tutorial will focus on some of the most important ones. To use the APIs, you will need Ollama to run either with ollama serve or as a service. You will also need at least one model pulled for the API calls.

For this guide, you will use smollm2:135m, one of the smaller LLMs at 271 MB. You can use any bigger model if it runs on the available resources. The reason to choose smollm2:135m is that most machines, even with 512 MB of memory, can run it.

To start, run ollama serve and, in another CLI tab, run ollama pull smollm2:135m. If you have pulled smollm2:135m in the previous parts of this tutorial series, the download will be very fast as the files already exist.

Generate endpoint #

As the name points out, this API endpoint is available at /api/generate. You can POST data to it to generate a response from the selected model for the provided prompt. It responds as a stream by default; you can configure it to return the whole response in one go instead. It has four main parameters:

model: the model's name, for example, smollm2:135m, is a required parameter.
prompt: is the prompt you want to send to the selected model like Why is the sky blue?
suffix: can be used if you want to append some text after the response
images: a list of base64-encoded images if you want to use it with multimodal models like Llava or SmolVLM.

There are other advanced parameters, too, like stream, which you can read about in the official docs. For now, you can run the following curl command to get a response from the smollm2:135m model:

curl http://localhost:11434/api/generate -d '{
  "model": "smollm2:135m",
  "prompt": "Why is the sky blue? Give the shortest answer possible in under 20 words",
  "stream": false
}' | jq .

In the above curl command, you are using the /api/generate API endpoint to ask the smollm2:135m model Why is the sky blue? Give the shortest answer possible in under 20 words. You are also asking Ollama not to stream the output, so it gives the full answer in one go. Then, the output is piped to jq. This gives the following output:

{
  "model": "smollm2:135m",
  "created_at": "2025-02-08T11:02:55.115275Z",
  "response": "The sky's clear and deep color due to tiny oxygen molecules (O2) scattering the sun's light allowing us to perceive it as blue.",
  "done": true,
  "done_reason": "stop",
  "context": [
    //a long array of numbers here
  ],
  "total_duration": 302810709,
  "load_duration": 13315375,
  "prompt_eval_count": 47,
  "prompt_eval_duration": 132000000,
  "eval_count": 30,
  "eval_duration": 156000000
}

If you were calling this API from another script or software system, you would be most interested in the response field. If you just want to see the response from the model, you can use jq for that in the following way:

curl http://localhost:11434/api/generate -d '{
  "model": "smollm2:135m",
  "prompt": "Why is the sky blue? Give the shortest answer possible in under 20 words",
  "stream": false
}' | jq '.["response"]'

You are asking jq to show only the response attribute from the JSON response rather than the whole response. It will look as follows:

You can make many types of requests to the generate API endpoint. One useful one is structured output, where you can specify the output structure as a JSON object. Using the seed option, you can have reproducible outputs. If you provide an empty prompt, the model is loaded into memory. It is worth reading the official docs about the generate endpoint.
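The same call can be made from Python without any third-party package. Below is a minimal sketch, assuming Ollama is listening on its default address and that smollm2:135m has been pulled; the function names are hypothetical helpers written for this post, and the options dictionary is where values such as seed or temperature would go:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address


def build_generate_request(model, prompt, options=None, base_url=OLLAMA_URL):
    """Build (but do not send) a non-streaming POST to /api/generate."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if options:
        # For example {"seed": 42, "temperature": 0} for more repeatable output.
        payload["options"] = options
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_response(body):
    """Return only the "response" field, like jq '.["response"]' does."""
    return json.loads(body)["response"]


def generate(model, prompt, options=None):
    """Send the request; this needs `ollama serve` running and the model pulled."""
    request = build_generate_request(model, prompt, options)
    with urllib.request.urlopen(request) as reply:
        return extract_response(reply.read())
```

With the server running, calling generate("smollm2:135m", "Why is the sky blue?", {"seed": 42}) should return the model's answer as a plain string.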

Chat endpoint #

The chat endpoint, available at /api/chat, also works with POST and is similar to the generate API. It generates the next message in a chat with a selected model. It is a streaming endpoint that returns a series of responses. You can turn off the streaming with the "stream": false parameter as seen below:

curl http://localhost:11434/api/chat -d '{
  "model": "smollm2:135m",
  "stream":false,
  "messages": [
    {
      "role": "user",
      "content": "Why is the sky blue? Give the shortest answer possible in under 20 words"
    }
  ]
}' | jq .

The output of the above cURL will be similar to the following:

{
  "model": "smollm2:135m",
  "created_at": "2025-02-08T11:22:15.229839Z",
  "message": {
    "role": "assistant",
    "content": "The sky appears blue because when sunlight passes through Earth's atmosphere, it contains tiny molecules like water vapor and oxygen. These molecules scatter shorter wavelengths of light (such as blue and violet) more than the longer wavelengths (like red and orange), making the sky appear blue to our eyes. Thank you for pointing this out!"
  },
  "done_reason": "stop",
  "done": true,
  "total_duration": 407998750,
  "load_duration": 6202542,
  "prompt_eval_count": 47,
  "prompt_eval_duration": 36000000,
  "eval_count": 65,
  "eval_duration": 363000000
}

As seen in the above response, the structure of the response is a bit different than the generate API endpoint. This one has a message attribute, which is an object, and a content attribute inside it. If you want to extract the content from the response using Jq, you can run the following command:

curl http://localhost:11434/api/chat -d '{
  "model": "smollm2:135m",
  "stream":false,
  "messages": [
    {
      "role": "user",
      "content": "Why is the sky blue? Give the shortest answer possible in under 20 words"
    }
  ]
}' | jq '.["message"]["content"]'

The extracted content from the response of the /api/chat will look as follows:

You can do structured outputs with the chat endpoint. Being a chat endpoint, it also lets you send the history of the conversation with each request. For all the other types of requests you can send to the /api/chat endpoint, it is best to go through the official Ollama docs about it.
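Sending the conversation history simply means including the earlier user and assistant messages in the messages array. Here is a small sketch of how that payload could be assembled in Python; the helper names are made up for this example and the assistant text is sample data, not real model output:

```python
import json


def add_message(history, role, content):
    """Return a new history list with one more message appended."""
    return history + [{"role": role, "content": content}]


def build_chat_payload(model, history):
    """Build the JSON body for a non-streaming POST to /api/chat."""
    return json.dumps({"model": model, "stream": False, "messages": history})


def extract_content(body):
    """Return message.content, like jq '.["message"]["content"]' does."""
    return json.loads(body)["message"]["content"]


# Rebuild the conversation so far, then ask a follow-up question.
history = add_message([], "user", "Why is the sky blue?")
history = add_message(history, "assistant", "Sunlight is scattered by the air.")
history = add_message(history, "user", "Explain that to a five-year-old.")
payload = build_chat_payload("smollm2:135m", history)
print(payload)
```

The printed payload can be passed to curl with -d, or posted from Python in the same way as any other JSON body.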

List models #

To list the local models available, you can call the /api/tags endpoint, which works with a GET request. This endpoint lists models available locally that you can send as the model name parameter in the generate and chat endpoints. Below is an example call of this API:

curl http://localhost:11434/api/tags | jq .

It will output a JSON similar to the one seen below:

{
  "models": [
    {
      "name": "smollm2:135m",
      "model": "smollm2:135m",
      "modified_at": "2025-02-08T15:33:44.760304367+11:00",
      "size": 270898672,
      "digest": "9077fe9d2ae1a4a41a868836b56b8163731a8fe16621397028c2c76f838c6907",
      "details": {
        "parent_model": "",
        "format": "gguf",
        "family": "llama",
        "families": [
          "llama"
        ],
        "parameter_size": "134.52M",
        "quantization_level": "F16"
      }
    },    
    {
      "name": "qwen2.5:0.5b",
      "model": "qwen2.5:0.5b",
      "modified_at": "2025-02-06T15:59:09.320362549+11:00",
      "size": 397821319,
      "digest": "a8b0c51577010a279d933d14c2a8ab4b268079d44c5c8830c0a93900f1827c67",
      "details": {
        "parent_model": "",
        "format": "gguf",
        "family": "qwen2",
        "families": [
          "qwen2"
        ],
        "parameter_size": "494.03M",
        "quantization_level": "Q4_K_M"
      }
    },
    {
      "name": "deepseek-r1:8b",
      "model": "deepseek-r1:8b",
      "modified_at": "2025-02-02T13:33:29.069046707+11:00",
      "size": 4920738407,
      "digest": "28f8fd6cdc677661426adab9338ce3c013d7e69a5bea9e704b364171a5d61a10",
      "details": {
        "parent_model": "",
        "format": "gguf",
        "family": "llama",
        "families": [
          "llama"
        ],
        "parameter_size": "8.0B",
        "quantization_level": "Q4_K_M"
      }
    }
  ]
}

I have these three models available on my machine. Your response might be slightly different, but the output structure will remain the same.
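If another program needs to know which models are available, it can read the models array from this response. The sketch below works on a trimmed copy of the JSON shown above; the function names are hypothetical, and sizes are converted using 1 MB = 1,000,000 bytes:

```python
import json


def summarize_models(tags_body):
    """Turn an /api/tags response into (name, size in MB) pairs."""
    models = json.loads(tags_body)["models"]
    return [(m["name"], round(m["size"] / 1_000_000, 1)) for m in models]


def is_available(tags_body, model_name):
    """Check whether a model name appears in the local model list."""
    return any(name == model_name for name, _ in summarize_models(tags_body))


# A trimmed copy of the response shown above, used as sample input.
sample = json.dumps({
    "models": [
        {"name": "smollm2:135m", "size": 270898672},
        {"name": "qwen2.5:0.5b", "size": 397821319},
    ]
})

for name, size_mb in summarize_models(sample):
    print(f"{name}: {size_mb} MB")
```

A check like is_available(sample, "smollm2:135m") is handy before calling the generate or chat endpoints, because it tells you whether you need to pull the model first.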

Pull a model #

To download a new model from the Ollama model registry, you can use the /api/pull API endpoint, which works with a POST call. As the official doc states, canceled pulls are resumed, and multiple pull calls for the same model share the same download progress.

A model name is required to pull a model, and you can choose to stream or not stream the response. Below is an example of calling the /api/pull endpoint with the default streaming response to pull/download snowflake-arctic-embed:22m, which is an embedding model at 46 MB:

curl http://localhost:11434/api/pull -d '{
  "model": "snowflake-arctic-embed:22m"
}' | jq .

It gives out a very long output with all the information about the pull (download) and ends with:

{
  "status": "pulling 2977e9705f4b",
  "digest": "sha256:2977e9705f4b122813b1aeb50fc0c6563113da0b626f540c3daa8149827e30d3",
  "total": 333,
  "completed": 333
}
{
  "status": "verifying sha256 digest"
}
{
  "status": "writing manifest"
}
{
  "status": "success"
}

Once the model is downloaded, the easiest way to verify it is to run the ollama list command; you will see snowflake-arctic-embed:22m on the list as a 46 MB model. With this model, you can call the /api/embed endpoint to generate embeddings.
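When you call the pull endpoint from code, you will usually want to turn those status objects into a progress value and a final success check. The following sketch assumes each line of the stream holds one JSON object, as in the output above before jq pretty-prints it; the function names are illustrative:

```python
import json


def parse_pull_stream(lines):
    """Yield (status, percent) for each JSON line of a pull stream.

    percent is None for updates that carry no byte counts.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        update = json.loads(line)
        total = update.get("total")
        percent = None
        if total:
            percent = 100 * update.get("completed", 0) // total
        yield update["status"], percent


def pull_succeeded(lines):
    """True if the last status update in the stream is "success"."""
    last_status = None
    for status, _ in parse_pull_stream(lines):
        last_status = status
    return last_status == "success"


# The tail of the output shown above, one JSON object per line.
sample = [
    '{"status": "pulling 2977e9705f4b", "total": 333, "completed": 333}',
    '{"status": "verifying sha256 digest"}',
    '{"status": "writing manifest"}',
    '{"status": "success"}',
]

for status, percent in parse_pull_stream(sample):
    print(status, "" if percent is None else f"{percent}%")
```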

Show model information #

By calling the /api/show endpoint with a POST call, you can view the model information, including details, model file, template, parameters, license, and system prompt. The model name is a required parameter. Passing the optional verbose parameter will return the full data with verbose fields in the response.

Below is an example response from the show model information endpoint, called without the verbose flag for the smollm2:135m model. Some fields have been truncated for brevity:

{
  "license": "\nApache License\n Version 2.0, January 2004...the full apache license here",
  "modelfile": "# Modelfile generated by \"ollama show\"\n# To build a new Modelfile based on this, replace FROM with:\n# FROM smollm2:135m\n\nFROM /path/to/.ollama/models/blobs/sha256-f535f83ec568d040f88ddc04a199fa6da90923bbb41d4dcaed02caa924d6ef57\nTEMPLATE \"\"\"template and parameter info here\nLICENSE \"\"\"\nApache License\n    Version 2.0, January 2004...the full apache license here",
  "parameters": "stop\"<|im_start|>\"\nstop\"<|im_end|>\"",
  "template": "template info here",
  "system": "You are a helpful AI assistant named SmolLM, trained by Hugging Face",
  "details": {
    "parent_model": "/path/to/.ollama/models/blobs/sha256-f535f83ec568d040f88ddc04a199fa6da90923bbb41d4dcaed02caa924d6ef57",
    "format": "gguf",
    "family": "llama",
    "families": [
      "llama"
    ],
    "parameter_size": "134.52M",
    "quantization_level": "F16"
  },
  "model_info": {
    "general.architecture": "llama",
    "general.basename": "smollm2",
    "general.file_type": 1,
    "general.finetune": "8k-lc100k-mix1-ep2",
    "general.languages": [
      "en"
    ],
    "general.license": "apache-2.0",
    "general.organization": "HuggingFaceTB",
    "general.parameter_count": 134515008,
    "general.quantization_version": 2,
    "general.size_label": "135M",
    "general.type": "model",
    "llama.attention.head_count": 9,
    "llama.attention.head_count_kv": 3,
    "llama.attention.layer_norm_rms_epsilon": 0.00001,
    "llama.block_count": 30,
    "llama.context_length": 8192,
    "llama.embedding_length": 576,
    "llama.feed_forward_length": 1536,
    "llama.rope.dimension_count": 64,
    "llama.rope.freq_base": 100000,
    "llama.vocab_size": 49152,
    "tokenizer.ggml.add_bos_token": false,
    "tokenizer.ggml.add_space_prefix": false,
    "tokenizer.ggml.bos_token_id": 1,
    "tokenizer.ggml.eos_token_id": 2,
    "tokenizer.ggml.merges": null,
    "tokenizer.ggml.model": "gpt2",
    "tokenizer.ggml.padding_token_id": 2,
    "tokenizer.ggml.pre": "smollm",
    "tokenizer.ggml.token_type": null,
    "tokenizer.ggml.tokens": null,
    "tokenizer.ggml.unknown_token_id": 0
  },
  "modified_at": "2025-02-08T15:33:44.760304367+11:00"
}

Depending on the level of security needed for your Ollama instance, the show model API should not be accessible outside of the app.
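The model_info object is the part of this response that a program is most likely to read, for example to find the model's context length. Here is a small sketch that builds the request body and reads that value from a trimmed copy of the response above. It assumes the request field is called model, like in the other calls in this post, and that the context length key is prefixed with the value of general.architecture; the function names are hypothetical:

```python
import json


def build_show_payload(model, verbose=False):
    """Build the JSON body for a POST to /api/show."""
    payload = {"model": model}
    if verbose:
        payload["verbose"] = True
    return json.dumps(payload)


def context_length(show_response):
    """Read <architecture>.context_length from the model_info object."""
    info = show_response["model_info"]
    architecture = info["general.architecture"]
    return info[f"{architecture}.context_length"]


# A trimmed copy of the response shown above.
sample = {
    "details": {"parameter_size": "134.52M", "quantization_level": "F16"},
    "model_info": {
        "general.architecture": "llama",
        "llama.context_length": 8192,
    },
}

print(build_show_payload("smollm2:135m"))
print(context_length(sample))
```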

Other Ollama APIs #

Other Ollama APIs can list running models, delete a model (you would not want someone to delete a pulled model), create a model from another model, copy a model, and even generate embeddings. You can explore them all in the official documents.

If you have used Postman and its collections, you can also use this Postman collection, which lists most of the Ollama API calls in a neat, easy-to-test collection. In the next part (part 4) of the Ollama blog series, you will learn how to run Ollama in Docker with Docker Compose.

Important caveat #

If you plan to host Ollama on a publicly accessible URL, even behind some form of authentication and authorization, please remember to expose only the generate (/api/generate) and the chat (/api/chat) endpoints. You will not want users to call the pull or even delete model API endpoints.

You can do it by putting a reverse proxy in front of Ollama’s Gin server (running at port 11434, by default). You can choose between an Nginx reverse proxy or Caddy. With a reverse proxy, you can forward only the traffic that comes to /api/generate and /api/chat to Ollama and block everything else. As Ollama’s server is written in Go with Gin, you may even try changing the server itself to secure your APIs if you know how to write Go and Gin. There is an issue on the official Ollama GitHub repository about something similar if you want to follow that.
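Whichever proxy you pick, the rule it enforces is a simple allowlist. The sketch below expresses that rule in Python so it is easy to reason about and test; it is not Nginx or Caddy configuration, and the names ALLOWED and is_allowed are made up for this example:

```python
# Only the two inference endpoints are allowed, and only with POST.
ALLOWED = {("POST", "/api/generate"), ("POST", "/api/chat")}


def is_allowed(method, path):
    """Return True only for requests the proxy should forward to Ollama."""
    # Drop any query string and trailing slash before comparing.
    clean_path = path.split("?", 1)[0].rstrip("/")
    return (method.upper(), clean_path) in ALLOWED


for method, path in [("POST", "/api/chat"), ("POST", "/api/pull"), ("GET", "/api/tags")]:
    print(method, path, "forward" if is_allowed(method, path) else "block")
```

Everything that does not match, including the pull and delete endpoints, is blocked by default, which is the safer direction to fail in.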

Conclusion #

In this post, you learned about some of the Ollama API endpoints, focusing on the ones that help you get a response from an open model. You learned about the Ollama API endpoints for pulling a model, listing models, and showing model information.

More importantly, you are aware of a crucial caveat: you should not expose all the available Ollama APIs to the outside world. If someone calls the delete model API endpoint, your Ollama API will stop functioning, so be careful. Keep learning!

Ollama commands: How to use Ollama in the command line [Part 2]

2025-02-06T11:45:37Z

Ollama is an open-source tool that helps you run open LLMs on your machine or a server. It is the glue layer between your machine (or hardware) and the open LLM of your choice. In this post, you will learn about the Ollama command you can use to get the most out of it; let’s get going!

Quick recap #

This blog post is part 2 of the Ollama series. In the first part, you covered topics like what Ollama is, its features, and how to run Ollama with examples of the Smollm2 and DeepSeek R1 models.

In this part, you will learn about some useful Ollama commands like serve, run, and ps. Before diving deeper into code mode, please ensure you have Ollama installed and working in your system by reading part 1. Part 1 also covers the installation of Ollama and running the Smollm2 135 million parameter model and DeepSeek R1 8 billion parameter model.

Part 3 of this Ollama series covers the Ollama APIs, which are used by the CLI and can be used by other systems to interact with the LLMs.

In part 4, you will learn about running Ollama in Docker with Docker Compose. You will also add Open WebUI in a Docker container to interact with the LLMs running on Ollama with Docker Compose.

Ollama commands #

Ollama has multiple commands for different tasks. To see the sub-commands you can run with Ollama, you can execute the following:

 ollama --help

It will give you the following output:

You can also run ollama --version to check the version of Ollama; at the time of writing, the version of Ollama is 0.5.7. If you want help with a specific sub-command, you can add --help after the sub-command; for example, ollama run --help will give you the following output:

Now that you know the basics, let’s look at some useful Ollama commands.

Ollama serve #

Ollama serve is the main command that starts the Ollama server. It can be configured with many environment variables, such as OLLAMA_DEBUG to enable or disable debugging, OLLAMA_HOST to specify the server's host, and OLLAMA_MAX_QUEUE to configure the maximum number of queued requests. To learn more about these environment variables, run ollama serve --help.

Ollama runs Gin (written in Go) as the underlying server to add an API layer to the downloaded (pulled) models. Both the CLI and any other services that need to use LLM inference will use the server started with ollama serve, which will give an output similar to the below:

The Gin server runs on port 11434 by default, so if you hit http://localhost:11434/ on the browser of your choice (probably Chrome), you will see the text Ollama is running. The next part of this Ollama series will discuss the API in detail.
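A script that talks to Ollama can work out the server address and check that it is up in a few lines. This is a simplified sketch: it assumes the script should honour the OLLAMA_HOST environment variable in the forms host, host:port, or a full URL, and it falls back to 127.0.0.1 and port 11434 otherwise. The function names are hypothetical, and Ollama's own parsing of OLLAMA_HOST may differ in edge cases:

```python
import os
import urllib.error
import urllib.request
from urllib.parse import urlsplit

DEFAULT_HOST = "127.0.0.1"
DEFAULT_PORT = 11434


def ollama_base_url(env=None):
    """Build a base URL from OLLAMA_HOST, falling back to the defaults."""
    env = os.environ if env is None else env
    host = env.get("OLLAMA_HOST", "").strip()
    if not host:
        return f"http://{DEFAULT_HOST}:{DEFAULT_PORT}"
    if "://" not in host:
        host = "http://" + host  # assume plain HTTP when no scheme is given
    parts = urlsplit(host)
    return f"{parts.scheme}://{parts.hostname or DEFAULT_HOST}:{parts.port or DEFAULT_PORT}"


def is_running(base_url, timeout=2.0):
    """Return True if GET / answers with the "Ollama is running" text."""
    try:
        with urllib.request.urlopen(base_url + "/", timeout=timeout) as reply:
            return b"Ollama is running" in reply.read()
    except (urllib.error.URLError, OSError):
        return False


print(ollama_base_url({}))
```

Calling is_running(ollama_base_url()) before sending prompts gives a clearer error than a failed inference request would.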

Given that the server is running, you will run a model next with ollama run.

Ollama run #

The Ollama run command runs an open model available on the Ollama models page. It will pull (download) the model to your machine and then run it, exposing it via the API started with ollama serve. Like in the previous part, you will run the Smollm2 135 million parameter model because it runs on most machines, even those with little memory (like 512 MB), as the model is 271 MB.

To run Smollm2 135M parameters model, you can execute:

ollama run smollm2:135m

It will result in something like the following:

If you are running the model for the first time, it will be downloaded and then run, as seen in the last part of this Ollama series. An already pulled (downloaded) model starts much faster the second time.

If you type /? within the run command, you will see the help. You can set variables for the model like num_ctx, which can be used to configure the context window of the model. For instance, you can type in /set parameter num_ctx 8129 to set the context window to 8129 tokens.

You can also try /show info, and it will show you the model’s information like:

>>> /show info
  Model
    architecture        llama      
    parameters          134.52M    
    context length      8192       
    embedding length    576        
    quantization        F16        

  Parameters
    stop    "<|im_start|>"    
    stop    "<|im_end|>"      

  System
    You are a helpful AI assistant named SmolLM, trained by Hugging Face    

  License
    Apache License               
    Version 2.0, January 2004

You can play with the other commands available in the running model context. You can also chat with the model by asking it questions. For example, what is the speed of light? gave me the following output with the smollm2 135M parameter model:

>>> what is the speed of light?
The speed of light in space is approximately 299,792,458 meters per second. This value is an approximation based on observations and calculations made using special relativity theory. While it's difficult to measure precisely with our current technology, scientists have used the latest methods to estimate this value for both magnitude (1) as well as relative motion (the speed of light squared). 

It's worth noting that even if we can't directly calculate the exact speed of light in a vacuum using modern instruments and techniques, scientists often rely on estimates like those mentioned above.

To exit the running model context, type /bye and return to the command line. Ollama run is a versatile command that executes prompts directly within the terminal, facilitating quick and efficient interactions with your models.

Ollama list #

The Ollama list command lists all the open models pulled (downloaded) from Ollama’s registry and saved to your machine. When I ran ollama list on my machine, I got the following output:

ollama list
NAME              ID              SIZE      MODIFIED
deepseek-r1:8b    28f8fd6cdc67    4.9 GB    4 days ago
smollm2:135m      9077fe9d2ae1    270 MB    4 days ago

So, I have two models, smollm2:135m and deepseek-r1:8b, which are 270MB and 4.9 GB, respectively.
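If you want to use this listing in a script, the columns can be split into fields. The following is a rough sketch that assumes the columns are separated by at least two spaces, as in the output above; for anything serious, the JSON returned by the API is a more reliable source. The function name is made up for this example:

```python
import re


def parse_ollama_list(output):
    """Parse the text printed by `ollama list` into a list of dicts."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    rows = []
    for line in lines[1:]:  # skip the NAME/ID/SIZE/MODIFIED header
        name, model_id, size, modified = re.split(r"\s{2,}", line)
        rows.append({"name": name, "id": model_id, "size": size, "modified": modified})
    return rows


# Sample text copied from the listing above.
sample = """\
NAME              ID              SIZE      MODIFIED
deepseek-r1:8b    28f8fd6cdc67    4.9 GB    4 days ago
smollm2:135m      9077fe9d2ae1    270 MB    4 days ago
"""

for row in parse_ollama_list(sample):
    print(row["name"], row["size"])
```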

Ollama pull #

You can download other models from the Ollama registry to your machine using the ollama pull command. For instance, if you want to pull the Qwen 2.5 half a billion parameter model (397 MB), you can execute:

ollama pull qwen2.5:0.5b

That will result in something like:

ollama pull qwen2.5:0.5b

pulling manifest
pulling c5396e06af29... 100% ▕██████████████████████████▏ 397 MB
pulling 66b9ea09bd5b... 100% ▕██████████████████████████▏   68 B
pulling eb4402837c78... 100% ▕██████████████████████████▏ 1.5 KB
pulling 832dd9e00a68... 100% ▕██████████████████████████▏  11 KB
pulling 005f95c74751... 100% ▕██████████████████████████▏  490 B
verifying sha256 digest
writing manifest
success

It will take some minutes, depending on your internet speed. If you run ollama list after pulling the Qwen model, it will be listed too like below:

ollama list
NAME              ID              SIZE      MODIFIED
qwen2.5:0.5b      a8b0c5157701    397 MB    50 seconds ago
deepseek-r1:8b    28f8fd6cdc67    4.9 GB    4 days ago
smollm2:135m      9077fe9d2ae1    270 MB    4 days ago

Similarly, running the Qwen model now will start it directly, because it no longer needs to be downloaded first. You can also look at the CLI tab running ollama serve to see all the API calls these commands make in the background. Ollama pull downloads models from the Ollama model library so that they can be used on your machine.

Ollama ps #

Like other ps commands that list processes, the ollama ps command will list running models. For this, you will first need to run a model; you can run the Qwen 2.5 0.5B parameter model with ollama run qwen2.5:0.5b. Then, in a new CLI tab, you can run ollama ps, which will give an output similar to the following:

ollama ps
NAME            ID              SIZE      PROCESSOR    UNTIL              
qwen2.5:0.5b    a8b0c5157701    1.4 GB    100% GPU     4 minutes from now

To exit the run context, type /bye to return to the command line.

Ollama create #

With the ollama create command, you can create a new variant of an existing open model. For example, you will create a new variant of the smollm2:135m model with a context window of 16K and the temperature (creativeness) set to 0.2, which makes it significantly less creative. To do this, first, you will create a Modelfile named Modelfile-smollm2-16k in your current folder with the following content:

FROM smollm2:135m

PARAMETER temperature 0.2
PARAMETER num_ctx 16384

Like Docker, you are saying to start from the smollm2:135m, set the temperature parameter to 0.2, and set the context with the num_ctx parameter to 16384.
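If you create several variants, writing each file by hand gets repetitive. Here is a small sketch that renders the same Modelfile content from a dictionary of parameters; render_modelfile is a hypothetical helper, and it only covers the FROM and PARAMETER instructions used above:

```python
def render_modelfile(base_model, parameters):
    """Return Modelfile text: a FROM line plus one PARAMETER line per entry."""
    lines = [f"FROM {base_model}", ""]
    for name, value in parameters.items():
        lines.append(f"PARAMETER {name} {value}")
    return "\n".join(lines) + "\n"


content = render_modelfile("smollm2:135m", {"temperature": 0.2, "num_ctx": 16384})
print(content)
```

You could then save content to a file named Modelfile-smollm2-16k, for example with pathlib.Path("Modelfile-smollm2-16k").write_text(content), before running ollama create.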

Now, to create a new model named smollm2:135m-16k-ctx you will run the following command:

ollama create smollm2:135m-16k-ctx -f Modelfile-smollm2-16k

It will create a new variant of the Smollm2 135 million parameter model following the instructions in the model file. If you run ollama list, you will see the new model on the list. Then, to run the new model, you can execute ollama run smollm2:135m-16k-ctx as seen below:

In the running model context, where you can type /? for help, if you type in /show info you will see the following output:

/show info
  Model
    architecture        llama      
    parameters          134.52M    
    context length      8192       
    embedding length    576        
    quantization        F16        

  Parameters
    temperature    0.2               
    num_ctx        16384             
    stop           "<|im_start|>"    
    stop           "<|im_end|>"      

  System
    You are a helpful AI assistant named SmolLM, trained by Hugging Face    

  License
    Apache License               
    Version 2.0, January 2004

This means the two parameters specified in the Modelfile, temperature and the context window (num_ctx), are applied to the model. Because the temperature is set to a low value of 0.2, if you ask this model variant why is the sky blue? give me 1 sentence answer. even 3 times consecutively, it will give you almost the same answer, as seen below:

Next, you will learn about some other Ollama commands.

Other Ollama commands #

If you can pull a model, you can push a model to the Ollama registry. For this, you will need an Ollama account and API keys to share your model on Ollama.

Similarly, you can copy a model with ollama cp and remove a model with ollama rm followed by the model's name. You can also run ollama show <model-name> to see the configuration of the model; for example, ollama show smollm2:135m will show the following:

ollama show smollm2:135m 
  Model
    architecture        llama      
    parameters          134.52M    
    context length      8192       
    embedding length    576        
    quantization        F16        

  Parameters
    stop    "<|im_start|>"    
    stop    "<|im_end|>"      

  System
    You are a helpful AI assistant named SmolLM, trained by Hugging Face    

  License
    Apache License               
    Version 2.0, January 2004

The above output is the same as running /show info when the model runs within the CLI. As new versions of Ollama are released, new commands may be added. To see the list of available Ollama commands, run ollama --help.
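If you want to read this output from a script, the sketch below groups the lines under their section headings. It assumes the indentation layout shown above (headings indented by two spaces, entries by four), which is a display format and may change between Ollama versions. The function name parse_show_output is hypothetical.

```python
import re


def parse_show_output(text: str) -> dict:
    """Group the indented `ollama show` lines under their section headings."""
    sections = {}
    current = None
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank separator lines
        indent = len(raw) - len(raw.lstrip())
        if indent <= 2:
            # A shallow line such as "  Model" starts a new section.
            current = raw.strip()
            sections[current] = []
        elif current is not None:
            # Columns are separated by runs of two or more spaces.
            sections[current].append(re.split(r"\s{2,}", raw.strip()))
    return sections


sample = """\
  Model
    architecture        llama
    parameters          134.52M
    context length      8192

  Parameters
    stop    "<|im_start|>"
    stop    "<|im_end|>"
"""
info = parse_show_output(sample)
print(info["Model"])
```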

Ollama commands are similar to Docker commands, like pull, push, ps, and rm. Docker works with images and containers, whereas Ollama works with open LLMs.

In the next part of this Ollama series, you will learn about the Ollama APIs. The CLI also uses these APIs; you will learn more about them so that another system can use them for LLM inference.

Conclusion #

In this comprehensive guide, you explored a wide range of essential Ollama commands, from ollama serve to ollama run, and from ollama pull to ollama create. By mastering these Ollama commands, you'll be well-equipped to harness the full potential of this powerful and easy-to-use framework, opening up a world of possibilities for your projects and applications. Whether you're a seasoned developer or just starting your journey into AI, Ollama and its commands will undoubtedly be invaluable assets in your toolkit. Keep learning!

What is Ollama and how to use it: a quick guide [part 1]

2025-02-02T11:47:52Z

The world of AI has been hyped for more than two years now since the release of ChatGPT in November 2022. New tools and technologies emerge daily, promising to revolutionize our work and lives. If you're looking to harness the power of large language models (LLMs) for personal projects or even professional applications, Ollama might be the key. In this post, you will learn what Ollama is, how to download, install, and use Ollama to run a small model, and then run DeepSeek R1 (the other super popular LLM); let’s get started.

Table of contents #

What is Ollama #

Ollama is an open-source tool mainly written in Go lang (89%) that runs open LLMs on your local machine (or a server). It acts like a bridge between any open LLM and your machine, not only running them but also providing an API layer on top of them so that another application or service can use them.

Ollama is a user-friendly and powerful software for running LLMs locally. It hides the complexities of LLMs, packaging them to be accessible and easily customizable with a model file. There are alternatives to Ollama, like vllm and aphrodite, but Ollama is surely the most popular one. Ollama provides a clean, user-friendly interface that allows you to interact directly with LLMs, tailoring the experience to your needs.

Ollama blog post series #

This post is the first part of a series of posts on Ollama. In this series, you will learn about Ollama, its features, how to install and run it on your local machine, and how to use it with different models.

Part 2 of this series will cover Ollama commands. In that part, you will learn about the various commands you can use with Ollama to interact with LLMs. You will learn how to list models, run models, pull models, and more.

Similarly in part 3, you will learn about Ollama APIs, which are used by the CLI and can be used by other systems to interact with the LLMs mainly for generating responses out of an open LLM.

In part 4, you will learn how to run Ollama with Docker Compose. Docker is a popular containerization tool that allows you to run applications in a container. You will also add Open WebUI to have a web interface on top of LLMs running on Ollama.

Ollama features #

Below are some important features of Ollama:

Privacy and offline access #

One of the most important features of Ollama is privacy and offline access. You can run open models privately on your machine, even without internet access. This not only enables you to use an LLM (say, for code suggestions) on a plane but also keeps your data on your local machine. Your data and files can stay safe in your local machine, and other big tech companies do not see it or get to use it for other purposes like training an LLM. This is a big advantage of Ollama over other cloud-based LLM services which send your data to the cloud for processing and may use it for other purposes.

Model management #

Adding a new model to the library of local models is easy. You can pull a model with the ollama pull command and run it. At the time of writing, there are 150+ models you can pull and run locally, from DeepSeek R1 to Smollm. You can run popular models like Llama 3, Phi 4, Gemma 2, and Mistral, as well as code-specific models like Codeqwen and Codestral. There is no need to navigate any complex format or dependencies or install any other software on your machine. You can pull a model and run it; if your system's resources support it, it will run easily.

Seamless installation #

As you will learn in the next section, Ollama's installation process is very user-friendly. Regardless of whether you use Windows, Linux, or Mac, you essentially run a command, and the Ollama CLI is available on your local machine. Thus, Ollama offers a smooth and hassle-free installation and setup experience. With one command, you can install the Ollama CLI on a Linux or Mac operating system.

Local API #

Ollama has various commands, including the ollama serve command. This command starts a Gin server exposing an API over all the LLMs available on your local system. This allows you to integrate LLMs into other applications and workflows. An API enables efficient communication between your application and LLMs. You can send prompts, get responses back, and exploit the full potential of LLMs. Using structured outputs, you can even get the response from the LLMs in a predefined schema.
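As a rough illustration of calling that local API from another program, here is a minimal Python sketch using only the standard library. It assumes ollama serve is running on the default address http://localhost:11434 and that the smollm2:135m model has already been pulled. The function names are hypothetical; /api/generate is Ollama's text generation endpoint.

```python
import json
import urllib.request

# Assumption: Ollama listens on its default address.
OLLAMA_URL = "http://localhost:11434"


def build_generate_request(model: str, prompt: str, base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /api/generate endpoint."""
    # "stream": False asks for one JSON reply instead of a stream of chunks.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> str:
    """Send the prompt and return the generated text; needs `ollama serve` running."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]


request = build_generate_request("smollm2:135m", "why is the sky blue?")
print(request.full_url)
```

Calling generate("smollm2:135m", "why is the sky blue?") would send the request and return the model's answer as a string.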

Customization #

With Ollama, you can tweak parameters and extract the most value out of LLMs on your local machine. For instance, you can change the num_ctx parameter to change the context window size for a model. You can also change the configs to fine-tune LLM parameters, adjust settings, and change the model behaviour to better suit your needs. You can also use a model file to set multiple parameters like temperature, seed, top K, top P, and others.

Hardware acceleration #

After figuring out the resource and computational needs of an LLM, Ollama can leverage the available hardware resources on your system, including GPUs and CPUs. To check this, you can run the ollama ps command and see the resource usage. Ollama makes sure the resources on your machine are utilized efficiently, which enables you to run even large LLMs easily.

Adding a user interface #

Ollama is a command-line interface (CLI) aimed at advanced users. To use a graphical user interface with a chat, you can download and use Open WebUI (previously known as Ollama Web UI). Open WebUI is also open-source. It claims to be an extensible, feature-rich, and user-friendly self-hosted AI platform designed to operate entirely offline. You can read the official docs to get started with Open WebUI, which works well with Ollama.

In the next section, you will learn how to install and use Ollama.

How to run Ollama locally #

Ollama is easy to install and use on any operating system, be it Windows, Linux, or Mac. For this guide, you will install it on a Mac and run the smollm2 model with 135 million parameters, which is 271 MB. The smollm model with 135M parameters is 92 MB, which makes it one of the smallest but still useful models.

To install Ollama on the operating system of your choice, you can navigate to the official Ollama download page. It looks as follows:

As I am using a Mac I can download the installer from that page or run the following command to get Ollama CLI installed on a Mac:

brew install ollama

It will install Ollama and all its related dependencies and end like the below if all goes well:

If you installed it with brew, you will need to run ollama serve. You will see the Ollama variables and resources.
Then you can run ollama --version in a different CLI tab. You will see something like the below:

 ➜  ollama --version
ollama version is 0.5.7

At the time of writing, the latest version of Ollama is 0.5.7. At this point, because Ollama has just been installed, you will not have any models. To check that, you can run ollama list, which will show an empty list.

Now, to install and run the smollm2 135 million parameter model, you will execute the following on your command line:

ollama run smollm2:135m

Depending on your internet speed, downloading the 271 MB model will take some time. It will then show you the following output, where you can type your first question/prompt to the model, like
why is the sky blue? give the shortest possible answer:

You can play around and send it prompts or questions like who are you?. It will give the answers. If you write /? you will see the help menu as follows:

You can type in /bye to stop the running model and get out of its prompt. Congrats! You have successfully run a relatively smaller LLM (pun intended) on your machine. As it is a smaller model with only 135 million parameters (271 MB) it might not perform many tasks well. So in the next section you will run the popular DeepSeek R1 8 billion parameter model.

Run DeepSeek R1 with Ollama #

To run DeepSeek R1 8 billion parameter model you can run the following command:

ollama run deepseek-r1:8b

Keep in mind, you are downloading almost 4.9 GB of data, so depending on your internet speed it might take minutes (or hours, I can’t say). With Ollama, you can try many open models; have a look at their models page and choose which model you would like to run on your local machine. Make sure that you have enough resources (memory, CPU, and disk space) to run a really large model. For instance, the DeepSeek R1 671B parameter model is 404 GB, which will surely not run on a consumer-grade laptop.

In my case, after a few minutes of waiting, the deepseek-r1:8b model was downloaded and it ran. You can run the same question/prompt of why is the sky blue? give the shortest possible answer and see the response from the model as seen below:

As you can see above, DeepSeek R1 (R for reasoning) apparently “thinks” and then gives back a response. That is something new. To stop the model and exit the chat, you can type /bye and get back to the shell.

Where does Ollama store the models #

Depending on your operating system and how you installed Ollama, the location may vary. In my case, on a Mac, the models were stored in the user directory at ~/.ollama/models, as you can see below with the command ls -al ~/.ollama/models:

Ollama stores the models at ~/.ollama/models on Linux machines as well. The location can be changed by setting an environment variable called OLLAMA_MODELS. In the next part of this series, you will learn about Ollama commands.
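To find the models directory from a script, you could mirror the lookup described above, as in this small sketch. The function name ollama_models_dir is hypothetical, and the fallback path assumes the Mac/Linux default mentioned in this post.

```python
import os
from pathlib import Path


def ollama_models_dir(env=None) -> Path:
    """Return OLLAMA_MODELS if it is set, otherwise the ~/.ollama/models default."""
    env = os.environ if env is None else env
    custom = env.get("OLLAMA_MODELS")
    if custom:
        return Path(custom).expanduser()
    return Path.home() / ".ollama" / "models"


print(ollama_models_dir())
```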

Conclusion #

Ollama is a game-changer for anyone working with LLMs. It simplifies the often daunting complexities of LLM interactions, making this powerful technology accessible to a much broader audience. Ollama's intuitive interface and user-friendly design make it the perfect tool for maximizing the power of LLMs and effortlessly incorporating them into your workflow.

In this post, you learned how to install Ollama and run the Smollm2 and DeepSeek R1 models on it using the command line. You also found out where Ollama stores the downloaded models. Happy AI exploration!

How to use Gemini over Vertex AI to summarize and categorize job listings with controlled generation

2025-01-26T11:34:52Z

LLMs generally reply in a nondeterministic format; they do not always comply with the formatting instructions given. This is where controlled generation (structured output) comes into play, where you ask an LLM to reply in a way that complies with a given schema. In this post, you will learn how to use Gemini over Vertex AI and controlled generation to get structured output that follows a schema on job listings to summarize and categorize them. Let's get going!

Table of contents #

The solution to enhance #

This is a real-life scenario from one of our side projects, AU Tech Jobs, which aggregates jobs from multiple (like 60+) company job listing pages. If interested, you can read the story of ATJ. Currently, the job details page has two useful features, built by calling two different APIs.

The first one is that if the “computer percent” for a particular job is less than a given threshold, it shows the message Our machine learning algorithm suggests this might not be a pure tech job to let the user know it might not be a tech job. For example, an “Account Executive” role is not a software engineering/tech job. Currently, this classification is done using the uclassify API. It does an “ok” job but sometimes does not give back a good computer_percent classification.

The second feature is a summary of the job description. Currently, this feature is done using Bert Extractive summarizer; you can try it out. BERT by Google is an earlier transformer-based model that predates today's large generative models. It does the summarization task but is not as versatile as a Large Language Model (LLM). We call another API to get the summary of the job description.

Both the features look as follows in action:

The enhancement we want to make is to get both the summary and the categorization (tech percentage, in this case) using an LLM and a prompt in a single call. If it is an LLM call, other features could be easily added. The next section will see how this can be done with a working proof of concept.

Summarize and classify using Gemini over Vertex AI #

Now, if we were to modernize the summary generation and the computation of the software engineering percentage for a given job description with today's powerful LLMs, it can be done with a single call (or prompt). Let’s look at how you can do a proof of concept on Vertex AI using Gemini 2.0 Flash.

To do this, you will need a Google Cloud Account (with some credits on it), then you can follow along next:

Vertex AI Freeform #

You can start by creating a new GCP project or using an existing one. On your selected project, search for vertex and click on Vertex AI as seen below:

On the Vertex AI page, click on Freeform below Vertex AI Studio on the left menu as follows:

If it is a new project or you are using Vertex AI for the first time, you will need to enable the Vertex AI API by clicking Agree & Continue as shown below:

At this point, you should have landed on the Vertex AI Freeform page as seen here:

Next, you will write the prompt summarizing and categorizing a job description.

The prompt #

The next task is putting in the prompt and a job description to reach our goal. For this, I selected a random software engineer job from Atlassian and used the following prompt:

From the job description, first summarize it to less than 125 words,
then determine whether you think it is a software engineering job and
your confidence percentage.

The prompt and the job description combined are below:

From the job description, first summarize it to less than 125 words,
then determine whether you think it is a software engineering job and 
your confidence percentage.

Backend Software Engineer

Working at Atlassian
Atlassians can choose where they work – whether in an office, from home, 
or a combination of the two. That way, Atlassians have more control over 
supporting their family, personal goals, and other priorities. We can hire 
people in any country where we have a legal entity. Interviews and 
onboarding are conducted virtually, a part of being a distributed-first 
company.

With a sufficient timezone overlap with the team, we're able to hire eligible 
candidates for this role from any location in Australia and New Zealand. 
If this sparks your interest, apply today and chat with our friendly
Recruitment team further.

Atlassian is looking for talented Developers to join one of our 
Sydney engineering teams (i.e. Jira Server, Jira Cloud, Growth, etc.) 
Atlassian's engineering team is responsible for shaping the future by helping 
thousands of teams all around the world get work done. 

As a Developer well into your career, we know you're exceptional 
at what you do, but you're still eager to learn and hone in on skills 
as a developer... That's why we're placing a heavy emphasis on leaning 
on your expertise in one or more tech stacks but still learning and 
growing. We don't expect you to be an expert, but we'll sure make 
sure you get on the right path towards becoming one...

Wait, I don't have Java experience and you still want to interview me? 
That's right! At Atlassian, we hire engineers that can demonstrate 
their ability to learn and keep up with new and emerging technologies. 
It's true that Atlassian's stack is primarily written in Java and in 
the role you'll be coding primarily in Java, but we do believe in using 
the right tools for the job rather than being tied to the tool (e.g. Java).
We happen to have a variety of languages within our stack including 
Kotlin, Python, and Ruby.For the interview process, we want to 
see you at your best. This means that during the interview, we want 
you to code in whatever language you feel you're best at. This will 
give us a sense of your skills as a developer, which is all we need to make 
a proper assessment for this role.

In this role, you'll get the chance to:
Drive projects independently, from technical design to launch
Apply architectural standards and start using them on new projects
Contribute to code reviews & documentation as well as take on complex bug fixes
Begin writing useful technical documentation - Learn and code in Java
Mentor more junior members
Sound like an exciting opportunity? We think so too... In order to 
set you up for impact on day one, we'll expect you to have 
this on your first day:

You will not be required to know a specific programming language, 
however experience with a prominent language such as Java, 
Python, C#, C/C++, or Ruby is crucial.

Deep understanding of data structures, in particular, how they 
are implemented and how to apply them to solve problems

Passion for collaborating, tackling hard problems and not being 
afraid to ask questions
A real appetite for learning and growing, both as a developer 
and teammate.

Our perks & benefits
Atlassian offers a variety of perks and benefits to support you, 
your family and to help you engage with your local community. 
Our offerings include health coverage, paid volunteer days, 
wellness resources, and so much more. 

Visit go.atlassian.com/perksandbenefits to learn more.

About Atlassian

At Atlassian, we're motivated by a common goal: to unleash 
the potential of every team. Our software products help teams 
all over the planet and our solutions are designed for 
all types of work. Team collaboration through our tools 
makes what may be impossible alone, possible together.

We believe that the unique contributions of all Atlassians 
create our success. To ensure that our products and culture 
continue to incorporate everyone's perspectives and 
experience, we never discriminate based on race, 
religion, national origin, gender identity or expression, 
sexual orientation, age, or marital, veteran, or 
disability status. All your information will be kept 
confidential according to EEO guidelines.

To provide you the best experience, we can support with 
accommodations or adjustments at any stage of the 
recruitment process. Simply inform our Recruitment team 
during your conversation with them.

When pasted on the Prompt box of Vertex AI freeform, it looks like the below:

You can generate a response now, but it will be a bit random. That is where setting the configs better and using the controlled generation with a schema will improve the output. Next, you will tweak the settings to make the output more predictable.

Using better settings #

You will change the settings for a more optimized output for the summarization and categorization task. You can set the Temperature at 0.2 and the Output token limit at 4096. The Temperature is the creativity or randomness you want the LLM to have in the output, and the Output token limit caps the output length, where a token is roughly a word or a piece of a word.

As you want the LLM to be more predictable, the Temperature is set to a low 0.2. You can even set the Output token limit to 512, and it would work, but you are setting it higher just in case the LLM sends out long output. Your setting will look like the following:

You can leave the model as gemini-2.0-flash-exp, as seen in the above image. Next, you will set the schema for the controlled generation, enabling structured and more predictable output.

Schema for controlled generation #

To set the schema for controlled generation with Gemini, you will need to change the Output format to JSON on the right panel below Grounding settings as you can see below:

After that, you will click Edit beside the select box where JSON is already selected for the Output format. It will slide in a new overlay from the right side, and there you will paste in the schema below:

{
  "type": "object",
  "properties": {
    "summary": {
      "type": "string"
    },
    "tech_percent": {
      "type": "number"
    }
  },
  "required": [
    "summary",
    "tech_percent"
  ]
}

After that, you can click apply as shown below:

Before proceeding further, let’s analyze what the schema means:

First, you have put a schema of a JSON object (not an array of objects or anything else), the object has two properties, which are:

summary: it is of type string
tech_percent: having the type number

Then, you specify that both properties are required in the output by adding both fields to the required array. There are other field types supported by the Vertex AI schema that you can use. For example, you can use an Enum with only two values, positive or negative, if you analyze sentiment.

Similarly, you can send in an array of items and expect back an array of items in a given schema, like this weather forecast example. The possibilities are endless; if you use Gemini’s multi-modal capabilities, you can even use a schema to list out the identified objects in an image. It would be advisable to thoroughly read that official document. You can also use Google AI Studio for a visual editor for the structured output schema.
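To make the meaning of the two-field schema concrete, here is a minimal Python sketch that checks a JSON reply against it by hand. This is only an illustration, not how Vertex AI enforces the schema; the names RESPONSE_SCHEMA, PYTHON_TYPES, and check_against_schema are hypothetical, and only the string and number types used in this post are handled.

```python
import json

# The schema from this section, written as a Python dict.
RESPONSE_SCHEMA = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "tech_percent": {"type": "number"},
    },
    "required": ["summary", "tech_percent"],
}

# Maps the schema type names used above to Python types.
PYTHON_TYPES = {"string": str, "number": (int, float)}


def check_against_schema(raw_json: str, schema: dict = RESPONSE_SCHEMA) -> list:
    """Return a list of problems; an empty list means the reply fits the schema."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = [f"missing field: {name}" for name in schema["required"] if name not in data]
    for name, spec in schema["properties"].items():
        if name not in data:
            continue
        value = data[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, PYTHON_TYPES[spec["type"]]):
            problems.append(f"wrong type for {name}")
    return problems


print(check_against_schema('{"summary": "A backend role.", "tech_percent": 95.0}'))
```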

Next, we will test the output on the Vertex AI interface.

Test it #

To test the output and if it adheres to the defined schema, you can hit the play or generate button and check the output as follows:

As you can see, it works well, and the output adheres to the given schema. The output I got was the following:

{
  "summary": "Atlassian is hiring a Backend Software Engineer to join their Sydney team. The role involves driving projects, applying architectural standards, contributing to code reviews, and mentoring junior members. While Java is the primary language, experience with other languages is valued. They emphasize learning and growth, and want to see candidates code in their preferred language during interviews. The company offers flexible work arrangements and various benefits. They value diversity and inclusion.",
  "tech_percent": 95.0
}
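Because the reply is plain JSON, the site could decide whether to show the "might not be a pure tech job" message with a few lines of code. The sketch below is an assumption about how that could look: the threshold of 50 is hypothetical (the post does not state the real value), and the summary is shortened for brevity.

```python
import json

# Hypothetical cut-off; the post does not state the threshold the site uses.
TECH_THRESHOLD = 50.0

# Shortened version of the reply shown above.
reply = '{"summary": "Atlassian is hiring a Backend Software Engineer.", "tech_percent": 95.0}'


def needs_non_tech_notice(raw_reply: str, threshold: float = TECH_THRESHOLD) -> bool:
    """Return True when the model's tech_percent falls below the threshold."""
    return json.loads(raw_reply)["tech_percent"] < threshold


print(needs_non_tech_notice(reply))
```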

To ensure it works fine with non-tech jobs, I tested it with a Sales job description and ran the generation. It rightly guessed that it was only 10% technical/software engineering related:

You can switch back to the previous job description of a Backend Software Engineer and proceed. If you try it multiple times, you might get the quota error:

Error message: 'Online prediction request quota exceeded for gemini-experimental. Please try again later with backoff.'

Status: 429 Error code: 8

You can use a different model like the gemini-1.5-pro-002 to overcome this error. You can also ask for an increase in the Vertex AI quota.
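The error message also suggests retrying "with backoff". A generic way to do that in Python is sketched below. QuotaExceeded is a stand-in exception defined only for this example (a real client library raises its own error type for a 429), and the attempt count and delays are arbitrary assumptions.

```python
import random
import time


class QuotaExceeded(Exception):
    """Stand-in for a 429 quota error; real client libraries raise their own type."""


def call_with_backoff(call, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Retry `call` on QuotaExceeded, doubling the wait (plus jitter) each time."""
    for attempt in range(max_attempts):
        try:
            return call()
        except QuotaExceeded:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            # Waits about 1s, 2s, 4s, ... plus up to half a second of jitter.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```

You would wrap the function that sends the prompt, for example call_with_backoff(lambda: send_prompt(job_text)), where send_prompt is whatever hypothetical function makes the model call in your code.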

Generate code #

To generate code for what you have done, click the Get Code link on the right sidebar beside the Save button:

It will show a working Python code as follows:

Click Close to close the overlay when you are done. You can analyze the code's response_schema parameter, which will have the schema you defined in the previous step.

You can copy and run the generated Python code on Google Cloud Shell, editing the file in the Cloud Shell editor; there is an example of that in this tutorial. You can also run it as a Google Colab Notebook by clicking the Open Notebook button.

As this tutorial is focused on controlled generation, we will not explore running the code further. However, you can add other layers to the generated code as needed. For example, you can create an API with FastAPI or have a running app with a helpful UI using Streamlit or Google’s Mesop.

Coming back to the use case, I would call the Gemini API directly and send multiple job posts in a single call, as Gemini 2.0 Flash exp has a 1 million token context window, and get the summaries in batches of 100 or 200 jobs. In the above example, each job took 900-1000 tokens, so this would work well. I would also consider the cost associated with it.
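As a back-of-the-envelope check, 200 jobs at roughly 1,000 tokens each is about 200,000 tokens, well within a 1 million token context window. A batching helper could be sketched as follows; the function name batch_jobs is hypothetical, and the check ignores the tokens used by the prompt text and the generated output.

```python
CONTEXT_WINDOW_TOKENS = 1_000_000  # context window quoted in this post
TOKENS_PER_JOB = 1_000             # upper end of the 900-1000 range quoted above


def batch_jobs(jobs: list, batch_size: int = 200) -> list:
    """Split the job list into consecutive batches of at most batch_size items."""
    if batch_size * TOKENS_PER_JOB > CONTEXT_WINDOW_TOKENS:
        raise ValueError("batch would not fit in the context window")
    return [jobs[i:i + batch_size] for i in range(0, len(jobs), batch_size)]


batches = batch_jobs([f"job {n}" for n in range(450)])
print([len(batch) for batch in batches])
```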

Nothing will change visually for the user; however, this enhancement will significantly improve the output quality.

You can also try the official controlled generation tutorial over Google Colab.

This blog post has been written as part of #VertexAISprint, and Google Cloud credits are provided for this project.

Controlled generation is useful #

Controlled generation is helpful as the LLM will reply in JSON, which is much better for machines interacting with LLMs. Given that the LLM will adhere to the schema, the output is structured, and with the required fields, you can expect the fields to be there in the JSON output.

Also, the output does not sway in other directions; as you saw in the above example, the tech_percent value was rightly guessed at 95% for the backend software engineer, and it came down to 10% for a sales role’s job description.

It will not reduce the hallucination or the non-deterministic nature of LLM’s output, but it can add the needed structure to the output, making it much easier for machines or API clients to read it.

Conclusion #

You saw a real-life example of how Gemini can replace older methods of summarizing and categorizing. Starting with a use case, you built a compelling and working proof of concept using Gemini over Vertex AI. You wrote a good prompt, tweaked the output configs, and used a schema object with two required fields for controlled generation and structured output. You also learned why and how controlled generation is helpful. I hope you learned something new from this guide and continue learning more about LLMs and Gen AI. Keep learning!

How to run (any) open LLM with Ollama on Google Cloud Run [Step-by-step]

2025-01-20T11:34:52Z

Ollama is a great way to run many open Large Language Models (LLMs). You can run Google Gemma 2, Phi 4, Mistral, and Llama 3 on your machine or the cloud with Ollama. You can also host these open LLMs as APIs using Ollama. In this post, you will learn how to host Gemma 2 (2b) with Ollama 0.5.x on Google Cloud Run; let’s get started!

Table of contents #

Why Google Cloud Run
Create a GCS bucket
Deploy Ollama on Google Cloud Run
- Wire up GCS bucket as a Cloud Run Volume
Testing Gemma 2 with Ollama on Google Cloud Console
Conclusion

Why Google Cloud Run #

Good question; I have written multiple blog posts about Google Cloud Run and also given a couple of talks in the past years. Some great reasons to use Google Cloud Run to host open LLMs with serverless containers are:

The resources (like CPU, memory, and even GPU) are allocated in a serverless way, meaning you only pay for the time you use them.
You don’t need to send data out of your VPC, putting security first
More cost control without counting tokens, as the LLMs are self-hosted; you can define how the resources are allocated rather than just counting the number of tokens sent and received.

Now that that's out of the way let’s access the Google Cloud Console and deploy Gemma 2 (2b—2 billion parameters) on Google Cloud Run.

Create a GCS bucket #

First, you will need an existing Project on Google Cloud, or you can create a new project. For this tutorial, we will assume that you have a new(ish) project. Since the project has already been selected, you will create a new Google Cloud Storage (GCS) bucket. You are creating a GCS bucket to store the files needed for the open LLM, which is Gemma2 2b in the case of this guide.

To create a new bucket, search for bucket on the search bar:

Then select the Buckets option, as shown above. Then click the + Create option on the “Buckets” page:

After that, name the bucket something unique, like ollama-gemma2-2b-xyz. Bucket names must be unique across all of GCP, so you might need a suffix. Then click the down arrow beside Optimize storage for data-intensive workloads and check the Enable Hierarchical namespace on this bucket checkbox as shown below:

This will help optimize the LLM access later. After that, select the bucket to be a single region and the Region as us-central1 (Iowa) as follows:

Then click Continue. Next, keep the Storage Class as Standard and then click Continue:

After that, let the access control be on the bucket level (not Fine-grained ) and click Continue:

Then, keep the data protection policy as shown below, and click Create to create the bucket.

Next, it will ask you to confirm the access as below:

Click Confirm. It will take some time, but the bucket will be created, which will look like the below:

After the bucket is created, the next task is to deploy Ollama on Google Cloud Run.

Deploy Ollama on Google Cloud Run #

To deploy Ollama on Google Cloud Run, search for cloud run on the search bar:

Then click Cloud Run to go to the Cloud Run page. On that page, click on Deploy Container and then click on Service as shown below:

You will do all the important configurations on this page, so be careful. You will need to fill up the form as follows:

In the Container image URL type in ollama/ollama:0.5.7 - at the time of writing, 0.5.7 is the latest release and available as an image on DockerHub
In the service name, type in ollama-gcs
Make sure the region is the same as the bucket, which is us-central1
For now, select Allow unauthenticated invocations. This will make it accessible to anyone on the web, but we are doing it for the sake of this demo. In a real-life scenario, you would put it behind authentication.

Till now, the form will look like the below:

Then, for billing, select Instance-based and keep the Minimum number of instances at 0. This makes it serverless. When no requests are coming in, no instances will be up and running, saving you money. After that, set the Ingress to All so that it allows traffic from the internet. At this point, your form will look as follows:

Next, you will link the GCS bucket as a Cloud Run volume.

Wire up GCS bucket as a Cloud Run Volume #

This is an important part where you will link up the Google Cloud Storage (GCS) bucket created in the previous part as a volume for Google Cloud Run Containers. Click the volumes tab on the Container(s), Volumes, Networking, Security part:

Then click Add Volume and select the Volume type as Cloud Storage Bucket. Let the name be gcs-1, and then for the Bucket click Browse and select the bucket you created in the previous step, which will be named something like ollama-gemma2-2b-xyz. Then click Select. At this point, the form will look like the below:

Leave the Read-only checkbox unchecked, as the Cloud Run instances will write files to this bucket. Then click Done. It will say the bucket is Not mounted , which is fine.

After that, click the Go to container(s) tab or the Container(s) tab; on this tab, click the Volume mounts sub-tab, then click Mount Volume. Next, select the Name-1 as gcs-1 and on the Mount path 1 type in /root/.ollama (don’t miss the . in front of ollama); this is where Ollama stores its models. So when the models are pulled (downloaded), they will be saved in this volume, which will also be saved in the bucket. It can be used in other instances as it is in the bucket.

Then click Done. You will set some environment variables for the container next. Click Container: ollama-1 under the Containers tab to do this. Then click the Variables and Secrets sub-tab; after that, click Add Variable, and fill up the following in Name 1 and Value 1:

OLLAMA_HOST with value 0.0.0.0:8080 – this will make Ollama run on port 8080, not the default port of 11434

Similarly, add three more variables using the Add Variable button and fill up the following values:

OLLAMA_DEBUG with value false – this is self-explanatory
OLLAMA_KEEP_ALIVE with value -1 – it keeps the model weights loaded on the GPU (if a GPU is used)
GIN_MODE with value release – this removes any Go Gin debug-related messages. Ollama uses Gin under the hood.

Your Variables and Secrets section will look like the below when you are done:

After that, click the Settings tab and set the Memory to 32 GiB and the CPU to 8. You can request GPU access for your project by clicking the GPU quota link and filling out a form. Gemma 2 on Ollama will run (a bit slower, though) without the GPU.

Your Settings section will look like the above when you are done editing it; after that, you can click Done. If you cannot allocate 32 GiB of memory and 8 CPUs, it might be because your account is new; you can request a quota increase. Even with 512 MB of memory and 1 CPU, which you should have without a quota increase, you can run smollm2:135m, a model with 135 million parameters; the model is 271 MB, which will fit in the 512 MB of allocated memory.

Then, move on to the Requests section. Here, you can keep the Request timeout at 300 seconds (5 minutes), the default value, and keep the Maximum concurrent requests per instance at 80. Keep the Minimum number of instances at 0; the only value you should change is the Maximum number of instances; keep it at 3 or 4 at most. If someone attacks your service, it should time out or send back a server error rather than scale out a lot and cost you loads of money. Your Cloud Run service creation form will look like the below:

Then click Create and wait for some time. As the Ollama image is 1.5 GB, it will take a bit to start. It will look like the following when it is deploying:

It will look like the below after it is deployed successfully:

Click the service URL to see if it is working:

It will show Ollama is running as above if everything went fine. At this point, Ollama has no models to run any inference. So, in the next section, you will pull in and test Gemma 2:2b with Ollama using the Google Cloud Console. Gemma 2 will be the first model for this instance of Ollama.

Testing Gemma 2 with Ollama on Google Cloud Console #

To test Gemma 2 (or any other model that can run on Ollama), go back to the Google Cloud Console on the Cloud Run service page and click the Cloud Shell button (or hit G and then S on your keyboard). This will open the Google Cloud Shell terminal.

On the terminal, type curl -fsSL https://ollama.com/install.sh | sh to install Ollama; it has been taken from the Ollama Linux installation page. Let it execute, and it will show an output like the one below:

Then, copy the URL of your service, which will be something like https://ollama-gcs-<some-number-here>.us-central1.run.app. You can use the copy icon beside the URL. After that, execute the following command on your terminal:

OLLAMA_HOST=<copied-url> ollama run gemma2:2b

It will download (pull) Gemma 2:2b and save it in the GCS bucket (a linked volume), and then you can chat with Gemma 2:2b. You can ask who are you and Gemma will reply:

You can download/pull any other model Ollama supports and start using it. For example, you can download llama3.3:70b by Meta, phi4:14b by Microsoft, deepseek-r1:8b by DeepSeek (which is getting very popular), or even smollm2:135m, which is only 271 MB in size, whereas the other models are gigabytes in size.

You can type /bye to get out of the ollama CLI. Now, as Gemma 2:2b is downloaded, you can also send a cURL command to test it out like the one below:

curl -i https://ollama-gcs-<some-number-here>.us-central1.run.app/api/generate -d '{
  "model": "gemma2:2b",
  "prompt": "Why is the sky blue? Give the shortest answer possible",
  "stream": false
}'

It will give an output as follows:

If you go in the bucket and look at its contents, you will find Gemma 2 there:

Google Cloud Run makes it easy to run any LLM on Ollama. You can run Phi 4, Llama 3, or any other model; you only need to pull it and then run your command or send a POST request with curl. You can also use libraries like LiteLLM to use the model in your apps through Ollama’s APIs. Please explore Ollama more on your own. You can also use Open WebUI to have a GUI on top of Ollama running the Gemma 2 LLM.
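If you would rather call the service from application code than from cURL, the same request can be sketched in Python using only the standard library. This is a minimal sketch, not the only way to do it: the helper names build_generate_request and generate are hypothetical, and the base URL is a placeholder you would replace with your own service URL. It posts the same JSON body to /api/generate and reads the response field from the non-streaming reply.

```python
import json
import urllib.request


def build_generate_request(base_url, model, prompt):
    """Build the POST request for Ollama's /api/generate (same body as the cURL call)."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url, model, prompt, timeout=300):
    """Send the request and return the generated text (hypothetical helper)."""
    request = build_generate_request(base_url, model, prompt)
    # The timeout mirrors the 300 second request timeout set on the service.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": false, Ollama replies with one JSON object whose
    # "response" key holds the generated text.
    return body["response"]


if __name__ == "__main__":
    # Placeholder URL: replace it with the URL of your own Cloud Run service.
    service_url = "https://your-ollama-service.example"
    print(generate(service_url, "gemma2:2b", "Why is the sky blue? Give the shortest answer possible"))
```

Splitting the request building from the sending keeps the payload easy to inspect without making a network call.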

Conclusion #

It is easy to run any LLM with Ollama on Google Cloud Run. Be careful of the access as Ollama APIs allow pulling models and even deleting them. With Cloud Run, you will only pay for the resources when you use it, which makes it ideal for experiments. I hope you learned something new from this post and keep experimenting.

Recap 2024: Public Speaking, blogging, interviews, tech community work and other things

2024-12-20T10:27:45Z

I started writing the year-in-review (recap) posts in 2019, so this will be the sixth consecutive year I write a yearly recap. Taking some time to reflect on things accomplished this year from a professional point of view, I think it will be a good rear-view mirror look back at 2024; let’s dive in!

A motorcycle rear view mirror background image generated using Image FX (Imagen 3)

Table of contents #

Highlights #

Below are the major highlights of 2024 from a professional (and tech) lens:

I did seven public talks/workshops this year, all in person and one technical talk at work. In addition to Sydney, I gave talks in Adelaide, Hobart, and Auckland, New Zealand too.
Similar to the past couple of years, I have written 25+1 (this recap) blog posts this year. In this blog, the number of users has decreased by 45% compared to 2023, thanks to us technologists using (or dare I say overusing) LLMs. I have also written one blog post about Feature flags for the Simply Wall St. tech blog.
I was a guest in two podcasts this year, one for the StackOverflow Podcast released in August and another one for The Doers Nepal (in Nepali language) which released in September.
Helped re-establish GDG Perth and contributed to organizing 11 meetups for GDG Cloud Sydney and two conferences: Google I/O Extended Sydney and DevFest Sydney this year. I also moderated two AI-related panel discussions at both conferences mentioned above.
I also helped groom five new female tech speakers, most of them “debuted” with their talks at DevFest Sydney.
My Docker Captain application was approved towards the end of June this year, and I became a Docker Captain
I have listened to 225 hours of podcasts this year (9 days and 9 hours), equating to almost 37 minutes daily (225 hours /365 days). I will have proof later. The top three are Changelog (master feed), Stack Overflow Podcast, and PodRocket.

This year, I traveled to the US and Nepal (after 6.5 years), which was a good experience. Let’s discuss some of the details from the above highlights next.

Public speaking in 2024 #

I wrote one talk about Serverless Containers and one workshop titled How to build an E-commerce product description generator using Gemini. I gave the talk at two meetups and also in Adelaide and Auckland. I did the workshop in Sydney for Build with AI Sydney 2024 at the Google Sydney office and for DevFest Hobart. I also did a short talk titled How to nurture sustainable tech communities: experiences since 2006 in February for GDG ANZ Summit 2024. In total, I did seven public talks/workshops this year, most of them in Sydney.

In addition to that, I did an internal talk at Simply Wall St. about the portfolio limits feature we deployed in April of this year. So, I wrote three talks and one workshop in 2024. You can view a reverse chronological list of all the talks I have done in public in my public speaking repo.

Blog posts of 2024 #

Similar to 2022 and 2023, this year, I wrote 25+1 (this review) blog posts. I wrote two blog posts in most months, but in March and June, I wrote only one. I wrote three blog posts in February, April, July, and December to balance it. The top 5 blog posts published in 2024 are:

Most viewed blog posts of 2024 #

Most of the blog traffic came from SEO (Google). Some numbers indicate over a million page views this year, less than in 2023, but it is a fantastic feat. Posts about Docker do well on my blog, and this year is the first time I have written about Nest.js, which I have used at work since 2019. I should write more about Nest.js in 2025.

I plan to scale back blogging in 2025 to one monthly post and target 14 or 15 blog posts. The number of users and page views has significantly decreased this year compared to 2022 or 2023.

Regarding ranking, this blog moved from the 882178th website in the world in 2023 to the 590709th position in 2024 on SimilarWeb as seen below:

This might mean the world’s search traffic and internet traffic to the blog decreased in 2024 due to LLMs answering most of the users' questions.

Podcasts/Interviews this year #

I was fortunate enough to be a guest on two podcasts. First, the popular StackOverflow Podcast, in an episode titled From PHP to JavaScript to Kubernetes: how one backend engineer evolved over time. I had emailed them, got a reply back, and had a quick pre-recording chat with Ryan. Then, the recording happened on an early Saturday morning (late Friday afternoon in the New York time zone); after some weeks, it was published and has had thousands of listeners till now. You can also listen to our conversation below:

I was in Nepal after a long time this year in August and had the pleasure of having an insightful (and long) conversation with Anup Ghimire for The Doers Nepal podcast. The podcast episode has the title Debugging Your Career: Advice from a Senior Software Engineer and has garnered almost 12K views on YouTube now. You can enjoy our conversation in the Nepali language here:

Those were the two interviews I did this year. I also did an online panel discussion for "IT Career in Australia" about Frontend vs Backend vs Full Stack: Navigating Your Career Path, hosted by Yana Martens in May. The other panelists in that discussion were Phillip Johnson and Nicholas Kircher. You can watch the panel discussion below:

I have moderated a couple of panel discussions in 2024, which are discussed in the community activities section below.

Community activities #

As GDG Cloud Sydney's organizers, we organized 11 meetups in 2024. I attended ten and missed the one in August. I also spoke at one of our meetups about serverless containers.

In collaboration with GDG Sydney, we organized two full-day conferences, Google I/O Extended Sydney 2024 in June and GDG DevFest Sydney in October. Both of these conferences had over 150 attendees. Google I/O Extended Sydney 2024 was a single-track conference, whereas GDG DevFest Sydney 2024 was a two-track conference. I mostly looked into the content for both of these conferences. That meant evaluating and choosing talks, finding panelists and mentors, and grooming new speakers. I also moderated the panel discussion about AI at both of the above-mentioned conferences.

For grooming new speakers, I helped five (or more) new female speakers give their debut conference talk at GDG DevFest Sydney 2024. We helped them with content, delivery, and a dry run to foster more confidence in them when doing the actual show. Anima, Lakshmi, Monika, Ranjana, and Utkrista did their first conference talk at GDG DevFest Sydney 2024. Among them, Monika was invited to Melbourne to do the same talk, which gives me some extra pride 🙂.

This year, I helped some people become Google Developer Experts, Google Cloud Innovator Champions, and Women Techmakers Ambassadors. Thanks to Matt for facilitating this journey for those six lucky individuals. I also recruited two new co-organizers for GDG Cloud Sydney this year.

A big thanks to Reeya for her superb support for our events, from designing graphics, recording, and editing reels/shorts, and managing some logistics to being an amazing MC at DevFest Sydney.

Became a Docker Captain #

In addition to being a Google Developer Expert (GDE) for Google Cloud since 2019, I wanted to be tagged with another Developer Ambassador program. I emailed Docker to apply for being a Docker Captain in 2021 and 2023 but never heard back. This year, they added a form for being a Docker Captain, and thankfully, Eva replied when I filled out the form. She organized a short interview (chat) and approved my application, and I became a Docker Captain in late June of this year. When writing this, there are only 3 Docker captains in Australia.

Developer ambassador programs repo #

A developer ambassador program is a program a company runs to promote its products or services by that organization's representatives (but not employees). As a member of any developer ambassador program, you are expected to produce technical content like blog posts, podcasts, videos, etc. You are encouraged to talk about their products via public speaking or even contribute to open-source projects and forums like Stack Overflow. It comes with some benefits.

I have started an open-source repository to list the developer ambassador programs I know about. There are 40+ developer ambassador programs listed in the repo; eight are student-only programs. Some popular ones are AWS Community Builder, GitHub Campus Expert, Google Cloud Champion Innovator, and Microsoft MVP.

If you know of any developer ambassador programs missing from that list, please add them by opening a new pull request.

Listening to Podcasts #

With a daily average of ~37 minutes, I have listened to two hundred and twenty-five hours of podcasts this year (9 days and 9 hours). The stats from Podcast Republic, my podcatcher of choice, are as follows:

The five most listened-to podcasts in 2024 are:

Changelog Master feed - includes other podcasts like Ship It, Practical AI, etc. You must listen to Changelog News for your weekly dose of software engineering news in under 10 mins.
The Stack Overflow Podcast - their relatively short (<30 mins) podcasts are a great listen.
PodRocket - great weekly episodes on frontend web development by LogRocket.
Investopoly - Stuart offers fantastic advice on investing and building wealth from an Australian perspective.
Rework - by 37Signals, it is always great to listen to Jason Fried, David Heinemeier Hansson, and the host Kimberly about the way 37Signals work.

Other podcasts in the top 10 include In Depth, Compressed.fm, Software Engineering Daily and ModernCTO Podcast.

Misc #

Below are some of the other things I have done or achieved this year:

My GitHub stats show that I have made over 1,000 public contributions on GitHub and participated in Hacktoberfest. We do not use GitHub at Simply Wall St.
I think I can call myself a (partial) LinkedIn Influencer now, with a million impressions reaching 93K+ people in 2024. I have also walked 1.87 million steps in 2024 if that counts as a stat.

I would like to thank the members of the Xplorers group for their warm messages of thanks.
I attended Serverless Days Sydney 2024, AWS Summit Sydney, Microsoft AI Tour twice, and Google Cloud Summit Sydney. I also attended other meetups and events like BeerOps.
Hopefully, the 1:1 chats I have conducted have helped many in multiple areas, such as career advice, what to do next, how to find a job in Australia and other useful topics.

The photos related to the above things can be viewed in this 2024 recap photo album.

Conclusion #

Looking back, 2024 was a productive year. I attended many tech events and was fortunate to be a guest on two podcasts. I also wrote the usual 25+ blog posts.

I look forward to 2025 but will scale back on things I have been doing, like blogging. Merry Christmas and Happy New Year 2025.

Enhance Your CV, LinkedIn, and GitHub Profile with Gemini 2.0 - Stream Realtime [includes video]

2024-12-16T10:35:45Z

Do you feel your resume, LinkedIn profile, or GitHub contributions must convey the right message? Your CV, LinkedIn profile, and GitHub repositories are your digital storefront, and keeping them fresh and relevant is key to attracting opportunities. In this post, we'll explore leveraging Google Gemini 2.0's real-time streaming capabilities to improve your CV, LinkedIn, and GitHub profile, focusing on practical examples and actionable strategies for you to land a tech role. Let's dive in!

Table of Contents #

Gemini 2.0 Flash (Experimental) Multimodal Capabilities #

Gemini 2.0 isn't just another LLM; it's a powerful multimodal model for the agentic era. This means it can process and understand different types of information, including text, images, audio, and even video. This opens up a world of possibilities for creating richer, more engaging content that captures attention and showcases your skills in a way that traditional text-based formats can't.

Unlike other models, Gemini 2.0 has native image output, multilingual native audio output, native tool use, and a multimodal live API. However, some of the features have not yet been released to everyone. If interested, you can also look at the API and SDK docs. As per Google’s official announcement:

Flash 2.0 is twice as fast as 1.5 Pro while achieving stronger performance, includes new multimodal outputs, and comes with native tool use.

Some have mentioned it as revolutionary, game changer, or even the next ChatGPT moment. Does it match all the hype and accolades? I will leave that decision up to you.

Imagine having a real-time conversation with an LLM while showing it a video, sharing audio from your mic, showing it a live video from your device’s camera, or even sharing your screen and asking questions about it. All this is possible with Gemini 2.0 Flash (currently experimental, released around a week ago) with the Stream Realtime feature. We will leverage this next to enhance your LinkedIn profile, CV, and GitHub profile.

Using Gemini 2.0 Live Stream to Improve CV, LinkedIn, and GitHub Profile #

Now that you understand Gemini 2.0's capabilities, let's explore some practical examples of using its real-time streaming feature to enhance your online presence.

In a new browser window, open your LinkedIn profile in one tab, your CV in the next one (preferably a Google Doc), and your GitHub profile in the third one. Then, on the fourth tab, go to https://aistudio.google.com (Google AI Studio) and click on Stream Realtime, then paste the following prompt into System Instructions to get better responses to improve your LinkedIn profile, CV, and GitHub profile:

You are now a combined LinkedIn Profile optimization expert and 
a technical CV writing expert focusing on software engineering. 
Your role is to provide actionable, easy-to-follow, and high-quality 
advice to improve a LinkedIn profile, including but not limited 
to headlines, about sections, and things the person is sharing on 
LinkedIn, is all geared toward the person finding a full-time tech role 
in Australia. Your expertise spans optimizing LinkedIn profiles 
and resumes toward finding the first or second full-time software 
engineering role focused on recent graduates. 

As an expert technical resume writer, focus on the XYZ formula and 
what both technical recruiters and technical people like software 
engineers and software engineering managers will appreciate in 
the resume. Also, emphasize keeping things simple, straightforward, 
and to the point. You should also be able to review and suggest 
improvements for a GitHub profile.

Adhere to the following principles and structure when providing advice:

General Instructions:
User Context Sensitivity: Tailor recommendations to the person's 
specific needs, considering the target audience, mainly technical 
recruiters and software engineering managers, goals, and finding 
the first technical role focusing on making a great first impression.

Clarity: Ensure all advice is straightforward, free of unnecessary 
jargon, and includes step-by-step guidance where relevant.

Actionability: Provide actionable advice with a clear path to 
implementation, including prioritization and how to maximize outcomes.

After that, click the “Select video source” camera icon and share the browser with all four tabs. For this example, we are using Sajan’s profile. He is a front-end engineer looking for a full-time role in Sydney, Australia.

Demo Video #

Then follow what is done in the below 17-minute video to elevate your LinkedIn profile, CV, and GitHub Profile to the next level:

Always remember that an LLM can hallucinate, so think critically before executing the suggestions given by Gemini 2.0 Flash or any other LLM, as they give out non-deterministic output.

You can see this demo as scratching the surface; you can ask other questions to the LLM in the context of enhancing your LinkedIn, CV, and GitHub profile.

In addition, you can use this powerful LLM with real-time capability to solve different types of problems, such as debugging the code on screen or even using it as an SEO and UX expert.

You can use Gemini to solve many problems, but choosing the right ones is up to you.

Conclusion #

Gemini 2.0's real-time streaming capabilities offer powerful tools for enhancing your online presence and showcasing your skills in a way that traditional text-based formats can't. Incorporating real-time content generation and feedback into your workflow allows you to create more engaging and informative content for your CV, LinkedIn, and GitHub profiles. So, unleash the power of Gemini 2.0 and transform your digital storefront into a showcase of your talent and expertise. Expand on it and use the powerful Gemini 2.0 model to get help to solve other problems. Bring it on!

How to Upsert Data in Postgres Using INSERT ON CONFLICT UPDATE

2024-12-14T10:37:45Z

Updating existing data is a core requirement of any web application; doing it efficiently will make your life easier. PostgreSQL, a robust and feature-rich relational database, offers a powerful and elegant solution for managing these updates: INSERT ON CONFLICT UPDATE. It is helpful to combine insert and update to Upsert and use the same logic for both operations. In this post, you will learn how to use INSERT ON CONFLICT UPDATE in Postgres to Upsert data effectively with practical examples. Let’s get going!

Table of contents #

Postgres Upsert with INSERT ON CONFLICT UPDATE syntax #

The INSERT ON CONFLICT clause in PostgreSQL provides an efficient way to perform an upsert operation. Unlike traditional INSERT statements coupled with UPDATE statements, which require separate queries, INSERT ON CONFLICT combines both actions into one. The syntax is as follows:

INSERT INTO table_name (column1, column2, ...)
VALUES (value1, value2, ...), (value3, value4, ...), ...
ON CONFLICT (unique_constraint) DO UPDATE
SET column1 = excluded.column1, column2 = excluded.column2, …
RETURNING columns/*

Let's break down this powerful command:

INSERT INTO table_name (column1, column2, ...): This specifies the target table and the columns to be inserted or updated. Ensure your column names are accurate!
VALUES (value1, value2, ...), (value3, value4, ...), ...: These are the values you're trying to insert. You can supply multiple sets of values to upsert multiple rows simultaneously.
ON CONFLICT (unique_constraint): This is the core of the upsert operation. You specify the unique constraint (typically a primary key or a unique index) that determines whether to perform an insert or an update. The unique constraint is crucial for identifying whether a row already exists.
SET column1 = excluded.column1, column2 = excluded.column2, ...: If a conflict is identified (a row with the same unique constraint already exists), this section specifies the columns to update and their new values. The keyword excluded refers to the values that were originally provided in the VALUES clause. This helps efficiently update the existing row without any complex subqueries.
RETURNING: This clause returns the values of the chosen columns as they are after the insert or update has run. You can select some columns or everything with a *.

You can read more about the INSERT and INSERT ON CONFLICT part in the Postgres official docs. You can also read about Postgres Node.js Tutorial if you want to create a simple Node.js app interacting with Postgres. Also, you can read Postgres insert multiple rows to learn about techniques to insert multiple rows into Postgres efficiently with the same example used below.

If your app inserts data, it will most likely need to update that data, too. Combining the two tasks into one makes them much more manageable, and this is where Upsert becomes useful. In addition to INSERT ON CONFLICT UPDATE, a MERGE statement is also available in newer versions of Postgres, that is, 15 and above.

Upsert example with quotes table #

Let's illustrate INSERT ON CONFLICT UPDATE with a practical example using a quotes table. This table stores quotes along with their authors and has the following structure:

CREATE TABLE quote (
    id SERIAL PRIMARY KEY,
    quote character varying(255) NOT NULL UNIQUE,
    author character varying(255) NOT NULL,
    created_at timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL,
    updated_at timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL
);

Now, let's create a query that inserts 17 quotes about programming into this table as the initial data to work with:

INSERT INTO quote (quote, author) VALUES 
('There are only two kinds of languages: the ones people complain about and the ones nobody uses.', 'Bjarne Stroustrup'), 
('Any fool can write code that a computer can understand. Good programmers write code that humans can understand.', 'Martin Fowler'), 
('First, solve the problem. Then, write the code.', 'John Johnson'), 
('Java is to JavaScript what car is to Carpet.', 'Chris Heilmann'), 
('Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.', 'John Woods'), 
('I''m not a great programmer; I''m just a good programmer with great habits.', 'Kent Beck'), 
('Truth can only be found in one place: the code.', 'Robert C. Martin'), 
('If you have to spend effort looking at a fragment of code and figuring out what it''s doing, then you should extract it into a function and name the function after the "what".', 'Martin Fowler'), 
('The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming.', 'Donald Knuth'), 
('SQL, Lisp, and Haskell are the only programming languages that I’ve seen where one spends more time thinking than typing.', 'Philip Greenspun'), 
('Deleted code is debugged code.', 'Jeff Sickel'), 
('There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies.', 'C.A.R. Hoare'), 
('Simplicity is prerequisite for reliability.', 'Edsger W. Dijkstra'), 
('There are only two hard things in Computer Science: cache invalidation and naming things.', 'Phil Karlton'), 
('Measuring programming progress by lines of code is like measuring aircraft building progress by weight.', 'Bill Gates'), 
('Controlling complexity is the essence of computer programming.', 'Brian Kernighan'),
('The only way to learn a new programming language is by writing programs in it.', 'Dennis Ritchie');

In the next section, you will see a couple of examples of upserting a single row and then multiple rows in the above quote table.

Upsert a single row in the quotes table #

Let’s imagine a scenario where one quote can be edited with a form, and another form exists to insert new quotes. These two forms can use two different queries, one insert, and one update, but it would be much easier and more maintainable if both of these use cases utilized a single SQL query with upsert. That Upsert query in Postgres can be achieved with INSERT ON CONFLICT UPDATE as seen below:

INSERT INTO quote (id, quote, author) VALUES
(3, 'First, solve the problem. Then, write the code1.', 'John Johnson1')
ON CONFLICT (id) DO UPDATE
SET quote = excluded.quote, author = excluded.author, updated_at = DEFAULT
RETURNING *;

This query attempts to upsert a single quote. If a quote’s unique id (the primary key) already exists, both the quote and author columns will be updated to reflect the new values provided. The original created_at timestamp will be preserved, and updated_at will get the last updated time with the DEFAULT keyword, equating to the current timestamp. The values above don’t make sense, but they are used to show that the rows are being updated. If you pass DEFAULT in place of the id (or leave the id column out), the row will be inserted with a newly generated id, as it will not conflict with any existing id. An explicit null will not work here, because the primary key column does not accept nulls.
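To see the insert-or-update behavior without a Postgres server at hand, here is a minimal Python sketch. It swaps in SQLite (through the standard library sqlite3 module, assuming SQLite 3.24 or newer) as a stand-in for Postgres, since SQLite accepts the same ON CONFLICT ... DO UPDATE syntax and the excluded keyword for this case. The demo_upsert function and the trimmed-down table are illustrative only; the timestamp columns and RETURNING are left out to keep it short.

```python
import sqlite3

# SQLite stands in for Postgres in this sketch; the upsert syntax below is
# the part both databases share.
UPSERT_SQL = """
INSERT INTO quote (id, quote, author) VALUES (?, ?, ?)
ON CONFLICT (id) DO UPDATE
SET quote = excluded.quote, author = excluded.author
"""


def demo_upsert():
    """Run the same upsert twice and return the rows left in the table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE quote ("
        "id INTEGER PRIMARY KEY, "
        "quote TEXT NOT NULL UNIQUE, "
        "author TEXT NOT NULL)"
    )
    # id 3 does not exist yet, so this call inserts a new row.
    conn.execute(
        UPSERT_SQL,
        (3, "First, solve the problem. Then, write the code.", "John Johnson"),
    )
    # id 3 exists now, so the same statement updates the row instead.
    conn.execute(
        UPSERT_SQL,
        (3, "First, solve the problem. Then, write the code1.", "John Johnson1"),
    )
    rows = conn.execute("SELECT id, quote, author FROM quote").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo_upsert())
```

After both calls, the table still holds a single row with id 3, carrying the values from the second call.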

Upsert multiple rows on the quotes table #

The power of INSERT ON CONFLICT UPDATE truly shines when you need to handle multiple rows. For instance, let’s say you have a CSV file containing a list of quotes and their authors that you want to import into the quotes table. You could use a single query to insert all the quotes, ensuring that existing quotes are efficiently updated. This demonstrates a significant reduction in overhead compared to performing multiple individual INSERT and UPDATE operations.

Here’s a sample query that demonstrates this concept with an assumption that the CSV only had two quotes:

INSERT INTO quote (id, quote, author) VALUES
(4, 'Java is to JavaScript what car is to Carpet.2', 'Chris Heilmann2'),
(11, 'Deleted code is debugged code.3', 'Jeff Sickel3')
ON CONFLICT (id) DO UPDATE
SET quote = excluded.quote, author = excluded.author, updated_at = DEFAULT
RETURNING *;

If it were an actual application, the VALUES part would have been constructed based on the data provided in the CSV. This would have been done with a parameterized query or an ORM (Object-relational mapping) library of the team’s choice. As above, if you pass DEFAULT in place of the id, that row will be inserted with a newly generated id. In the case of this quote table, the quote column is also unique, so if the given quote matches an existing quote in a different row, you will get a unique constraint error. Using the quote column as the conflict target can be another way of dealing with that issue.
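As a rough illustration of constructing the VALUES part from rows read out of a CSV, here is a minimal Python sketch. The build_upsert helper is hypothetical, and the %s placeholder style is an assumption that matches drivers such as psycopg; other drivers may use a different placeholder. It only builds the SQL text and the flat parameter list and does not talk to a database. It also assumes the table has an updated_at column, like the quote table above.

```python
def build_upsert(table, columns, rows, conflict_columns):
    """Return (sql, params) for a multi-row INSERT ... ON CONFLICT DO UPDATE.

    Table and column names are interpolated into the SQL text, so they must
    come from trusted code and never from user input. The row values travel
    separately as parameters.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    # One "(%s, %s, ...)" group per row.
    row_placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    values_sql = ", ".join([row_placeholder] * len(rows))
    # Update every column that is not part of the conflict target from the
    # proposed (excluded) row, then refresh updated_at.
    set_parts = [f"{c} = excluded.{c}" for c in columns if c not in conflict_columns]
    set_parts.append("updated_at = DEFAULT")
    sql = (
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES {values_sql} "
        f"ON CONFLICT ({', '.join(conflict_columns)}) DO UPDATE "
        f"SET {', '.join(set_parts)} RETURNING *"
    )
    # Flatten the rows so the values line up with the placeholders in order.
    params = [value for row in rows for value in row]
    return sql, params


if __name__ == "__main__":
    csv_rows = [
        (4, "Java is to JavaScript what car is to Carpet.2", "Chris Heilmann2"),
        (11, "Deleted code is debugged code.3", "Jeff Sickel3"),
    ]
    sql, params = build_upsert("quote", ["id", "quote", "author"], csv_rows, ["id"])
    print(sql)
    print(params)
```

The resulting pair would typically be handed to the driver's execute call, which sends the values separately from the SQL text.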

In this example, you are using id, but for your use case, you can use any unique column or constraint with a combination of more than one column. If the columns are passed correctly in the conflict target, the ON CONFLICT(...) part will work as expected.

You can find both examples as a DB Fiddle you can run, which looks like the below when you run it:

You can play around with that DB fiddle, fork it, and use it for your experiments as you please.

Conclusion #

PostgreSQL’s INSERT ON CONFLICT UPDATE feature offers a robust and efficient way to manage data updates. Understanding its syntax and other considerations can significantly improve your database operations, especially when dealing with bulk updates, and can improve the performance and maintainability of your applications.

You learned about UPSERT in Postgres using the INSERT ON CONFLICT UPDATE clause used on the quotes table to upsert single and multiple rows. Always prioritize efficient, well-structured SQL and techniques to improve your application’s code, like combining both INSERT and UPDATE into one UPSERT.

How to use environment variables from a .env file in Node.js

2024-11-24T12:27:45Z

Environment variables are essential for configuring your Node.js applications, allowing you to tailor settings for different environments like development, testing, and production. While you can set environment variables directly in your system or terminal, a more elegant and organized approach is to use a .env file. This file allows you to store all your environment variables in one central location, keeping them separate from your code and making it easy to manage different configurations.

In this comprehensive guide, we'll delve into the world of environment variables in Node.js, explore the role of the popular dotenv package, and uncover how to use it effectively. We'll also explore the native way of accessing environment variables in Node.js 20+ using ESM (ECMAScript Modules). Let's get started!

Table of contents #

Introduction #

Imagine you're building a Node.js application that needs to connect to a database. You'll need to store the database credentials, such as the host, username, password, and database name, somewhere. Hardcoding these credentials directly into your code is a security risk and makes managing different configurations for different environments difficult.

That's where environment variables come to the rescue. You can store sensitive information like API keys, database credentials, and other configuration settings as environment variables, keeping them separate from your codebase. This enhances security and makes it easier to deploy your application to different environments with different configurations.

A .env file is a simple text file that stores your environment variables in a key-value format. It's a widely adopted convention in the Node.js ecosystem for managing environment variables, and there's a popular NPM package called dotenv that makes it easy to load these variables into your application. It also helps you achieve the config part of the 12-factor app. The 12-factor app’s config factor states, “requires strict separation of config from code. Config varies substantially across deploys, code does not.”

Prerequisites #

Before we dive into the code and explore how to use environment variables from a .env file, ensure you have the following prerequisites:

Node.js 20 or later installed: We'll be using the latest features of Node.js, so having a recent version installed is essential. You can check your Node.js version by running node --version in your terminal.
A basic understanding of Node.js and NPM: You should be familiar with running Node.js scripts and installing NPM packages using the npm install command.
Basic knowledge of JavaScript Modules: This guide will use ESM imports. This is the new standard for JavaScript modules, and it's supported in Node.js 14 and above.

In the subsequent section, you will learn how to get environment variables using the dotenv package.

Get environment variables using dotenv #

dotenv is a zero-dependency module that loads environment variables from a .env file into process.env. It's a simple and convenient way to manage your environment variables, keeping them separate from your code and making deploying your application to different environments easier.

Installing dotenv #

To get started with dotenv, you need to install it first. You can install it using the following command in your terminal:

npm install dotenv

This command will install the dotenv package from the NPM registry and add it to your project's package.json file as a dependency. At the time of writing, dotenv is at version 16.4.5.

Creating a .env file #

After you have installed the dotenv package, you can create a .env file in the root directory of your project. This file will contain all your environment variables. The format of the .env file is simple. Each line represents an environment variable, with the key and value separated by an equal sign (=). For example:

DATABASE_NAME=quotes
DATABASE_USER=quotes_user
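To illustrate the key=value format, here is a deliberately simplified sketch of turning such text into an object. parseEnvText is a hypothetical helper written for this post; it is not how dotenv or Node.js parse the file, and it ignores quoting, multi-line values, and other details that real parsers handle.

```javascript
// Simplified illustration only: real parsers handle quotes, multi-line values, and more.
function parseEnvText(text) {
  const result = {};
  for (const rawLine of text.split('\n')) {
    const line = rawLine.trim();
    // Skip blank lines and comment lines.
    if (line === '' || line.startsWith('#')) continue;

    const separatorIndex = line.indexOf('=');
    if (separatorIndex === -1) continue; // not a key=value line

    const key = line.slice(0, separatorIndex).trim();
    const value = line.slice(separatorIndex + 1).trim();
    result[key] = value;
  }
  return result;
}

const parsed = parseEnvText('DATABASE_NAME=quotes\nDATABASE_USER=quotes_user\n');
console.log(parsed);
```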

Loading environment variables #

To load the environment variables from your .env file into process.env, you need to call the config method from the dotenv module. This should be done at the very start of your application before any other code is executed. A common place to call the config method is in your application's entry point, which is usually index.js or app.js. For instance:

import 'dotenv/config';

const dbName = process.env.DATABASE_NAME;
const dbUser = process.env.DATABASE_USER;
console.log(`Database name is ${dbName} and database username is ${dbUser}`);

In the above code example, the dotenv/config import call loads the environment variables from the .env file into process.env. Then, you add two constants, dbName and dbUser, to get the relevant environment variables and show them on the console with console.log. If it was a real application, like an Express.js app, you would have used these variables to instantiate the connection to a database.

When you run the above index.js file with node index.js and the relevant .env file it will show the following output:

If you want to use a different filename than .env, import the dotenv module itself with import dotenv from 'dotenv'; and pass the path to your custom file as an argument to the config method, like below:

dotenv.config({ path: './.env.example' });

Another useful way of using dotenv without any import or require is to pass it in the node command. For instance, take a file named index-no-imports.js, which has the below contents and uses the same .env file:

const dbName = process.env.DATABASE_NAME;
const dbUser = process.env.DATABASE_USER;
console.log(`Database name is ${dbName} and database user is ${dbUser}`);

You can access the environment variables by executing:

node --require dotenv/config index-no-imports.js

Dotenv has a small ecosystem around it, which you can learn more about by reading its Readme file. There is dotenvx, which lets you encrypt and decrypt your environment variables, and dotenv-vault too. It is supported by companies like Warp, WorkOS, and Alloy Automation. It also has a YouTube channel and a video explaining how to use it. There are lots of official examples using it, from projects like Next.js to Express.

In the next section, you will learn about the native way of accessing environment variables in Node.js 20+.

Get environment variables natively Node 20+ #

From Node.js 20+, there is a native way to access environment variables without the need to install an NPM package like dotenv. One of the most notable changes in Node 20.6.0 is the built-in support for .env files.

Using --env-file in the Node CLI #

With the same .env file and the index-no-imports.js file from the above example, you can use the environment variables from the .env file in any JS file with the following command:

node --env-file .env index-no-imports.js

It will result in:

This is exactly the same as the above output, without the need to import dotenv in the file or on the command line. This is possible from Node 20.6.0, as it added built-in support for .env files.

You can update your .env file and run the code again, and it will pick up the updated environment variable. This is how you can access environment variables through process.env with the --env-file flag in Node.js 20+. You can use nodemon to restart your application when a .js file changes, or even when a .env file changes with the correct watches.
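Whichever way the variables are loaded, they end up on process.env as strings, or as undefined when they are missing. As a rough sketch, a small guard can make a missing variable fail fast instead of surfacing later as a confusing connection error. getRequiredEnv is a hypothetical helper, not part of Node.js or dotenv, and the example passes a plain object in place of process.env so it runs without a .env file.

```javascript
// Hypothetical helper: returns the variable's value or throws if it is missing or empty.
function getRequiredEnv(name, env = process.env) {
  const value = env[name];
  if (value === undefined || value === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// A plain object stands in for process.env to keep the example self-contained.
const exampleEnv = { DATABASE_NAME: 'quotes', DATABASE_USER: 'quotes_user' };

const dbName = getRequiredEnv('DATABASE_NAME', exampleEnv);
const dbUser = getRequiredEnv('DATABASE_USER', exampleEnv);
console.log(`Database name is ${dbName} and database user is ${dbUser}`);

try {
  getRequiredEnv('DATABASE_PASSWORD', exampleEnv);
} catch (err) {
  console.error(err.message);
}
```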

For your reference, you can find the code examples in this GitHub repository.

Dotenv is still popular #

Given that Node.js 20+ has provided a native way to access environment variables since Sep 2023 (more than a year before this blog post was written), you might be wondering why you should bother with dotenv. It's a valid question with a valid answer: dotenv is still the most popular package for getting environment variables from a .env file.

With over 45 million downloads per week as of Nov 2023, dotenv is a battle-tested and widely used package in the Node.js ecosystem.

You can see a comparison of dotenv, node-env-file, and dot-env as per npm trends below:

As you can see, dotenv is the clear winner, with far more weekly downloads than its competitor, node-env-file. Even though dot-env is a similar library, its downloads are minimal. dot-env has not been updated in the past 11 years, whereas dotenv is a very active project on GitHub.

Conclusion #

In this comprehensive guide, you have learned how to access environment variables from a .env file in Node.js. You explored two methods, the first one using the popular dotenv package and the second one using the native --env-file available in Node.js 20+. With the knowledge of both methods, you can now make an informed choice depending on your project's needs.

If you are working on a new project, I recommend opting for the native --env-file method, as it's part of Node.js and doesn't need an external dependency. But if you are working on an existing project that already uses dotenv, there's no need to change.

In the end, both achieve the same goal of accessing environment variables from a .env file. Eventually you can move to the native way of accessing environment variables in Node.js 20+ as it's part of the Node.js core and doesn't need an external dependency. This will make your application more lightweight and reduce the number of dependencies.

A Beginner's Guide to Comparing Dates in JavaScript

2024-11-01T12:32:47Z

Working with dates and times is a common task in software development. In JavaScript, you have the built-in Date object, but it can be a bit cumbersome and has its quirks. There are third-party libraries like date-fns to help manipulate and format dates in JavaScript. However, the fundamental task of comparing two dates can be quickly done using the built-in methods, as you will learn in this post. You will start from the basics and move to use a third-party library, date-fns, for date comparison in JavaScript. Buckle up!

Table of Contents #

Introduction #

JavaScript has a Date object representing a single moment in time in a platform-independent format. Internally, a Date is stored as a number: the milliseconds elapsed since midnight at the start of 1 January 1970 UTC.
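As a quick illustration of that internal number, the short sketch below creates a Date from 0 milliseconds and reads the millisecond value back from a fixed UTC timestamp. The variable names are only for this example.

```javascript
// 0 milliseconds is the epoch itself.
const epochStart = new Date(0);
console.log(epochStart.toISOString()); // 1970-01-01T00:00:00.000Z

// getTime() returns the underlying number of milliseconds.
const firstOfNovember = new Date('2024-11-01T00:00:00.000Z');
console.log(firstOfNovember.getTime()); // 1730419200000

// The same number can be turned back into an equivalent Date.
const roundTrip = new Date(firstOfNovember.getTime());
console.log(roundTrip.toISOString()); // 2024-11-01T00:00:00.000Z
```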

JavaScript provides multiple ways to compare dates:

Direct Comparison: You can directly compare Date objects using comparison operators like >, <, >=, <=, but this method can be unreliable in some instances, for example when checking equality.
getTime() Method: The getTime() method returns the number of milliseconds since the Unix epoch (00:00:00 UTC on 1 January 1970), making it reliable for comparing dates.
Using Libraries: Third-party libraries like date-fns provide convenient functions for comparing dates, including isBefore, isAfter, isEqual, compareAsc, etc. Other popular date-related JavaScript libraries include Moment.js and Luxon. As per NPM trends date-fns is the most popular one, with more than 23 million downloads per week at the time of writing this blog post.

In this guide, you will explore examples of all of the above methods. You will also learn about the best practices for comparing dates in JavaScript.

Prerequisites #

To follow along with the provided code examples in this post, you will need the following:

Node.js 22: The code in this tutorial uses the latest features of JavaScript/Node.js, hence it is advisable to have the newest version installed on your machine.
A code editor: Any code editor that you are comfortable with will do. I have used VS Code for this tutorial, you can use any editor you choose.
General knowledge of JavaScript: This tutorial requires a basic understanding of how JavaScript works.

It is also helpful to have the following:

NPM or Yarn: Node Package Manager will be needed if you want to install date-fns.
Git and GitHub: You should know how to clone, commit, and push your changes to a Git repository on GitHub or a similar service.

Let's get started!

Directly Compare JavaScript Date objects #

You can directly compare JavaScript Date objects using comparison operators like <, >, >=, <=, and ===. This is the easiest and most straightforward way to compare dates.

However, using these operators can lead to surprising results in certain scenarios because of how Date objects work internally. For example, === compares object references rather than the moments in time they represent, and time zones affect how date strings are parsed. You will see that in the example below:

const date1 = new Date('2024-11-01');
const date2 = new Date('2024-10-01');
const date3 = new Date('2024-11-02');

console.log(`date1: ${date1}`);
console.log(`date2: ${date2}`);
console.log(`date3: ${date3}`);

const date1Iso = date1.toISOString();
const date2Iso = date2.toISOString();
const date3Iso = date3.toISOString();

// Comparison results
console.log(`================== Comparison results ==================`);
const comparisonTable = [
  ['date1 < date2', `${date1Iso} < ${date2Iso}`, date1 < date2],
  ['date1 > date2', `${date1Iso} > ${date2Iso}`, date1 > date2],
  ['date1 <= date2', `${date1Iso} <= ${date2Iso}`, date1 <= date2],
  ['date1 >= date2', `${date1Iso} >= ${date2Iso}`, date1 >= date2],
  ['date1 === date2', `${date1Iso} === ${date2Iso}`, date1 === date2],
  ['date1 < date3', `${date1Iso} < ${date3Iso}`, date1 < date3],
  ['date1 > date3', `${date1Iso} > ${date3Iso}`, date1 > date3],
  ['date2 === date2', `${date2Iso} === ${date2Iso}`, date2 === date2],
]
console.table(comparisonTable);

When you run the above script with node index.js it will show the following output:

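One issue that can arise here is that two Date objects rarely represent the same instant: if one date is created with new Date() and another is created with new Date() two seconds later, the two will not match. The === operator is also a special case, because it checks whether both sides are the same object rather than the same point in time, so it is false for two separate Date objects even when they hold the same value. Time zones are another consideration when comparing Date objects.

For reliable comparison, always use the getTime() method discussed in the next section.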
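The equality pitfall is easy to reproduce. In this small sketch, two separate Date objects hold exactly the same instant, yet === reports them as different because it compares object identity. The variable names are only for illustration.

```javascript
const first = new Date('2024-11-01T00:00:00.000Z');
const second = new Date('2024-11-01T00:00:00.000Z');

// Two distinct objects, so strict equality is false even though the instant is identical.
console.log(first === second); // false

// Relational operators convert both sides to numbers, so they behave as expected.
console.log(first <= second && first >= second); // true

// Comparing the numeric timestamps gives a dependable equality check.
console.log(first.getTime() === second.getTime()); // true
```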

Compare Dates using the getTime method #

For a reliable date comparison in JavaScript, it is recommended to use the getTime() method available on the Date object. This is because equality operators such as === compare Date objects by reference, not by their actual value, whereas getTime() gives you a plain number to compare. Even with the getTime() method, you will need to consider time zones. The easiest way to do so is to convert all the dates to UTC and only then do the date/time comparison.

The getTime() method:

Returns the number of milliseconds that have elapsed since the Unix epoch (January 1, 1970, at 00:00:00 UTC).
Provides a consistent numerical representation of a date that's suitable for direct comparison.
Even with getTime(), keep time zones in mind: 5 AM in London can be 4 PM in Sydney. Also think about date changes and daylight saving time.

The example below shows how to compare dates using the getTime() method to make it more reliable:

const date1 = new Date('2024-11-01');
const date2 = new Date('2024-10-01');
const date3 = new Date('2024-11-02');

console.log(`date1: ${date1}`);
console.log(`date2: ${date2}`);
console.log(`date3: ${date3}`);

const date1Epoch = date1.getTime();
const date2Epoch = date2.getTime();
const date3Epoch = date3.getTime();

// Comparison results
console.log(`================== Comparison results ==================`);
const comparisonTable = [
  ['date1 < date2', `${date1Epoch} < ${date2Epoch}`, date1Epoch < date2Epoch],
  ['date1 > date2', `${date1Epoch} > ${date2Epoch}`, date1Epoch > date2Epoch],
  ['date1 >= date2', `${date1Epoch} >= ${date2Epoch}`, date1Epoch >= date2Epoch],
  ['date1 === date2', `${date1Epoch} === ${date2Epoch}`, date1Epoch === date2Epoch],
  ['date1 < date3', `${date1Epoch} < ${date3Epoch}`, date1Epoch < date3Epoch],
  ['date1 > date3', `${date1Epoch} > ${date3Epoch}`, date1Epoch > date3Epoch],
  ['date2 === date2', `${date2Epoch} === ${date2Epoch}`, date2Epoch === date2Epoch],
]
console.table(comparisonTable);

The example is similar to the above one, but this time, instead of directly comparing the dates, you first convert each date to a number using the getTime() method. Now, as the comparison is done between numbers, it will always be accurate and reliable. This should be the preferred way to compare dates in JavaScript in all of your applications. The above code is available as part of a pull request for your reference.

Here is the output of the script written to the file results-getTime.txt:

Keep in mind that getTime() compares the full date and time down to the millisecond, not just the calendar date. That means 2024-11-01 3:00:00 is after 2024-11-01 2:58:58 even though both fall on the same day. Also keep in mind the time zone differences.
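To make the time zone caveat concrete, the sketch below shows how the same-looking strings can be parsed differently: a date-only ISO string is treated as UTC, while a date-time string without an offset is treated as local time. isSameUtcDay is a hypothetical helper for comparing by calendar day in UTC, which is one possible way to ignore the time part.

```javascript
// A date-only ISO string is treated as midnight UTC.
const dateOnly = new Date('2024-11-01');
// A date-time ISO string without an offset is treated as local time.
const localDateTime = new Date('2024-11-01T00:00:00');
// An explicit Z (or a numeric offset) removes the ambiguity.
const utcDateTime = new Date('2024-11-01T00:00:00Z');

console.log(dateOnly.getTime() === utcDateTime.getTime()); // true
// The next result depends on the time zone of the machine running the script.
console.log(localDateTime.getTime() === utcDateTime.getTime());

// Hypothetical helper: compares two dates by their UTC calendar day only.
function isSameUtcDay(a, b) {
  return a.toISOString().slice(0, 10) === b.toISOString().slice(0, 10);
}

console.log(isSameUtcDay(
  new Date('2024-11-01T02:58:58Z'),
  new Date('2024-11-01T03:00:00Z'),
)); // true
```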

In the next section, you will learn about a very popular JavaScript library called date-fns and how to use it to compare dates with ease.

Dates comparison with date-fns #

There are many external npm libraries available to work with dates in JavaScript like date-fns, Moment.js and Luxon to name some. One of the most popular libraries for date and time manipulation is date-fns, which is downloaded more than 23 million times each week on npm.

In this section, you will learn how to easily use date-fns to compare dates in JavaScript. Before going to the examples, make sure to install it in your project. If your project is an npm project and you have a package.json file, run the following command to install it:

npm install date-fns

At the time of writing this blog post, date-fns is at version 4.1.0. In the next section, you will learn how to use compareAsc from date-fns to compare dates.

Compare Dates with compareAsc #

The compareAsc function in date-fns compares two dates and returns a number that indicates their relative order. It will return:

-1 if the first date is before the second date.
0 if the two dates are the same.
1 if the first date is after the second date.

The example below demonstrates how to use compareAsc to compare dates in JavaScript:

import { compareAsc } from 'date-fns';

const date1 = new Date('2024-11-01T00:00:00');
const date2 = new Date('2024-11-01T00:00:02');
const date3 = new Date(2024, 14, 2); // month index 14 overflows into March 2025 (local time)

console.log(`date1: ${date1}`);
console.log(`date2: ${date2}`);
console.log(`date3: ${date3}`);

const date1Iso = date1.toISOString();
const date2Iso = date2.toISOString();
const date3Iso = date3.toISOString();

// Comparison results
console.log(`================== Comparison results ==================`);
const comparisonTable = [
  ['compareAsc date1 and date2', `${date1Iso} and ${date2Iso}`, compareAsc(date1, date2)],
  ['compareAsc date2 and date1', `${date2Iso} and ${date1Iso}`, compareAsc(date2, date1)],
  ['compareAsc date2 and date2', `${date2Iso} and ${date2Iso}`, compareAsc(date2, date2)],
  ['compareAsc date1 and date3', `${date1Iso} and ${date3Iso}`, compareAsc(date1, date3)],
  ['compareAsc date3 and date1', `${date3Iso} and ${date1Iso}`, compareAsc(date3, date1)],
  ['compareAsc date2 and date3', `${date2Iso} and ${date3Iso}`, compareAsc(date2, date3)],
];
console.table(comparisonTable);

console.log(`================== Sorting dates ASC ==================`);
console.log([date1, date2, date3].sort(compareAsc));

Let's analyze the above code. First, you import the compareAsc function from date-fns using ESM imports. Then, you initialize three date objects: date1, date2, and date3. Next, you run different comparisons between the three dates and sort them in ascending order. When you run the above script, the output looks like the following:

There are other useful functions like isBefore, isAfter and isEqual that you can check out in the date-fns docs. In the next part, you will learn about more functions like these.
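If it helps to see the -1 / 0 / 1 contract without installing anything, here is a plain JavaScript sketch of the same idea built on getTime(). compareAscSketch is a hypothetical stand-in written for this post; it is not date-fns's implementation and skips the input handling the library does.

```javascript
// Plain JavaScript sketch of the -1 / 0 / 1 contract; not the date-fns implementation.
function compareAscSketch(left, right) {
  const diff = left.getTime() - right.getTime();
  if (diff < 0) return -1; // left is before right
  if (diff > 0) return 1; // left is after right
  return 0; // same instant
}

const earlier = new Date('2024-11-01T00:00:00Z');
const later = new Date('2024-11-01T00:00:02Z');

console.log(compareAscSketch(earlier, later)); // -1
console.log(compareAscSketch(later, earlier)); // 1
console.log(compareAscSketch(later, later)); // 0

// Because of that contract, the function can be passed straight to Array.prototype.sort.
const sorted = [later, earlier].sort(compareAscSketch);
console.log(sorted.map((date) => date.toISOString()));
```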

Other useful date-fns functions #

date-fns is a powerful and versatile JavaScript library for manipulating and formatting dates and times. You have just seen compareAsc in the above example. Some more useful date-fns functions are as follows:

format: to format a date as a string using a given pattern and locale.
addDays, addMonths, addYears: to add days, months, and years to a date.
subDays, subMonths, subYears: to subtract days, months, and years from a date.
isToday, isYesterday, isTomorrow: to check if the date is today, yesterday, or tomorrow.
differenceInMilliseconds: another useful function that can be used to compare dates in JavaScript.

These are some of the many other useful functions date-fns has. You can find a complete list in the date-fns documentation. Use the search function to make the most of it.

All the code examples are available in this GitHub repository. If you want to write effective unit tests for dates, you will need to mock dates in Jest to get repeatable and predictable results.

Conclusion #

In this tutorial, you learned about three different methods to compare dates in JavaScript. First, you used direct comparison, then you learned how to reliably compare dates using the JavaScript Date object's getTime() method. Finally, you learned how to use the date-fns library to compare dates with more ease. I hope this helped you understand more about comparing dates in JavaScript and that you learned something useful from it.

Happy coding and comparing Dates in JS!

How to Read a JSON File Using Node.js

2024-10-30T08:32:47Z

Reading a JSON (JavaScript Object Notation) file in Node.js is a common task for web developers, especially those working with backend and server-side applications. This tutorial will guide you through the process, breaking down the steps to efficiently read JSON files using both the native Node.js fs module and the fs-extra npm package. Let's dive into the world of JSON file handling and equip you with the knowledge to tackle this task seamlessly!

Table of contents #

Introduction #

JSON (JavaScript Object Notation) has become a cornerstone of data exchange on the web. It's a human-readable format that's also easily parsed by machines, making it ideal for APIs, configuration files, and more. If you're working with Node.js, chances are you'll encounter JSON files quite frequently.

This blog post provides a clear, concise, and hands-on guide to reading JSON files in Node.js. Whether you're a seasoned developer or just starting out, this tutorial will equip you with the knowledge and techniques to handle JSON data confidently in your Node.js projects.

You will cover two approaches:

Using the native fs module: This is the built-in Node.js module for interacting with the file system. It provides both synchronous and asynchronous methods for reading files.
Using the fs-extra npm package: This package extends the functionality of the fs module, offering more user-friendly methods and simplifying common file operations.

By the end of this tutorial, you'll have a solid understanding of both methods and be able to choose the approach that best suits your needs.

Prerequisites #

Before you dive into the code, let's make sure you have the following prerequisites:

Node.js Installed: Ensure that you have Node.js installed on your system. You can download the latest LTS version from the official Node.js website. You will use Node.js 22 for this tutorial. You can check your Node.js version with:
```
node --version
```
Basic JavaScript Knowledge: A fundamental understanding of JavaScript syntax and concepts will help you follow along with the code examples.
Code Editor: You'll need a code editor to write and edit your JavaScript code. Some popular choices include VS Code, Atom, Sublime Text, and WebStorm.

For this tutorial, you will use ESM imports and not require as it is 2024. In the next section, you will learn about the data set of billionaires from 2023, which will be used as an example for this guide.

Example: Billionaires Data From 2023 #

To illustrate how to read a JSON file using Node.js, you will use a real-world example of data about the top 100 billionaires from 2023. The data is sourced from Kaggle and provides various details about these individuals. We have curated the data to focus on a few key attributes, making it easier to manage for this tutorial. The data is originally in CSV format, and it has been converted to JSON for this tutorial.

The data is stored in a file named billionaires-2023.json. The full data file can be found in this GitHub repository, and a snippet of it is shown below for your reference:

[
  {
    "rank": 1,
    "worth": 211000,
    "name": "Bernard Arnault & family",
    "gender": "M",
    "category": "Fashion & Retail",
    "country": "France",
    "city": "Paris",
    "source": "LVMH",
    "industries": "Fashion & Retail",
    "citizenship_country": "France",
    "organization": "LVMH Moët Hennessy Louis Vuitton",
    "title": "Chairman and CEO",
    "birth_year": 1949
  },
  {
    "rank": 2,
    "worth": 180000,
    "name": "Elon Musk",
    "gender": "M",
    "category": "Automotive",
    "country": "United States",
    "city": "Austin",
    "source": "Tesla, SpaceX",
    "industries": "Automotive",
    "citizenship_country": "United States",
    "organization": "Tesla",
    "title": "CEO",
    "birth_year": 1971
  },
  // ... rest of the billionaires data
]

This JSON data represents an array of objects, each of which corresponds to a billionaire and contains attributes like rank, net worth, name, gender, category, country of origin, and more.
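To get a feel for this structure before reading the file, here is a small sketch that parses an inline JSON string shaped like the snippet above, trimmed to a few fields. Note that the // comment in the snippet is only an ellipsis for this post; JSON itself has no comment syntax, so the string below leaves it out. The variable names are only for illustration.

```javascript
// An inline JSON string shaped like the data file, trimmed to a few fields.
const json = `[
  { "rank": 1, "worth": 211000, "name": "Bernard Arnault & family", "country": "France" },
  { "rank": 2, "worth": 180000, "name": "Elon Musk", "country": "United States" }
]`;

// JSON.parse turns the text into a regular JavaScript array of objects.
const billionaires = JSON.parse(json);
console.log(billionaires.length); // 2
console.log(billionaires[1].name); // Elon Musk

// Once parsed, the usual array functions apply.
const namesInFrance = billionaires
  .filter((person) => person.country === 'France')
  .map((person) => person.name);
console.log(namesInFrance); // [ 'Bernard Arnault & family' ]
```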

Read JSON File Using Native fs #

The native fs module in Node.js provides a straightforward way to interact with the file system. You’ll explore two methods for reading JSON files using fs:

fs.readFile: This method reads the entire file asynchronously.
fs.readFileSync: This method reads the entire file synchronously.

Using asynchronous methods like fs.readFile for non-blocking operations is generally advisable, especially when dealing with larger files. This ensures that your application remains responsive while the file is being read.

Read JSON File Asynchronously Using fs #

Below is an example code for reading the data of 2,240 Billionaires in 2023 in an async way using promises.

import { readFile } from 'node:fs/promises';

try {
  const data = JSON.parse(await readFile('billionaires-2023.json', 'utf8'));
  console.log(data[1]); // Elon Musk's data
} catch (err) {
  console.error(`Error reading JSON file: ${err}`);
}

In this example:

You import the readFile function from the node:fs/promises module to work with promises, providing a more modern asynchronous approach.
Inside the try block:
- You read the file asynchronously using await readFile('billionaires-2023.json', 'utf8'), specifying the encoding as utf8.
- The file's content is parsed into a JavaScript object using JSON.parse().
- Then, from the parsed data, the data of the second-ranked billionaire, Elon Musk, is logged to the console.
If an error occurs during the process, it is caught in the catch block and logged with console.error.

As you are using Node version 22 and ES modules, top-level await is available. You can run the example with node index.js, and it will show the following output:

In the next part, you will read the same file with fs but in a sync way.

Read JSON File Synchronously Using fs #

While it's recommended to use the asynchronous method, for completeness, let's also see how to read a JSON file synchronously.

import { readFileSync } from 'node:fs';

try {
  const data = JSON.parse(readFileSync('billionaires-2023.json', 'utf8'));
  console.log(data[1]); // Elon Musk's data
}
catch (err) {
  console.error(`Error reading JSON file: ${err}`);
}

This example is similar to the async version, but you use the synchronous readFileSync method. You should be aware that this method blocks the event loop until the file is read completely. The example file is 2.66 MB, which is not that big, but for a large file, this can be a time-consuming and CPU-hogging operation.
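Both the asynchronous and synchronous examples above log whatever error is thrown. As a rough sketch, you could tell a missing file apart from malformed JSON, since JSON.parse throws a SyntaxError and Node.js file system errors carry a code such as ENOENT. describeJsonReadError is a hypothetical helper, and the errors are simulated here so the example does not depend on any file.

```javascript
// Hypothetical helper: turns an error from reading or parsing into a clearer message.
function describeJsonReadError(err, filePath) {
  if (err instanceof SyntaxError) {
    return `${filePath} is not valid JSON: ${err.message}`;
  }
  if (err && err.code === 'ENOENT') {
    return `${filePath} does not exist`;
  }
  return `Could not read ${filePath}: ${err}`;
}

// Simulate malformed JSON without touching the file system.
let parseError;
try {
  JSON.parse('{ "rank": 1, ');
} catch (err) {
  parseError = err;
}
console.log(describeJsonReadError(parseError, 'billionaires-2023.json'));

// Simulate the error shape Node.js uses when a file is missing.
const missingFileError = Object.assign(new Error('no such file or directory'), { code: 'ENOENT' });
console.log(describeJsonReadError(missingFileError, 'billionaires-2023.json'));
```

In the catch blocks of the earlier examples, the returned message could replace the generic one passed to console.error.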

Read JSON File with fs-extra NPM Module #

fs-extra is a popular NPM package that extends the functionality of the native fs module. According to NPM Trends, it has more than 105 million downloads each week. It provides more user-friendly methods for common file system operations, including a convenient method for reading JSON files.

To install fs-extra, run the following command:

npm install fs-extra

Next, you will look at an example of reading JSON async using the fs-extra NPM package.

Read JSON File async with fs-extra #

Here's an example of reading the JSON file using fs-extra:

import { readJson } from 'fs-extra/esm';

try {
  const data = await readJson('billionaires-2023.json');
  console.log(data[1]); // Elon Musk's data
}
catch (err) {
  console.error(`Error reading JSON file: ${err}`);
}

In this example:

You import the readJson function from the ESM entry point of the fs-extra package, 'fs-extra/esm'.
The readJson function handles both reading the file and parsing it into a JavaScript object, simplifying the code.

Next is the example of reading JSON files with Node.js in a sync way using the fs-extra NPM package.

Read JSON File sync with fs-extra #

You can also use readJsonSync to read the JSON file in a sync manner. Below is an example of using the readJsonSync function in the fs-extra NPM package:

import { readJsonSync } from 'fs-extra/esm';

try {
  const data = readJsonSync('billionaires-2023.json');
  console.log(data[1]); // Elon Musk's data
}
catch (err) {
  console.error(`Error reading JSON file: ${err}`);
}

It is the same example as the above async one; the main difference here is that it reads the file in a sync way.

Using fs-extra can significantly reduce code complexity and improve readability, especially when performing common file operations.

All the code is available in this GitHub repository for your reference.

Conclusion #

In this comprehensive guide, you learned how to read JSON files in Node.js using both the native fs module and the fs-extra npm package. You explored both asynchronous and synchronous approaches, highlighting their advantages and considerations.

You also explored a real-world example using billionaires' data to illustrate how to effectively parse and work with JSON data in your Node.js applications. By understanding the concepts and techniques presented in this tutorial, you're well-equipped to tackle any task that involves reading and manipulating JSON data in your Node.js projects.

Remember to choose the best method for your needs, considering factors like file size, performance requirements, and your coding style. Keep learning and exploring the power of Node.js to build robust and scalable web applications.

If you need to process the JSON data you read efficiently, you can read about JavaScript array functions. Using nodemon to automatically restart the Node.js server when you make a code change will also help you become more productive. Happy coding!

Unblocking Software Engineers: Overcoming Non-technical and Technical Roadblocks

2024-10-12T09:35:37Z

Writing code is a small part of your job as a software engineer. You will also be communicating and managing expectations. In this post, you will learn about where software engineers get blocked while executing a task and how you (as a product person) can unblock them. You will also figure out ways to unblock yourself as a software engineer. Let’s get started!

Table of contents #

What to do vs how to do it #

For any given feature, what to do and how the feature should work for the customer is always the responsibility of a product manager (or another product person) in the team. Also, prioritization and what to do now (next sprint) falls under the PM’s duty. Regardless of whether the team is following Agile or not, Scrum or Kanban, how a new feature will behave, and the formula to calculate profit in this scenario are questions a product person in the team is liable to answer.

As a software engineer, maintain balance in the “what to do” part. You can provide your input here, but don’t step on the PM’s toes and start dictating what the software should do.

On the other hand, the engineering team's prerogative is how to build a particular feature from a technical point of view. As a software engineer, you have to decide whether to write a single type of test or three types of tests, and whether to refactor that 500-line file and that 100-line function. All the technical hows of the feature rest on your shoulders. Be careful here; the feature should be functional, and you should also address the performance and security aspects of that particular feature.

Software engineers blocked on what to do #

It is common for software engineers to pick up a task (usually a ticket in task management software like Jira) and realize that the task does not have enough context to get started or make any progress. Tasks without enough description are commonplace, and bugs without steps to reproduce are also typical.

As a product manager (or product owner), you need a clear definition of when a task is ready to be worked on. Just as important, that definition of ready should be followed strictly on almost all tasks. For instance, if the definition of ready means a task will have a user story and technical requirements defined, it is the job of the product team to fill these in on all the tickets.

Unblocking from non-technical blockers #

So, how can you, as a product person, unblock the software engineer from taking on a new task or part of a more significant feature? The main thing here is to provide optimal context and examples so that the software engineer has fewer questions and decisions to make. It is better to overcommunicate and provide examples of what to do and how the user will achieve their goals.

A recent example I remember is when we had to calculate the average cost price of shares. The easiest way for me to understand the logic and formula and then convert the formula into code was to ask for a Google Sheet where I could see an example. If the PM adds such artifacts to the task beforehand, it is much easier for the software engineer to get started.
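To show how a worked example like that translates into code, here is a minimal sketch. It assumes the common weighted-average definition (total amount paid divided by total shares bought), which may differ from the formula used in that project. The purchase data and the `averageCostPrice` function name are made up for illustration.

```javascript
// Hypothetical purchase lots; the numbers are made up for illustration.
const purchases = [
  { shares: 10, pricePerShare: 100 },
  { shares: 30, pricePerShare: 120 },
];

// Assumed formula: weighted average cost = total amount paid / total shares bought.
function averageCostPrice(lots) {
  const totalShares = lots.reduce((sum, lot) => sum + lot.shares, 0);
  if (totalShares === 0) {
    throw new Error("Cannot compute an average cost for zero shares");
  }
  const totalCost = lots.reduce(
    (sum, lot) => sum + lot.shares * lot.pricePerShare,
    0
  );
  return totalCost / totalShares;
}

// (10 * 100 + 30 * 120) / (10 + 30) = 4600 / 40
console.log(averageCostPrice(purchases)); // 115
```

A spreadsheet row per purchase with the same columns gives the engineer both the formula and a ready-made test case.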

From the engineer’s point of view, it is a balancing act, too. To be a product-minded engineer, you should be able to ask the relevant people questions to get clarity. On the other hand, the ticket should also have enough context and examples to make your job easier.

Think of it this way: following the Pareto principle, 80% of your “what to do” questions should already be answered by the description, context, and examples of the task, and for the remaining 20%, you should go in and ask people over Slack or even have a short meeting for that extra clarity.

Conversely, the product and design team must level up their game if you, as a software engineer, need to communicate with three people to get 80% of what to do for a task.

Sharpening up your people (soft) skills comes into play here. You should know who to talk to, how to get the correct information from that person, and how to unblock yourself aptly. In the next section, you will learn about technical blockers and how to unblock yourself from this category of blockers.

Technical blockers #

Depending on the software engineer's level and experience, technical blockers might exist when writing a new feature. Admit it: Even if you have 20 years of experience writing software, you will indeed Google 5+ times a day (or put your query on some LLM like ChatGPT). We all do that.

The difference is that engineers with more than a decade of experience in the language or framework used to build that software will do it under ten times a day. In contrast, less experienced (or junior) engineers will Google or ask an LLM more than ten times a day. In both cases, as long as you find your answer, you are unblocked and can make progress on your task. Sometimes technical blockers are solved faster and more efficiently by stepping away from the screen:

I have seen many programming problems remain unsolved while on a screen; on the contrary, when the engineer moves the problem to a physical board or paper, it gets solved much more quickly.

More senior engineers have called me over to a board, and the solution comes faster on a board than when you are tied to a screen. Especially for junior and mid-level engineers, the time you spend at a whiteboard teaches you many things. Architecture-level things, like drawing boxes and arrows between them, are best done collaboratively on a physical whiteboard.

You will also get stuck with technical things in multiple ways; it can be as small as a syntax issue. For instance, you might forget how to use JavaScript's spread syntax (...). An issue in the library or the framework you use for that project can also block you. For example, you might need to learn how to do a database transaction in TypeORM. You can solve many of these blockages with a quick Google search or by asking an LLM. The more difficult technical blockers are the ones that are less obvious, as discussed next.
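As a quick refresher on the spread syntax mentioned above, here is a small sketch of its three most common uses. The variable names and values are made up for illustration.

```javascript
// Copy an object and override one property.
const defaults = { currency: "AED", quantity: 1 };
const order = { ...defaults, quantity: 3 };

// Copy an array and append an item.
const firstItems = ["shirt", "shoes"];
const allItems = [...firstItems, "watch"];

// Spread array elements into individual function arguments.
const highestPrice = Math.max(...[120, 80, 250]);

console.log(order); // { currency: 'AED', quantity: 3 }
console.log(allItems); // [ 'shirt', 'shoes', 'watch' ]
console.log(highestPrice); // 250
```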

Getting unblocked from technical blockers #

Some technical blockers are complex and require reading the language, framework, or library documentation. These are more difficult to unblock your work from. One strategy that works for these blockers is to make the smallest possible change and see if you can move a bit forward. This strategy often works, but sometimes, you must do more than make a small change. Using a debugger properly is also a critical skill; if you can locate the blockage and see all the variables and values while the issue happens, you can unblock yourself faster.

Orders over 1000 issue #

Once, while I was working for Namshi, around 2015, we faced a bizarre issue. A bug with the order total showed up only occasionally, but it was causing problems, and they got worse in busy periods. I don’t remember the specifics, but I was assigned this bug, and it took me days to figure out the issue. I tried to replicate it again and again. Finally, I cracked the nut; it was a weird formatting issue with one of the PHP functions at that time, which would format numbers incorrectly only when the order was more than 1000 AED. Using a debugger and finding a way to replicate the issue in my local environment helped me figure out the solution.
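The original bug was in PHP and its exact cause is not recorded here, so the following is only a hypothetical JavaScript sketch of how a formatting issue can stay hidden below 1000 and surface above it. It assumes a total is formatted with a thousands separator and then parsed back into a number. The `formatTotal` function is made up for illustration.

```javascript
// Hypothetical helper: format a total for display with a thousands separator.
function formatTotal(amount) {
  return amount.toLocaleString("en-US", {
    minimumFractionDigits: 2,
    maximumFractionDigits: 2,
  });
}

// Below 1000 there is no separator, so parsing the display string still works.
const smallOrder = parseFloat(formatTotal(999.5));

// From 1000 upward the string contains a comma, and parseFloat stops at it.
const largeOrder = parseFloat(formatTotal(1250));

console.log(formatTotal(999.5), "->", smallOrder); // 999.50 -> 999.5
console.log(formatTotal(1250), "->", largeOrder); // 1,250.00 -> 1
```

In this sketch, the safer design is to keep totals as numbers throughout the calculation and format them only at the point of display.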

The Zendesk library case with an SSL issue #

Sometimes, you find yourself in a situation where no one deployed any new code. It was working yesterday, and today, it is not working. Most likely, in such cases, a third-party library has stopped working. One example from mid-2018 is when a system communicated with Zendesk using its APIs.

One fine morning, it started throwing unusual errors, and we had not deployed any change to the system for over a week. So, it was clear that the issue was not in the code we had written but something with a third-party library or a thing out of our control. On investigating deeper, it turned out to be an SSL issue.

The quickest thing I did to unblock us was to fork the Node.js Zendesk library we were using and patch the issue. After that, we used the patched fork in our code base. The app was down for a couple of hours. Due to the time zone difference, the main library accepted the patch later that day. The next day, we moved back to the main library, which now included the patch.

For things that can wait, StackOverflow is your friend #

I am not very active on StackOverflow (anymore), but I used to read things there. As a software engineer, you Google things, and most of the time, you find a solution on StackOverflow.

Our CTO used to say, "This is surely not a unique issue. Someone would have surely faced this, and most likely, you will find a similar question and a useful answer on StackOverflow." He was correct.

I have asked some questions on StackOverflow and got answers in days. One of them is about the FOS Rest bundle and its serializer 11 years back (viewed more than 15K times now). The main takeaway is this: don’t stop searching and asking things on StackOverflow, even in this age of LLMs. You never know; a human might provide a different perspective and a handy solution to your problem.

On a different note, I would not consider open pull requests (or waiting for code review) a blockage. It is part of a good software engineering process that teaches you a lot about code, engineering, and writing software that will work well in production.

How to unblock software engineers visual summary #

Here is a quick visual summary of how to unblock software engineers from non-technical and technical blockers:

Slides:

Conclusion #

Software engineers at any level and in any company will get blocked from doing their work in non-technical or technical ways.

The focus should be on reducing the blocked time to increase the productivity of the software engineers and, eventually, the company’s productivity.

This blog post discusses how to unblock software engineers from roadblocks while working on new features. It stresses providing context and examples, addressing technical and non-technical blockers, using debuggers, and making small changes. The goal is to help product people and engineers work together effectively, leading to smoother development processes and successful products.

When you are blocked, think outside the box and make the smallest possible change to make tiny but valuable progress. Also, don’t hesitate to use the power of the community on platforms like StackOverflow. Keep writing fantastic software while minimizing both non-technical and technical blockers.
