My Work

OpenAI Moderation API: safer multimodal LLM with omni-moderation-latest (text + image)

2026-01-10T15:32:00.006+05:30

The OpenAI Moderation API has long been the endpoint you wire into every production surface that accepts user-generated content. The big difference today is that moderation is no longer limited to text: omni-moderation-latest is a multimodal moderation model built on GPT-4o that can classify text and images in a single request, and it gives you better tools for understanding why something was flagged.

This post is a follow-up to my earlier deep dive on the OpenAI moderation classifier: https://blog1.neuralengineer.org/llm-moderation-classifer-openai-moderation-api-fdb124c4536a

What moderation is (and is not)

Moderation answers one question: “Does this content appear to fall into one of the policy categories I care about?”

It is:
- A fast, structured classifier for routing (allow, block, review, rate-limit, redact).
- A complement to your product policy (not a replacement).

It is not:
- A substitute for product decisions (what you choose to allow is up to you).
- A guarantee that content is safe, legal, or appropriate in every context.

Models: why `omni-moderation-latest` is the new default

Most teams should treat these as the practical options:
- omni-moderation-latest: best default; supports multimodal inputs and the newest taxonomy.
- omni-moderation-2024-09-26: pinned omni model for reproducibility.
- text-moderation-latest / text-moderation-stable: text-only models; useful for legacy paths or when you cannot send images.

The rest of this post assumes you are using omni-moderation-latest.

What’s new in `omni-moderation-latest` compared to older integrations

If your mental model is “call moderation and check flagged,” you’ll want to update it:

Finer-grained self-harm: self-harm/intent vs self-harm/instructions lets you route ideation differently from how-to content.
Illicit guidance buckets: illicit and illicit/violent catch wrongdoing instructions; in schemas these can appear as nullable booleans.
Modality attribution: category_applied_input_types is the difference between “block everything” and “remove only the image that caused the problem.”
Better multilingual coverage in practice: omni-moderation-latest is designed to handle many languages and mixed-language inputs more consistently than older text-only moderation setups. In a test of 40 languages, the new model improved 42% over the previous model on OpenAI's internal multilingual evaluation, and it improved in 98% of the languages tested.

Pricing and rate limits

At the time of writing, calling the Moderation endpoint is free, and usage is governed by model-specific rate limits.

Definitions:
- RPM: requests per minute
- RPD: requests per day
- TPM: tokens per minute

Free tier limits for omni-moderation-latest:
- 250 RPM
- 5,000 RPD
- 10,000 TPM
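One way to stay under a requests-per-minute limit is a small client-side throttle. The sketch below is a minimal sliding-window limiter, assuming the 250 RPM figure quoted above; the class name `SlidingWindowLimiter` and its methods are hypothetical and not part of any OpenAI SDK. It does not track RPD or TPM.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical client-side throttle: at most max_calls per period seconds."""

    def __init__(self, max_calls=250, period=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock      # injectable clock, so the limiter can be tested
        self.calls = deque()    # timestamps of recent calls, oldest first

    def wait_time(self):
        """Seconds until the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.period - (now - self.calls[0])

    def acquire(self):
        """Block until a call is allowed, then record it."""
        delay = self.wait_time()
        if delay > 0:
            time.sleep(delay)
        self.calls.append(self.clock())
```

Calling `acquire()` before each moderation request keeps a single process under the limit; several workers sharing one API key would need a shared limiter instead.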

For the latest limits (and any pricing changes), see: https://platform.openai.com/docs/models/omni-moderation-latest

Request formats: single text, batch text, or multimodal

All requests go to:

POST https://api.openai.com/v1/moderations

1) Single text string

{ "model": "omni-moderation-latest", "input": "Some text to check" }

2) Batch of text strings

{ "model": "omni-moderation-latest", "input": ["text A", "text B"] }

3) Multimodal input (text + images in one “document”)

{
  "model": "omni-moderation-latest",
  "input": [
    { "type": "text", "text": "caption or message" },
    { "type": "image_url", "image_url": { "url": "https://example.com/image.png" } }
  ]
}

That third shape is the key new capability: you can send the user’s caption (or chat message) and the image they uploaded as a single item, then decide actions based on the combined signal.
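The request shapes above can be produced by one small helper. This is a sketch only: `build_moderation_payload` is a hypothetical name, and it assumes at most one image per request. It returns a plain dict that you would send as the JSON body of the POST.

```python
def build_moderation_payload(text=None, image_url=None,
                             model="omni-moderation-latest"):
    """Return a JSON-serialisable body for POST /v1/moderations (hypothetical helper)."""
    if not text and not image_url:
        raise ValueError("need text, an image URL, or both")
    if not image_url:
        # Text only: the plain-string shape is enough.
        return {"model": model, "input": text}
    # Multimodal: a list of typed parts, text first when present.
    parts = []
    if text:
        parts.append({"type": "text", "text": text})
    parts.append({"type": "image_url", "image_url": {"url": image_url}})
    return {"model": model, "input": parts}


payload = build_moderation_payload(
    text="caption or message",
    image_url="https://example.com/image.png",
)
```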

Categories: what you get back (and where image moderation applies)

The Moderation API returns a results array containing:
- flagged: a coarse summary boolean
- categories: a map of category → boolean
- category_scores: a map of category → float score
- category_applied_input_types: a map of category → list of input types (e.g., ["text"], ["image"], or ["text","image"])
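Because category_scores are floats, you are not limited to the flagged boolean: you can route on your own thresholds. The sketch below assumes the scores have already been copied into a plain dict; the function name `route` and the threshold values 0.8 and 0.4 are illustrative assumptions, not values recommended by OpenAI.

```python
def route(category_scores, block_at=0.8, review_at=0.4):
    """Map per-category scores to ('allow' | 'review' | 'block', top category).

    The thresholds are hypothetical; tune them against your own labelled data.
    """
    if not category_scores:
        return "allow", None
    # The highest-scoring category drives the decision.
    top_category = max(category_scores, key=category_scores.get)
    top_score = category_scores[top_category]
    if top_score >= block_at:
        return "block", top_category
    if top_score >= review_at:
        return "review", top_category
    return "allow", None


decision = route({"violence": 0.91, "harassment": 0.02})
```

In practice you may want per-category thresholds, since a score that is acceptable for one category may be too permissive for another.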

Categories that can apply to both text and image

These categories can be applied to both modalities:
- sexual
- self-harm, self-harm/intent, self-harm/instructions
- violence, violence/graphic

Categories that are currently text-only

These categories are currently text-only:
- hate, hate/threatening
- harassment, harassment/threatening
- sexual/minors
- illicit, illicit/violent
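The split above is easy to encode as a lookup, which helps when you need to know what an image-only submission will not be checked for. This is a sketch: the constant and function names are hypothetical, and the sets simply mirror the two lists in this post, so they should be re-checked against the current documentation.

```python
# Categories as listed in this post; re-verify against the current docs.
TEXT_AND_IMAGE = {
    "sexual",
    "self-harm", "self-harm/intent", "self-harm/instructions",
    "violence", "violence/graphic",
}
TEXT_ONLY = {
    "hate", "hate/threatening",
    "harassment", "harassment/threatening",
    "sexual/minors",
    "illicit", "illicit/violent",
}


def categories_covering(input_type):
    """Return the sorted categories that can apply to 'text' or 'image' inputs."""
    if input_type == "text":
        return sorted(TEXT_AND_IMAGE | TEXT_ONLY)
    if input_type == "image":
        return sorted(TEXT_AND_IMAGE)
    raise ValueError(f"unknown input type: {input_type!r}")


# Categories an image on its own will not be evaluated for.
not_checked_for_images = sorted(TEXT_ONLY)
```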

Example: moderating text + image in one call

Here’s a minimal Python request that moderates a caption plus the uploaded image URL:

import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()  # reads .env into environment


client = OpenAI()

moderation = client.moderations.create(
    model="omni-moderation-latest",
    input=[
        {"type": "text", "text": "Chhaava director Laxman Utekar recently discussed how shooting for one of the crucial scenes in the film led the production to take a 1.5-month-long break and the set being dismantled"},
        {"type": "image_url", "image_url": {"url": "https://images.mid-day.com/images/images/2025/feb/vickytorture_d.jpg"}},
    ],
)

r0 = moderation.results[0]
print("flagged:", r0.flagged)
print("categories:", r0.categories)
print("applied_input_types:", getattr(r0, "category_applied_input_types", None))

The response contains the following fields:

flagged - Set to true if the model classifies the content as potentially harmful, false otherwise.
categories - Contains a dictionary of per-category violation flags. For each category, the value is true if the model flags the corresponding category as violated, false otherwise.
category_scores - Contains a dictionary of per-category scores output by the model, denoting the model's confidence that the input violates OpenAI's policy for the category. The value is between 0 and 1, where higher values denote higher confidence.
category_applied_input_types - Contains information on which input types were flagged in the response, for each category. For example, if both the image and text inputs to the model are flagged for "violence/graphic", the violence/graphic property will be set to ["image", "text"]. This is only available on omni models.

flagged: True
categories: Categories(harassment=False, harassment_threatening=False, hate=False, hate_threatening=False, illicit=False, illicit_violent=False, self_harm=False, self_harm_instructions=False, self_harm_intent=False, sexual=False, sexual_minors=False, violence=True, violence_graphic=False, harassment/threatening=False, hate/threatening=False, illicit/violent=False, self-harm/intent=False, self-harm/instructions=False, self-harm=False, sexual/minors=False, violence/graphic=False)
applied_input_types: CategoryAppliedInputTypes(harassment=['text'], harassment_threatening=['text'], hate=['text'], hate_threatening=['text'], illicit=['text'], illicit_violent=['text'], self_harm=['text', 'image'], self_harm_instructions=['text', 'image'], self_harm_intent=['text', 'image'], sexual=['text', 'image'], sexual_minors=['text'], violence=['text', 'image'], violence_graphic=['text', 'image'], harassment/threatening=['text'], hate/threatening=['text'], illicit/violent=['text'], self-harm/intent=['text', 'image'], self-harm/instructions=['text', 'image'], self-harm=['text', 'image'], sexual/minors=['text'], violence/graphic=['text', 'image'])

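We can see that the violence category is flagged, and category_applied_input_types lists both text and image for it. Note that this field is populated for every category, including unflagged ones (for example, harassment=['text'] above), so read it together with categories rather than on its own.

If only the image triggered a category, you can remove the image but keep the user’s text.
If only the text triggered a category, you can redact or block the text but keep the image.
If both triggered it, you can apply a stricter action.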
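The three rules above can be sketched as a small function over the categories and category_applied_input_types maps. It assumes both have been copied into plain dicts keyed by category name, and that the input types listed for a flagged category reflect the modality responsible. The action labels ("remove_image", "redact_text", "strict") and the function name are hypothetical.

```python
def decide_action(categories, applied_input_types):
    """Pick an action from flagged categories and their input types.

    categories: dict of category -> bool (None is treated as not flagged)
    applied_input_types: dict of category -> list such as ["text", "image"]
    """
    flagged = [name for name, hit in categories.items() if hit]
    if not flagged:
        return "allow"
    # Union of input types across every flagged category.
    modalities = set()
    for name in flagged:
        modalities.update(applied_input_types.get(name, []))
    if modalities == {"image"}:
        return "remove_image"   # keep the user's text
    if modalities == {"text"}:
        return "redact_text"    # keep the image
    return "strict"             # both modalities, or no attribution available


action = decide_action(
    {"violence": True, "harassment": False},
    {"violence": ["text", "image"], "harassment": ["text"]},
)
```

If attribution matters for enforcement, moderating the text and the image in separate calls is a simple way to get an unambiguous per-modality result.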

Conclusion

omni-moderation-latest turns moderation from a text-only checkbox into a multimodal routing layer you can actually operate in production. The biggest wins are practical: you can classify images, you can attribute which modality caused a flag, and you get a taxonomy that’s easier to map to real product decisions.

References

https://blog1.neuralengineer.org/llm-moderation-classifer-openai-moderation-api-fdb124c4536a
https://platform.openai.com/docs/guides/moderation
https://platform.openai.com/docs/api-reference/moderations
https://openai.com/index/upgrading-the-moderation-api-with-our-new-multimodal-moderation-model/

Sentence Similarity and Semantic Search using free Huggingface Embedding API

2024-08-27T17:38:00.002+05:30

Sentence similarity involves determining the likeness between two texts. The idea behind semantic search is to embed all entries in your corpus, whether sentences, paragraphs, or documents, into a vector space. The query is embedded into the same vector space at search time, and the closest embeddings from your corpus are found.
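The "closest embeddings" step is usually a cosine-similarity ranking. The sketch below shows the idea in plain Python with toy 2-dimensional vectors standing in for real embeddings; the function names are illustrative, and a real system would use a vector store or a numerical library instead of nested loops.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query_vec, corpus_vecs, k=3):
    """Return (index, score) pairs for the k corpus vectors closest to the query."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(corpus_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors; real embeddings from all-MiniLM-L6-v2 have 384 dimensions.
corpus = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hits = top_k([1.0, 0.1], corpus, k=2)
```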

Some applications of sentence similarity include question answering, passage retrieval, paraphrase matching, duplicate question retrieval, and semantic search.

In this article, we will explore semantic search. The application uses sentence similarity to implement a document search on a Medium blog article. The input to the application will be a question/sentence, and the output will be a set of sentences containing semantically similar content to the input sentence.

The Sentence Transformers library

Sentence Transformers is an open-source library for creating state-of-the-art embeddings from text and computing sentence similarity.

Hugging Face offers a free Serverless Inference API to provide on-demand predictions from over 100,000 models deployed on the Hugging Face Hub.

Inference Endpoints provide support for operations offered by the following libraries:

- Transformers
- Sentence-Transformers
- Diffusers (for the Text To Image task)

We can choose any Sentence Transformers model from the Hugging Face Model Hub.

We will use the sentence-embeddings feature extraction provided by the Sentence-Transformers library. There are two categories of sentence transformer models: Bi-Encoder (Retrieval) and Cross-Encoder (Re-Ranker).

The Bi-Encoder independently embeds the sentences and search queries into a vector space. The retrieved candidates are then passed to the Cross-Encoder to check the relevance/similarity between the query and each sentence.
A re-ranker based on a Cross-Encoder can substantially improve the user's final results. The query and a possible document are simultaneously passed to the transformer network, which then outputs a single score between 0 and 1 indicating how relevant the document is for the given query. The Cross-Encoder further boosts performance, especially when searching over a corpus for which the Bi-Encoder was not trained.
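The retrieve-then-re-rank flow can be sketched independently of any model. In the example below the two scoring functions are toy stand-ins (word overlap), not real encoders: in a real pipeline `bi_score` would compare precomputed embeddings and `cross_score` would call a Cross-Encoder on each (query, document) pair. All names here are hypothetical.

```python
def retrieve_and_rerank(query, documents, bi_score, cross_score,
                        retrieve_k=10, final_k=3):
    """Two-stage search: cheap retrieval first, then a costlier re-rank."""
    # Stage 1: keep the retrieve_k best candidates by the cheap score.
    candidates = sorted(documents, key=lambda d: bi_score(query, d),
                        reverse=True)[:retrieve_k]
    # Stage 2: re-order only those candidates with the expensive score.
    reranked = sorted(candidates, key=lambda d: cross_score(query, d),
                      reverse=True)
    return reranked[:final_k]


def toy_bi_score(query, doc):
    """Stand-in for embedding similarity: number of shared words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def toy_cross_score(query, doc):
    """Stand-in for a Cross-Encoder: fraction of doc words found in the query."""
    query_words = set(query.lower().split())
    words = doc.lower().split()
    return sum(w in query_words for w in words) / len(words) if words else 0.0


docs = [
    "semantic search with embeddings",
    "search",
    "cooking pasta at home",
    "semantic search explained in a long rambling way",
]
results = retrieve_and_rerank("semantic search", docs,
                              toy_bi_score, toy_cross_score,
                              retrieve_k=3, final_k=2)
```

The point of the split is cost: the expensive scorer only ever sees `retrieve_k` candidates, not the whole corpus.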

In this article, we will explore the bi-encoder-based model sentence-transformers/all-MiniLM-L6-v2.

The model has 22.7 million parameters, and it can map sentences and paragraphs to a 384-dimensional dense vector space. It is designed for tasks such as clustering or semantic search.
This model is meant to be used as a sentence and short paragraph encoder. When given an input text, it produces a vector that captures the semantic information. The sentence vector can be used for information retrieval, clustering, or sentence similarity assessment. By default, input text longer than 256 word pieces is truncated.

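import requests

model_id = "sentence-transformers/all-MiniLM-L6-v2"
API_TOKEN = "xxxxxxxxxxxxxxxxxxx"  # placeholder: your Hugging Face access token

# Feature-extraction pipeline endpoint for the chosen model
api_url = f"https://api-inference.huggingface.co/pipeline/feature-extraction/{model_id}"
headers = {"Authorization": f"Bearer {API_TOKEN}"}


def query(payload):
    # POST the payload as JSON and return the decoded response (the embedding)
    response = requests.post(api_url, headers=headers, json=payload)
    return response.json()


# The Inference API expects the text under the "inputs" key
data = query({"inputs": "Can you please let us know more details about your "})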

In this article, we will use the LangChain framework. We will precompute the embeddings using the Hugging Face endpoint and then create a vector store using Chroma. We will then retrieve the top-k chunks of the article that match the embedding of the input question.

Below is a link to the demo application

https://huggingface.co/spaces/pi19404/reviewAnalyzerDemo

Navigate to → Semantic Search, then:

- Enter the URL
- Enter the question
- Click on Submit

The application:

- Reads the contents of a blog article in plain text.
- Splits the document into chunks of 256 words, with an overlap of 50 words.
- Encodes the document and stores the embeddings in a local cache for that document.
- When given an input query, returns the top K sections of text from the blog article.
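The chunking step can be sketched in a few lines of plain Python. This is an illustration of fixed-size word chunks with overlap, using the 256/50 figures above; it is not the demo application's actual splitter, and `chunk_words` is a hypothetical name.

```python
def chunk_words(text, chunk_size=256, overlap=50):
    """Split text into chunks of chunk_size words, each sharing
    `overlap` words with the previous chunk."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reached the end of the text
    return chunks


# A 500-word toy document: w0 w1 ... w499
sample = " ".join(f"w{i}" for i in range(500))
chunks = chunk_words(sample)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some words twice.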

In future articles, we will explore the following features:

Providing answers instead of top-K query search results (RAG)
Ability to have continuous conversations with context
Ability to have conversations with a knowledge base consisting of multiple blog articles
Incorporating different types of RAG frameworks for more contextual conversations