<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	 xmlns:media="http://search.yahoo.com/mrss/" >

<channel>
	<title>PyCharm : The only Python IDE you need. | The JetBrains Blog</title>
	<atom:link href="https://blog.jetbrains.com/pycharm/feed/" rel="self" type="application/rss+xml" />
	<link>https://blog.jetbrains.com</link>
	<description>Developer Tools for Professionals and Teams</description>
	<lastBuildDate>Wed, 29 Apr 2026 17:43:21 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	

<image>
	<url>https://blog.jetbrains.com/wp-content/uploads/2024/01/cropped-mstile-310x310-1-32x32.png</url>
	<title>PyCharm : The only Python IDE you need. | The JetBrains Blog</title>
	<link>https://blog.jetbrains.com</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Using Bag-of-Words With PyCharm</title>
		<link>https://blog.jetbrains.com/pycharm/2026/04/using-bag-of-words-with-pycharm/</link>
		
		<dc:creator><![CDATA[Jodie Burchell]]></dc:creator>
		<pubDate>Wed, 29 Apr 2026 17:42:41 +0000</pubDate>
		<featuredImage>https://blog.jetbrains.com/wp-content/uploads/2026/04/PC-social-BlogFeatured-1280x720-1-6.png</featuredImage>		<guid isPermaLink="false">https://blog.jetbrains.com/?post_type=pycharm&#038;p=703589</guid>

					<description><![CDATA[Have you ever wondered how machine learning models actually work with text? After all, these models require numerical input, but text is, well, text. Natural language processing (NLP) offers many ways to bridge this gap, from the large language models (LLMs) that are dominating headlines today all the way back to the foundational techniques of [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p>Have you ever wondered how <a href="https://www.jetbrains.com/pycharm/data-science/" target="_blank" rel="noopener">machine learning</a> models actually work with text? After all, these models require numerical input, but text is, well, text.</p>



<p>Natural language processing (NLP) offers many ways to bridge this gap, from the large language models (LLMs) that are dominating headlines today all the way back to the foundational techniques of the 1950s. Those early methods fall under what we now call the <strong>bag-of-words (BoW) model</strong>, and despite their age, they remain remarkably effective for a wide range of language problems.</p>



<p>In this post, we&#8217;ll unpack how the bag-of-words model works, explore the techniques it uses to convert text into numerical representations, and look at where it fits relative to more modern NLP approaches. We&#8217;ll also build a text classification project using BoW techniques, and see how PyCharm&#8217;s specific features make the whole process faster and easier.</p>



<h2 class="wp-block-heading">What is the bag-of-words model?</h2>



<p>The bag-of-words model is a text representation technique that converts unstructured text into numerical vectors by tracking which words appear across a corpus (a collection of texts). Rather than preserving grammar or word order, it simply represents each document as a &#8220;bag&#8221; of its words, recording how often each one appears. The result is a vector of counts that captures what a text is about, even if it discards how that content is expressed.</p>



<p>This apparent limitation turns out to matter less than you might expect. For many tasks, such as text classification and sentiment analysis, the presence of certain words is often a stronger signal than their arrangement, and BoW captures that signal efficiently.</p>



<h2 class="wp-block-heading">How does bag-of-words work?</h2>



<p>To use the bag-of-words model, we need to convert each text in a corpus into a numerical vector. Let&#8217;s walk through how that works, starting with what that vector actually looks like.</p>



<p>Take the following sentence:</p>



<blockquote class="wp-block-quote">
<p>When diving into natural language processing, it is natural for beginners to feel overwhelmed by the complexity of sentiment analysis, which involves distinguishing negative from positive text. However, as you practice with libraries like NLTK or spaCy, the concepts naturally start to click.</p>
</blockquote>



<p>A vector representation of this text using the BoW model might look something like this.</p>



<figure class="wp-block-table"><table><tbody><tr><td>&#8230;</td><td>natural</td><td>naturally</td><td>nausea</td><td>near</td><td>neared</td><td>nearing</td><td>necessary</td><td>negative</td><td>&#8230;</td></tr><tr><td></td><td>2</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td></td></tr></tbody></table></figure>



<p>If we think of this vector as a table, each column represents a word in the corpus, and the row beneath contains a count of how many times that word occurs in the text, as we can see below:</p>



<blockquote class="wp-block-quote">
<p>When diving into <strong>natural</strong> language processing, it is <strong>natural</strong> for beginners to feel overwhelmed by the complexity of sentiment analysis, which involves distinguishing <strong>negative</strong> from positive text. However, as you practice with libraries like NLTK or spaCy, the concepts <strong>naturally</strong> start to click.</p>
</blockquote>



<p>Each column represents a word in the vocabulary; each value records how many times that word appears. Here, “natural” appears twice, while “naturally” and “negative” each appear once.</p>



<h3 class="wp-block-heading">Tokenization</h3>



<p>Before we can build this vector, we need to split our text into tokens. In BoW modeling, this is typically straightforward: We split on whitespace and punctuation, so &#8220;When diving into natural language processing,&#8221; becomes seven tokens: <code>["When", "diving", "into", "natural", "language", "processing", ","]</code>. This is considerably simpler than the tokenization used in LLMs.</p>
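


<p>As a minimal sketch, a single regular expression can produce this kind of split (the pattern here is purely illustrative, and real projects often rely on a library tokenizer):</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import re

text = "When diving into natural language processing,"

# Match runs of word characters, or single punctuation characters
tokens = re.findall(r"[\w']+|[^\w\s]", text)
print(tokens)
# ['When', 'diving', 'into', 'natural', 'language', 'processing', ',']</pre>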



<h3 class="wp-block-heading">Vocabulary creation</h3>



<p>Applying tokenization across every text in the corpus produces a long list of words. Deduplicating this list gives us our vocabulary, which we can see in the set of columns in the vector above. This process does introduce some noise: &#8220;Natural&#8221; and &#8220;natural&#8221;, for instance, would be treated as two separate tokens. We&#8217;ll look at preprocessing steps to address this shortly.</p>
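


<p>To make this concrete, here&#8217;s a toy sketch of vocabulary creation over a two-text corpus (the texts are invented for illustration). Note how &#8220;Natural&#8221; and &#8220;natural&#8221; end up as separate entries:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">corpus = [
    "Natural language processing is fun",
    "language models process natural text",
]

# Tokenize each text, then deduplicate across the whole corpus
vocabulary = sorted({token for text in corpus for token in text.split()})
print(vocabulary)
# ['Natural', 'fun', 'is', 'language', 'models', 'natural', 'process', 'processing', 'text']</pre>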



<h3 class="wp-block-heading">Encoding</h3>



<p>With a vocabulary in hand, we create a vector for each text with one element per vocabulary word. Encoding is then the process of filling in those elements by checking each vocabulary word against the text.</p>



<p>The simplest approach is <strong>binary vectorization</strong>: 0 if a word is absent, 1 if present. More common, however, is <strong>count vectorization</strong>, which records the actual number of occurrences, as we saw in the example above. Count vectorization carries more information, since it helps distinguish texts that merely mention a topic from those that focus on it heavily.</p>
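


<p>Continuing the toy example from the previous section, a count vector for a single text can be sketched with a <code>Counter</code> (binary vectorization would simply replace each count with a 0 or 1):</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">from collections import Counter

text = "natural language processing is natural"
counts = Counter(text.split())

# One element per vocabulary word, in a fixed order
vector = [counts.get(word, 0) for word in vocabulary]
print(vector)
# [0, 0, 1, 1, 0, 2, 0, 1, 0]</pre>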



<p>One practical consequence of this approach is sparsity. If a corpus contains thousands of unique words, each vector will have thousands of elements, but any individual text will only use a small fraction of them, leaving most values at zero. This signal-to-noise issue is something we&#8217;ll return to.</p>



<h2 class="wp-block-heading">Advantages of the bag-of-words model</h2>



<p>The bag-of-words model has remained a staple in NLP for good reason. Its greatest strength is its simplicity: Because text is represented as a collection of word counts, the approach is easy to understand and straightforward to implement, making it a natural baseline before reaching for more complex architectures.</p>



<p>Beyond simplicity, BoW is computationally efficient. As you saw above, the underlying math is lightweight, which means it scales well to large text collections without demanding significant computing resources. For tasks where the presence of specific words is sufficient to capture meaning, with sentiment analysis and topic categorization being the clearest examples, it remains a highly effective tool.</p>



<h2 class="wp-block-heading">Applications of bag-of-words</h2>



<p>Like many NLP approaches, the bag-of-words model can be applied to many natural language problems. These potential applications include:</p>



<ul>
<li><strong>Document classification</strong>, where encoded documents are sorted into predefined categories. A classic example of this is automatically sorting incoming news articles into distinct categories such as sports, politics, or technology, as we’ll see in the project later in this post.</li>



<li><strong>Sentiment analysis</strong>, where the presence of certain words strongly indicates the overall tone of a text, allows models to easily determine whether a piece of writing expresses a positive, negative, or neutral sentiment. If you’re interested in learning more about BoW and other approaches to sentiment analysis, you can see a <a href="https://blog.jetbrains.com/pycharm/2024/12/introduction-to-sentiment-analysis-in-python/">prior blog post</a> I wrote on this topic.</li>



<li><strong>Spam detection</strong>, which relies heavily on BoW to identify and filter out unwanted emails or messages by learning to recognize the distinct, high-frequency word patterns characteristic of spam.&nbsp;</li>



<li><strong>Retrieval systems</strong>, where it helps to efficiently find the most relevant documents from an immense corpus based on a user’s search query.&nbsp;</li>



<li><strong>Topic modeling</strong>, which aims to group similar text vectors in order to discover and extract the hidden, latent topics present within a large collection of documents.</li>
</ul>



<p>As you can see, the range of potential applications is broad, making bag-of-words modeling a popular first approach to natural language problems.</p>



<h2 class="wp-block-heading">Why use PyCharm for NLP?</h2>



<p><a href="https://www.jetbrains.com/pycharm/" target="_blank" rel="noopener">PyCharm</a> is particularly well-suited to bag-of-words modeling because it supports the iterative, detail-oriented workflow that text processing requires. As you’ll soon see, building a reliable BoW pipeline involves multiple steps, such as cleaning text, tokenizing, vectorizing, and validating outputs, and PyCharm&#8217;s code intelligence makes each of these smoother. Autocompletion, parameter hints, and quick navigation through specialized NLP libraries reduce friction when experimenting with different vectorizer settings, and help you understand how each component behaves.</p>



<p><a href="https://www.jetbrains.com/pycharm/features/debugger.html" target="_blank" rel="noopener">Debugging</a> and data inspection are equally important here, since small preprocessing mistakes can have an outsized effect on results. PyCharm lets you step through your code and examine intermediate states of things such as token lists and vocabulary at runtime, making it straightforward to verify that your feature extraction is working as intended. This visibility is especially useful when diagnosing issues like unexpected vocabulary sizes or missing terms.</p>



<p>PyCharm also supports exploratory work through its excellent <a href="https://www.jetbrains.com/help/pycharm/jupyter-notebook-support.html" target="_blank" rel="noopener">Jupyter Notebook integration</a> and scientific tooling. BoW modeling often involves trying different preprocessing strategies and observing their effects immediately, so the ability to run code interactively and inspect outputs inline is a genuine advantage. Combined with built-in virtual environment and package management support, this keeps experiments reproducible and well-organized.</p>



<p>As projects grow, PyCharm&#8217;s refactoring tools, project navigation, and version control integration help manage the added complexity. BoW models rarely exist in isolation, and they&#8217;re often embedded in larger ML pipelines. In such contexts, PyCharm’s features for working with larger applications mean you spend less time managing code and more time improving your models.</p>



<h3 class="wp-block-heading">Setting up the project</h3>



<p>To see these components in action, let&#8217;s build an actual bag-of-words project. We&#8217;ll use a classic text classification dataset, AG News, and train a model to classify news articles into one of four categories: World, Sports, Business, or Science/Technology.</p>



<p>To get started in PyCharm, open the <em>Projects and Files</em> tool window and select <em>New… &gt; New Project…</em>. Since this is a data science project, we can use PyCharm&#8217;s built-in Jupyter project type, which sets up a sensible default structure for us.</p>



<p>During project configuration, you&#8217;ll be asked to choose a Python interpreter. By default, PyCharm uses uv and lets you select from a range of Python versions, though all major dependency management systems are supported: pip, Anaconda, Pipenv, Poetry, and Hatch. Every project is automatically created with an attached virtual environment, so your setup will be ready to go each time you reopen the project.</p>



<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" fetchpriority="high" src="https://blog.jetbrains.com/wp-content/uploads/2026/04/screenshot-1-selecting-uv-project.png" alt="" class="wp-image-703594"/></figure>



<p>With the project configured, we can install our dependencies via the <em>Python Packages</em> tool window. Simply search for a package by name, select it from the list, and install your desired version directly into the virtual environment. You can also see the same information about the package you&#8217;d find on PyPI directly within the IDE. For this project, we&#8217;ll need pandas and NumPy, along with Hugging Face&#8217;s <code>datasets</code> package, scikit-learn, PyTorch, and spaCy.</p>



<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" src="https://blog.jetbrains.com/wp-content/uploads/2026/04/screenshot-2-installing-package.png" alt="" class="wp-image-703605"/></figure>



<h2 class="wp-block-heading">Implementing bag-of-words with PyCharm</h2>



<p>There are many versions of this dataset online. We’ll be using <a href="https://huggingface.co/datasets/sh0416/ag_news" target="_blank" rel="noopener">one of the versions</a> hosted on Hugging Face Hub.</p>



<h3 class="wp-block-heading">Loading and preparing the data</h3>



<p>We’ll use Hugging Face’s <code>datasets</code> package to download this dataset.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">from datasets import load_dataset
ag_news_all = load_dataset("sh0416/ag_news")</pre>



<p>This gives us a Hugging Face <code>DatasetDict</code> object. If we look at it, we can see it contains a training dataset with 120,000 news articles, and a test dataset containing 7,600 articles.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">ag_news_all</pre>



<pre class="EnlighterJSRAW" data-enlighter-language="raw" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">DatasetDict({
    train: Dataset({
        features: ['label', 'title', 'description'],
        num_rows: 120000
    })
    test: Dataset({
        features: ['label', 'title', 'description'],
        num_rows: 7600
    })
})</pre>



<p>As we’ll be training a model, we also need a validation set. We’ll convert the training and test sets to pandas DataFrames, and use the <code>train_test_split</code> function from scikit-learn to create the validation set from the training data.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import pandas as pd
from sklearn.model_selection import train_test_split

ag_news_train = ag_news_all["train"].to_pandas()
ag_news_test = ag_news_all["test"].to_pandas()

ag_news_train, ag_news_val = train_test_split(
   ag_news_train,
   test_size=0.1,     
   random_state=456,   
   stratify=ag_news_train['label'] 
)

print(f"Training set: {len(ag_news_train)} samples")
print(f"Validation set: {len(ag_news_val)} samples")</pre>



<p>We now have a validation set with 12,000 articles, and a training set with 108,000 articles.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="raw" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">Training set: 108000 samples
Validation set: 12000 samples</pre>



<p>If you&#8217;re new to machine learning, you might be wondering why we need all of these different datasets. The reason is to give us confidence that our model will generalize well and perform as expected on unseen data. The training set is the only data the model ever learns from directly. The validation set is used to monitor how the model performs on unseen data as we make modeling decisions, such as choosing how many epochs to train for, how large to make the hidden layer, or which preprocessing steps to apply (we&#8217;ll see all of this later).</p>



<p>Because we look at validation performance repeatedly while building the model, there&#8217;s a risk that our choices gradually become tuned to the quirks of that particular split. This is why we need a third set (the test set), which we keep completely locked away until we&#8217;ve finished all modeling decisions and want a single, unbiased estimate of how well our model will perform on new data. Using the test set for anything other than this final evaluation would give us an overly optimistic picture of our model&#8217;s real-world performance.</p>



<p>Let’s now inspect our datasets. PyCharm Pro has a lot of built-in features that make working with DataFrames easier, a few of which we’ll see soon. In this DataFrame, we have three columns: the article title, the article description, and the label indicating which of the four news categories the article belongs to. You can open any of the DataFrame cells in the <em>Value Editor</em> to see its full text, or widen the column to prevent truncation, both of which are useful for a quick visual inspection.</p>



<figure class="wp-block-image size-full is-resized"><img decoding="async" src="https://blog.jetbrains.com/wp-content/uploads/2026/04/screenshot-3-viewing-full-text.png" alt="" class="wp-image-703617" style="aspect-ratio:2.695219123505976;width:840px;height:auto; width:100% !important; height:auto !important; max-width:100% !important;"/></figure>



<p>At the top of each column, PyCharm displays column statistics, giving you an at-a-glance summary of the data. Switching from <em>Compact</em> to <em>Detailed</em> mode via <em>Show Column Statistics</em> gives you rich summary statistics about each column, and saves you from writing a lot of pandas boilerplate to get it! From these statistics, we can see that our training set is evenly split across the news categories (which is very handy when training a model). We can also see that some headlines and descriptions are not unique, which may introduce noise when classifying these duplicates.</p>



<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/04/screenshot-4-column-statistics.png" alt="" class="wp-image-703628"/></figure>



<p>The first step in preparing the data is basic string cleaning, which normalizes the text and reduces meaningless token variation. For instance, without cleaning, &#8220;Natural&#8221; and &#8220;natural&#8221; would be treated as two separate vocabulary entries, as we noted earlier.&nbsp;</p>



<p>We&#8217;ll apply four cleaning steps: lowercasing, punctuation removal, number removal, and whitespace normalization. There are different string cleaning steps you can apply depending on the language and use case, but for English-language texts, these tend to be very standard. Let’s go ahead and write a function to do this.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">def apply_string_cleaning(dataset: pd.Series) -> pd.Series:
   patterns_to_remove = [
       r"[^a-zA-Z\s]",
   ]

   cleaned = dataset.str.lower()

   for pattern in patterns_to_remove:
       cleaned = cleaned.str.replace(pattern, " ", regex=True)

   cleaned = cleaned.str.replace(r"\s+", " ", regex=True).str.strip()

   return cleaned

ag_news_train["title_clean"] = apply_string_cleaning(ag_news_train["title"])
ag_news_train["description_clean"] = apply_string_cleaning(ag_news_train["description"])</pre>



<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/04/screenshot-5-raw-and-cleaned-text.png" alt="" class="wp-image-703639"/></figure>



<p>This mostly works, but there&#8217;s one issue: The regex strips apostrophes entirely, turning contractions like &#8220;you&#8217;re&#8221; into &#8220;you re&#8221; and possessives like &#8220;Canada’s&#8221; into &#8220;Canada s&#8221;. The cleanest fix is a regex that preserves apostrophes in contractions while removing possessive endings, but this is not the most enjoyable thing to write by hand.</p>



<p>This is where PyCharm&#8217;s built-in <a href="https://www.jetbrains.com/pycharm/features/ai/" target="_blank" rel="noopener">AI Assistant</a> comes in. Open the chat window via the <em>AI Chat</em> icon on the right-hand side of the IDE and enter the following prompt:</p>



<blockquote class="wp-block-quote">
<p>Can you please alter the <code>@apply_string_cleaning</code> function so that it retains apostrophes inside words when they&#8217;re used for contractions (e.g., &#8220;you&#8217;re&#8221;), but removes them when they&#8217;re used for possessives (e.g., &#8220;Canada’s&#8221; into &#8220;Canada&#8221;).</p>
</blockquote>



<p>The <code>@</code> notation lets you reference specific files or objects in your IDE without copying and pasting code into the prompt, including Jupyter variables like datasets and functions.</p>



<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/04/screenshot-6-ai-chat.png" alt="" class="wp-image-703650"/></figure>



<p>I ran this against Claude Sonnet 4.5, though JetBrains AI supports a wide range of models from OpenAI, Anthropic, Google, and xAI, as well as open models via Ollama, LM Studio, and OpenAI-compatible APIs. Below is the updated function it returned:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">def apply_string_cleaning(dataset: pd.Series) -> pd.Series:
    cleaned = dataset.str.lower()
    
    # Remove possessive apostrophes (word's -> word)
    # This pattern matches: letter(s) + 's + word boundary
    cleaned = cleaned.str.replace(r"(\w+)'s\b", r"\1", regex=True)
    
    # Remove all non-letter characters except apostrophes within words
    cleaned = cleaned.str.replace(r"[^a-zA-Z'\s]", " ", regex=True)
    
    # Clean up any apostrophes at the start or end of words
    cleaned = cleaned.str.replace(r"\s'|'\s", " ", regex=True)
    
    # Remove multiple spaces and trim
    cleaned = cleaned.str.replace(r"\s+", " ", regex=True).str.strip()
    
    return cleaned

ag_news_train["title_clean"] = apply_string_cleaning(ag_news_train["title"])
ag_news_train["description_clean"] = apply_string_cleaning(ag_news_train["description"])
</pre>



<p>We can insert this into our Jupyter notebook directly by clicking on <em>Insert Snippet as Jupyter Cell</em> in the AI chat.</p>



<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/04/screenshot-7-insert-code-as-cell.png" alt="" class="wp-image-703664"/></figure>



<p>Once we run this updated function on our raw text, we get the correct result:</p>



<figure class="wp-block-table"><table><tbody><tr><td><strong>text</strong></td><td><strong>text_clean</strong></td></tr><tr><td>Don’t stand for racism &#8211; football chief</td><td>don&#8217;t stand for racism football chief</td></tr><tr><td>Canada&#8217;s Barrick Gold acquires nine per cent stake in Celtic Resources (Canadian Press)</td><td>canada barrick gold acquires nine per cent stake in celtic resources canadian press</td></tr></tbody></table></figure>



<p>We can see the contraction “don’t” is correctly preserved in the first example, but the possessive “Canada’s” has been removed. We apply this to both the training and validation datasets using the same function, so that the cleaning is consistent across both splits:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">ag_news_val["title_clean"] = apply_string_cleaning(ag_news_val["title"])
ag_news_val["description_clean"] = apply_string_cleaning(ag_news_val["description"])</pre>



<h3 class="wp-block-heading">Creating the bag-of-words model</h3>



<p>Now that we have clean text, we need to build our vocabulary and encode it. We&#8217;ll first combine the cleaned title and description into a single <code>text_clean</code> column, then use scikit-learn&#8217;s <code>CountVectorizer</code> to do the rest:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">from sklearn.feature_extraction.text import CountVectorizer

countVectorizerNews = CountVectorizer()
countVectorizerNews.fit(ag_news_train["text_clean"])
ag_news_train_cv = countVectorizerNews.transform(ag_news_train["text_clean"]).toarray()</pre>



<p>The process has two distinct steps. First, <code>.fit()</code> scans the training data and builds a vocabulary by identifying every unique word and assigning it a fixed index position (so &#8220;government&#8221;, for example, always maps to the same column). The result is a mapping of 59,544 unique words, which you can think of as the column headers for our eventual matrix.</p>



<p>Second, <code>.transform()</code> uses that vocabulary to convert each headline into a numerical vector, counting how many times each vocabulary word appears and placing that count at the corresponding index position.</p>



<p>The reason these are two separate steps is important: When we later process our validation and test data, we&#8217;ll call <code>.transform()</code> using the vocabulary learned from the training set. This ensures that all three splits share a consistent feature space. If we re-ran <code>.fit()</code> on the test data, we&#8217;d get a different vocabulary, and the model&#8217;s predictions would be meaningless.</p>



<p>With the vectorizer fitted and our training data transformed, we can start exploring what we&#8217;ve actually built. Let&#8217;s first take a look at the vocabulary. <code>CountVectorizer</code> stores it as a dictionary mapping each word to its index position, accessible via <code>vocabulary_</code>:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">countVectorizerNews.vocabulary_</pre>



<pre class="EnlighterJSRAW" data-enlighter-language="raw" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">{'fed': 18461,
 'up': 55833,
 'with': 58324,
 'pension': 38929,
 'defaults': 13156,
 'citing': 9475,
 'failure': 18077,
 'of': 36704,
 'two': 54804,
 'big': 5269,
 'airlines': 1139,
 'to': 53531,
 'make': 31397,
 'payments': 38686,
 'their': 52947,
...}</pre>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">len(countVectorizerNews.vocabulary_)</pre>



<pre class="EnlighterJSRAW" data-enlighter-language="raw" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">59544</pre>



<p>This confirms that our vocabulary contains 59,544 unique words. Browsing through it, you can start to guess what kinds of terms appear frequently in the different types of news. Country names feature heavily in the “world” news category, terms like “football” and “cricket” in the “sports” news category, terms like “profit” and “losses” in the “business” news category, and company names like “Google” and “Microsoft” in the “science/technology” category.</p>



<p>Next, let&#8217;s inspect the feature matrix itself. <code>ag_news_train_cv</code> is a NumPy array with one row per headline and one column per vocabulary word, giving us a matrix of shape (108,000 × 59,544). We can wrap it in a DataFrame to make it easier to inspect in PyCharm&#8217;s DataFrame viewer:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">pd.DataFrame(ag_news_train_cv, columns=countVectorizerNews.get_feature_names_out())</pre>



<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/04/screenshot-8-sparse-matrix.png" alt="" class="wp-image-703675"/></figure>



<p>As expected, the matrix is very sparse. Most values are zero, since any individual headline only contains a small fraction of the full vocabulary. In fact, you might have noticed that the number of columns is more than half the number of rows, which is never good for a feature matrix. We&#8217;ll explore how to reduce the dimensionality of the feature space in a later section.</p>



<p>Note that we also need to apply this vectorization to the validation dataset before moving on to modeling. Importantly, we only apply the <code>.transform()</code> method to the validation set, as the vectorizer was already fitted on the training data. We first combine the validation title and description into the same <code>text_clean</code> column:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">ag_news_val_cv = countVectorizerNews.transform(ag_news_val["text_clean"]).toarray()</pre>



<h2 class="wp-block-heading">Visualizing the results</h2>



<p>Before we move on to reducing the dimensionality of our feature space, let&#8217;s explore the distribution of the words in our corpus. This can help us understand the most common and rare words, and how we might further process our data to improve the signal-to-noise ratio.</p>



<h3 class="wp-block-heading">Word frequency plots</h3>



<p>We’ll start by creating a DataFrame that aggregates word counts across all headlines and ranks them by frequency:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import numpy as np

vocab = countVectorizerNews.get_feature_names_out()
counts = np.asarray(ag_news_train_cv.sum(axis=0)).flatten()

pd.DataFrame({
  'vocab': vocab,
  'count': counts,
}).sort_values('count', ascending=False).reset_index(drop=True)</pre>



<p>First, we retrieve the vocabulary in index order using <code>get_feature_names_out()</code>, so each word lines up with its corresponding column in the feature matrix. We then sum the matrix column-wise (that is, across all documents) to get the total number of times each word appears in the training set. Finally, we wrap these two arrays into a DataFrame and sort by count, giving us a ranked list of the most frequent terms.</p>



<p>Once this DataFrame is displayed in PyCharm, we can easily turn it into a visualization without writing a single line of code. By clicking on the <em>Chart View</em> button in the top left-hand corner of the DataFrame, we can explore a range of ways of visualizing our data. Go to <em>Show Series Settings</em> in the top right-hand corner, and adjust the parameters to get the count frequencies of the words: we set the <em>X axis</em> value to “vocab” (and change <em>group and sort</em> to <em>none</em>), the <em>Y axis</em> value to “count”, and the chart type to “Bar”.</p>



<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/04/screenshot-9-chart-view.png" alt="" class="wp-image-703686"/></figure>



<p>Hovering over this chart, we can see that it has a very long-tailed distribution, which is typical of vocabulary frequencies (so typical, in fact, that it has a name: Zipf&#8217;s law). This means that the majority of our words occur only rarely in the text; if we hover over the right-hand side of the chart, it looks like around a third of our vocabulary terms are used only once!</p>



<p>On the other hand, when we hover over the left-hand side of the chart, we can see that it is dominated by very common function words such as &#8220;to&#8221;, &#8220;in&#8221;, &#8220;the&#8221;, and &#8220;you&#8221;. These words don&#8217;t really carry any meaning and occur in virtually every text, so they&#8217;re unlikely to be useful for our classification task.</p>



<p>Let’s have a look at some things we can do to clean up our feature space and help our semantically meaningful words stand out a bit more.</p>



<h2 class="wp-block-heading">Advanced bag-of-words techniques</h2>



<p>The basic BoW pipeline we&#8217;ve built so far is a solid foundation, but there are several techniques that can meaningfully improve its quality. This section walks through the most important ones. We’ll only be using a selection of them in our project, but you can investigate which of these seem appropriate when building your own project.</p>



<h3 class="wp-block-heading">Stop word removal</h3>



<p>Stop words are extremely common words that appear frequently across all kinds of text but carry little meaningful information. This includes words like &#8220;the&#8221;, &#8220;is&#8221;, &#8220;and&#8221;, &#8220;of&#8221;, as we saw in the frequency chart in the previous section. They inflate vocabulary size without adding signal, so removing them is one of the most straightforward ways to improve your BoW representation. NLTK provides a built-in stop word list for English and many other languages.</p>
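


<p>For instance, loading NLTK&#8217;s English stop word list (assuming NLTK is installed) looks something like this:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import nltk
nltk.download("stopwords")  # one-time download of the stop word lists

from nltk.corpus import stopwords

english_stopwords = set(stopwords.words("english"))
print(len(english_stopwords))  # around 180 words, including "the", "is", "and", "of"</pre>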



<h3 class="wp-block-heading">Stemming and lemmatization</h3>



<p>Another issue you might have noticed in our vocabulary is that words that are semantically equivalent appear in different inflected forms, meaning that what should arguably be a single vocabulary entry occupies several token slots. We can resolve this through two techniques: stemming and lemmatization. Stemming reduces words to their root form using simple rule-based truncation (e.g. &#8220;running&#8221; → &#8220;run&#8221;), while lemmatization takes a linguistic approach, mapping words to their dictionary base form. Lemmatization is slower but generally produces cleaner results, particularly for irregular word forms.</p>
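


<p>A quick NLTK-based comparison (assuming the WordNet data has been downloaded) illustrates the difference between the two techniques:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import nltk
nltk.download("wordnet")  # one-time download for the lemmatizer

from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

print(stemmer.stem("running"))                  # 'run' - rule-based truncation
print(stemmer.stem("studies"))                  # 'studi' - truncation can produce non-words
print(lemmatizer.lemmatize("studies"))          # 'study' - dictionary base form
print(lemmatizer.lemmatize("better", pos="a"))  # 'good' - handles irregular forms</pre>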



<h3 class="wp-block-heading">TF-IDF</h3>



<p>Term frequency-inverse document frequency (TF-IDF) is an extension of basic count vectorization that weights each word by how informative it actually is. A word that appears frequently in one document but rarely across the corpus receives a high weight; a word that appears everywhere receives a low one. This neatly addresses one of the core weaknesses of raw count vectors: common but uninformative words can dominate the feature space even after stop-word removal.</p>
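


<p>One common formulation (library implementations such as scikit-learn&#8217;s <code>TfidfVectorizer</code> add smoothing, so exact values vary) weights a term <em>t</em> in a document <em>d</em> as:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="raw" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">tf-idf(t, d) = tf(t, d) × log(N / df(t))

where tf(t, d) is the count of t in d, N is the total number of documents,
and df(t) is the number of documents containing t</pre>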



<h3 class="wp-block-heading">N-grams</h3>



<p>Standard BoW treats each word independently, which means it misses phrases whose meaning depends on word combinations. A classic example of this is &#8220;machine learning&#8221;, which has a meaning distinct from &#8220;machine&#8221; + &#8220;learning&#8221;. N-grams address this by treating sequences of adjacent words as single tokens, so a bigram model would capture &#8220;machine learning&#8221; as a feature in its own right. The trade-off is a much larger vocabulary, so in practice, bigrams are most commonly used, with trigrams reserved for cases where capturing longer phrases is important.</p>
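


<p>In scikit-learn, this is a single argument to the vectorizer. A small sketch on a one-sentence toy corpus:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">from sklearn.feature_extraction.text import CountVectorizer

# ngram_range=(1, 2) keeps single words and adds adjacent word pairs
bigram_vectorizer = CountVectorizer(ngram_range=(1, 2))
bigram_vectorizer.fit(["machine learning is fun"])
print(bigram_vectorizer.get_feature_names_out())
# ['fun' 'is' 'is fun' 'learning' 'learning is' 'machine' 'machine learning']</pre>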



<h3 class="wp-block-heading">Handling out-of-vocabulary words</h3>



<p>When you apply your fitted vectorizer to new data, any words not present in the training vocabulary are silently ignored by default. For many tasks, this is acceptable, but if your production data is likely to continue introducing new terms that carry meaningful signal, it&#8217;s worth considering alternatives. One common approach is to reserve a special &lt;UNK&gt; token to represent unseen words, which at least preserves the information that something unfamiliar appeared, even if its identity is unknown and multiple (perhaps unrelated) words are collapsed onto the same token.&nbsp;</p>



<p>However, LLMs, with their more flexible approach to tokenization, tend to be a better choice if out-of-vocabulary words will be a major issue for your model once it is in production.</p>



<h3 class="wp-block-heading">Dimensionality reduction</h3>



<p>Even after stop word removal and other cleaning steps, BoW feature matrices are typically very high-dimensional and sparse. Two widely used techniques can help. Reducing to the top-N most frequent terms is the simplest approach, discarding low-frequency words that are unlikely to generalize well. For a more principled reduction, techniques like principal component analysis (PCA) or latent semantic analysis (LSA) project the feature matrix into a lower-dimensional space, compressing the representation while preserving as much of the meaningful variance as possible.</p>
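


<p>As a sketch, an LSA-style reduction of our count matrix with scikit-learn&#8217;s <code>TruncatedSVD</code> might look like the following (the 300-component figure is an illustrative choice, not a recommendation):</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">from sklearn.decomposition import TruncatedSVD

# Project the high-dimensional count matrix down to 300 dense components.
# In practice, you'd run this on the sparse matrix (i.e. skip .toarray())
# to keep memory usage manageable.
lsa = TruncatedSVD(n_components=300, random_state=42)
ag_news_train_lsa = lsa.fit_transform(ag_news_train_cv)
print(ag_news_train_lsa.shape)  # (108000, 300)</pre>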



<h3 class="wp-block-heading">Feature selection techniques</h3>



<p>Rather than reducing dimensionality arbitrarily, feature selection methods identify and retain only the features most relevant to your specific task. Chi-squared testing measures the statistical dependence between each term and the target label, making it well-suited to classification tasks. Mutual information takes a similar approach, scoring each feature by how much it reduces uncertainty about the class. Both methods can substantially reduce vocabulary size while preserving model performance.</p>
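


<p>For example, a chi-squared selection of the terms most associated with our news categories could be sketched like this (the 10,000-term cutoff is an arbitrary choice):</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">from sklearn.feature_selection import SelectKBest, chi2

# Score each term against the class labels and keep the top 10,000
selector = SelectKBest(chi2, k=10000)
ag_news_train_chi2 = selector.fit_transform(ag_news_train_cv, ag_news_train["label"])
print(ag_news_train_chi2.shape)  # (108000, 10000)</pre>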



<h2 class="wp-block-heading">Applying bag-of-words to a real-world problem</h2>



<p>Let&#8217;s now continue the example we started earlier. We&#8217;re going to take the work we&#8217;ve done on our AG News text classification task and complete it by building a model.</p>



<p>A common way to build a model from encoded text is a neural network, where each word in the vocabulary is treated as a feature, and the categories we want to predict (in our case, the news category) are the output. We&#8217;ll start by building a baseline model that applies only string cleaning and encoding to the text.</p>



<p>I had originally written this model in Keras as part of a previous BoW project from a couple of years ago. However, that code is now out of date. In order to update it and adapt it to PyTorch, I asked JetBrains AI to do the following:</p>



<blockquote class="wp-block-quote">
<p>Please update this neural network from Keras to Pytorch, making improvements to make the code as reusable as possible.</p>
</blockquote>



<p>This gave us the following successful port of the code:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader

class MulticlassClassificationModel(nn.Module):
   def __init__(self, input_size: int, hidden_layer_size: int, num_classes: int = 4):
       super(MulticlassClassificationModel, self).__init__()
       self.fc1 = nn.Linear(input_size, hidden_layer_size)
       self.relu = nn.ReLU()
       self.fc2 = nn.Linear(hidden_layer_size, num_classes)

   def forward(self, x):
       x = self.fc1(x)
       x = self.relu(x)
       x = self.fc2(x)
       return x

def train_text_classification_model(
       train_features: np.ndarray,
       train_labels: np.ndarray,
       validation_features: np.ndarray,
       validation_labels: np.ndarray,
       input_size: int,
       num_epochs: int,
       hidden_layer_size: int,
       num_classes: int = 4,
       batch_size: int = 1920,
       learning_rate: float = 0.001) -> MulticlassClassificationModel:

   # Convert labels to 0-indexed (AG News has labels 1,2,3,4 -> need 0,1,2,3)
   train_labels_indexed = train_labels - 1
   validation_labels_indexed = validation_labels - 1

   # Convert numpy arrays to PyTorch tensors
   X_train = torch.FloatTensor(train_features.copy())
   y_train = torch.LongTensor(train_labels_indexed.copy())
   X_val = torch.FloatTensor(validation_features.copy())
   y_val = torch.LongTensor(validation_labels_indexed.copy())

   # Create datasets and dataloaders
   train_dataset = TensorDataset(X_train, y_train)
   train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

   # Initialize model, loss function, and optimizer
   model = MulticlassClassificationModel(input_size, hidden_layer_size, num_classes)
   criterion = nn.CrossEntropyLoss()
   optimizer = optim.RMSprop(model.parameters(), lr=learning_rate)

   # Training loop
   for epoch in range(num_epochs):
       model.train()
       train_loss = 0.0
       correct_train = 0
       total_train = 0

       for batch_features, batch_labels in train_loader:
           # Forward pass
           outputs = model(batch_features)
           loss = criterion(outputs, batch_labels)

           # Backward pass and optimization
           optimizer.zero_grad()
           loss.backward()
           optimizer.step()

           # Calculate training metrics
           train_loss += loss.item()
           _, predicted = torch.max(outputs, 1)
           correct_train += (predicted == batch_labels).sum().item()
           total_train += batch_labels.size(0)

       # Validation
       model.eval()
       with torch.no_grad():
           val_outputs = model(X_val)
           val_loss = criterion(val_outputs, y_val)
           _, val_predicted = torch.max(val_outputs, 1)
           correct_val = (val_predicted == y_val).sum().item()
           total_val = y_val.size(0)

       # Print epoch metrics
       train_acc = correct_train / total_train
       val_acc = correct_val / total_val
       print(f'Epoch [{epoch+1}/{num_epochs}], '
             f'Train Loss: {train_loss/len(train_loader):.4f}, '
             f'Train Acc: {train_acc:.4f}, '
             f'Val Loss: {val_loss:.4f}, '
             f'Val Acc: {val_acc:.4f}')

   return model

def generate_predictions(model: MulticlassClassificationModel,
                       validation_features: np.ndarray,
                       validation_labels: np.ndarray) -> list:
   model.eval()

   # Convert to tensors
   X_val = torch.FloatTensor(validation_features.copy())

   with torch.no_grad():
       outputs = model(X_val)
       _, predicted = torch.max(outputs, 1)

   # Convert back to 1-indexed labels to match original dataset
   predicted_labels = (predicted.numpy() + 1)

   print("Confusion Matrix:")
   print(pd.crosstab(validation_labels, predicted_labels,
                     rownames=['Actual'], colnames=['Predicted']))
   return predicted_labels.tolist()</pre>



<p>Let’s walk through this code step-by-step to understand how we’re going to train our text classifier.</p>



<h3 class="wp-block-heading">The model architecture</h3>



<p><code>MulticlassClassificationModel</code> is a simple two-layer feedforward neural network. It takes a BoW vector as input, with each feature being a vocabulary word, and passes it through two sequential transformations to produce a prediction. The first layer (<code>fc1</code>) compresses this high-dimensional input down to a smaller intermediate representation, whose size we control via <code>hidden_layer_size</code>. A ReLU activation is then applied, which introduces non-linearity, allowing the model to learn patterns that a simple weighted sum couldn&#8217;t capture. The second layer (<code>fc2</code>) takes this intermediate representation and maps it down to four output values, one per news category, where the category with the highest value becomes the model&#8217;s prediction.</p>



<h3 class="wp-block-heading">Training and validation</h3>



<p><code>train_text_classification_model</code> handles the full training loop. It starts with a small amount of housekeeping: The AG News labels run from 1 to 4, but PyTorch expects 0-indexed classes, so these are shifted down by 1. The features and labels are then converted to PyTorch tensors, and a <code>DataLoader</code> is created to feed the training data to the model in batches.</p>



<p>Each epoch, the model processes the training data batch by batch. For each batch, it runs a forward pass to generate predictions, computes the cross-entropy loss against the true labels, and then runs a backward pass to update the model weights via the RMSprop optimizer. At the end of every epoch, the model switches into evaluation mode and runs inference over the full validation set, printing the training and validation loss and accuracy so we can monitor how training is progressing.</p>



<h3 class="wp-block-heading">Generating predictions</h3>



<p>Once training is complete, <code>generate_predictions</code> runs the trained model on a held-out dataset and returns the predicted class for each article. It also prints a confusion matrix, which gives us a breakdown of which categories the model is getting right and where it&#8217;s getting confused, which is a much more informative picture than accuracy alone.</p>



<h3 class="wp-block-heading">Running the baseline</h3>



<p>We can now train the baseline model. We pass in the raw count-vectorized training and validation features, specify an input size equal to the vocabulary size (59,544 columns), train for two epochs, and use a hidden layer of 5,000 nodes.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">baseline_model = train_text_classification_model(
    ag_news_train_cv,
    ag_news_train["label"].to_numpy(),
    ag_news_val_cv,
    ag_news_val["label"].to_numpy(),
    ag_news_train_cv.shape[1],  # input size = vocabulary size
    2,                          # number of epochs
    5000                        # hidden layer size
)

predictions = generate_predictions(
    baseline_model,
    ag_news_val_cv,
    ag_news_val["label"].to_numpy()
)</pre>



<pre class="EnlighterJSRAW" data-enlighter-language="raw" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">Epoch [1/2], Train Loss: 0.3553, Train Acc: 0.8813, Val Loss: 0.2307, Val Acc: 0.9243
Epoch [2/2], Train Loss: 0.1217, Train Acc: 0.9587, Val Loss: 0.2352, Val Acc: 0.9240

Confusion Matrix:
Predicted     1     2     3     4
Actual                           
1          2774    65    89    72
2            37  2944     9    10
3           112    20  2694   174
4            97    20   207  2676</pre>



<p>Even with the very basic data preparation we did, we can see we’ve performed very well on this prediction task, with around 92% accuracy. The confusion matrix shows that the model seems to have the easiest time distinguishing between category two (sports) and the other topics, and the hardest time distinguishing between category three (business) and category four (science/technology). This makes sense, as the words used to describe sports are very distinct and unlikely to be used in other contexts (things like football), whereas there is likely to be overlapping vocabulary between business and technology (especially company names).</p>



<p>As we saw above, there is a lot we can do to improve the signal-to-noise ratio in BoW modeling. Let’s apply four commonly used techniques to our data and see whether this improves our predictions: lemmatization, stop word removal, limiting our vocabulary to the top N terms, and TF-IDF weighting. As you’ll see, all of these can be done relatively simply using built-in functions in packages such as spaCy and scikit-learn.</p>



<h3 class="wp-block-heading">Lemmatization</h3>



<p>As we discussed earlier, lemmatization collapses inflected word forms into a single vocabulary entry by mapping each word to its dictionary base form, which both shrinks the vocabulary and concentrates the signal for each concept into a single feature. We&#8217;ll use spaCy for this, which first requires downloading its small English language model:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">!python -m spacy download en_core_web_sm

nlp = spacy.load("en_core_web_sm")</pre>



<p>Our <code>lemmatise_text</code> function passes each text through spaCy&#8217;s NLP pipeline using <code>nlp.pipe()</code>, which processes them in batches of 1,000 for efficiency. For each document, it extracts the <code>.lemma_</code> attribute of every token and joins them back into a single string. One small detail worth noting: we preserve the original DataFrame index when constructing the output Series, so that rows stay correctly aligned when we assign the results back.</p>
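


<p>A minimal implementation matching this description might look like the following (mirroring the structure of the <code>remove_stopwords</code> function we&#8217;ll write shortly):</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">def lemmatise_text(texts: pd.Series) -> pd.Series:
    texts = texts.fillna("").astype(str)

    lemmatised_texts = []
    for doc in nlp.pipe(texts, batch_size=1000):
        lemmatised_texts.append(" ".join(token.lemma_ for token in doc))

    # Preserve the original index so rows stay aligned when assigning back
    return pd.Series(lemmatised_texts, index=texts.index)</pre>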



<p>We apply lemmatization before string cleaning, since spaCy needs the original casing and punctuation to correctly identify grammatical structure. For example, &#8220;running&#8221; and &#8220;Running&#8221; lemmatize to the same thing, but removing punctuation first can confuse the parser. Once lemmatized, we pass the output through <code>apply_string_cleaning</code> as before:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">ag_news_train["title_clean"] = apply_string_cleaning(lemmatise_text(ag_news_train["title"]))
ag_news_train["description_clean"] = apply_string_cleaning(lemmatise_text(ag_news_train["description"]))

ag_news_val["title_clean"] = apply_string_cleaning(lemmatise_text(ag_news_val["title"]))
ag_news_val["description_clean"] = apply_string_cleaning(lemmatise_text(ag_news_val["description"]))

ag_news_train["text_clean"] = ag_news_train["title_clean"] + " " + ag_news_train["description_clean"]

ag_news_val["text_clean"] = ag_news_val["title_clean"] + " " + ag_news_val["description_clean"]</pre>



<p>We apply this separately to the title and description columns before concatenating them into a single <code>text_clean</code> field. As you can see, we do this for both the training and validation sets using the same function, so that lemmatization is applied consistently across both splits.</p>



<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/04/screenshot-10-lemmatisation.png" alt="" class="wp-image-703698"/></figure>



<h3 class="wp-block-heading">Removing stop words</h3>



<p>As with lemmatization, we covered the motivation for stop word removal earlier: Words like &#8220;the&#8221;, &#8220;is&#8221;, and &#8220;of&#8221; appear so frequently across all texts that they add noise rather than signal to our feature matrix. Here we&#8217;ll actually apply it to our data.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">def remove_stopwords(texts: pd.Series) -> pd.Series:
   texts = texts.fillna("").astype(str)

   filtered_texts = []
   for doc in nlp.pipe(texts, batch_size=1000):
       filtered_texts.append(
           " ".join(token.text for token in doc if not token.is_stop)
       )

   return pd.Series(filtered_texts, index=texts.index)</pre>



<p>Our <code>remove_stopwords</code> function again uses <code>nlp.pipe()</code> to process texts in batches. For each document, it filters out any token where spaCy&#8217;s <code>is_stop</code> attribute is True, and joins the remaining tokens back into a string. Conveniently, spaCy handles stop word detection using the same pipeline we already loaded for lemmatization, so no additional setup is needed.</p>



<p>We apply this to the already-cleaned and lemmatized <code>text_clean</code> column for both the training and validation sets, so the stop word removal builds directly on our previous preprocessing steps and is applied consistently across both splits.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">ag_news_train["text_no_stopwords"] = remove_stopwords(ag_news_train["text_clean"])
ag_news_val["text_no_stopwords"] = remove_stopwords(ag_news_val["text_clean"])</pre>



<h3 class="wp-block-heading">Top N terms and TF-IDF vectorization</h3>



<p>The final two improvements we&#8217;ll apply are limiting the vocabulary size and switching from raw count vectorization to TF-IDF weighting. Conveniently, scikit-learn&#8217;s <code>TfidfVectorizer</code> handles both in a single step.</p>



<p>Recall from earlier that TF-IDF downweights words that appear frequently across many documents while upweighting words that are distinctive to particular documents. This cleans up uninformative words that don&#8217;t quite qualify as stop words but still add little useful information to our dataset. The <code>max_features=20000</code> argument caps the vocabulary at the 20,000 most frequent terms, which discards the long tail of rare words that are unlikely to generalize well and brings our feature matrix down to a much more manageable size. (The choice of 20,000 words is arbitrary. We could have easily used a smaller or larger number, depending on our dataset and use case.)</p>



<p>As with <code>CountVectorizer</code>, we fit only on the training data and then use that fixed vocabulary to transform both the training and validation sets:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">TfidfVectorizerNews = TfidfVectorizer(max_features=20000)
TfidfVectorizerNews.fit(ag_news_train["text_no_stopwords"])

ag_news_train_tfidf = TfidfVectorizerNews.transform(ag_news_train["text_no_stopwords"]).toarray()
ag_news_val_tfidf = TfidfVectorizerNews.transform(ag_news_val["text_no_stopwords"]).toarray()</pre>



<p>We can inspect the resulting vocabulary and feature matrix exactly as we did before:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">TfidfVectorizerNews.vocabulary_</pre>



<pre class="EnlighterJSRAW" data-enlighter-language="raw" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">{'fed': np.int64(6243),
 'pension': np.int64(13134),
 'default': np.int64(4469),
 'cite': np.int64(3200),
 'failure': np.int64(6109),
 'big': np.int64(1787),
 'airline': np.int64(401),
 'payment': np.int64(13051),
 'plan': np.int64(13424),
 'government': np.int64(7306),
 'official': np.int64(12453),
 'tuesday': np.int64(18437),
 'congress': np.int64(3691),
 'hard': np.int64(7689),
 'corporation': np.int64(3901),
...}</pre>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">pd.DataFrame(ag_news_train_tfidf, columns=TfidfVectorizerNews.get_feature_names_out())</pre>



<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/04/screenshot-12-tf-idf-matrix.png" alt="" class="wp-image-703905"/></figure>



<p>Compared to our baseline feature matrix of 59,544 columns filled almost entirely with zeros, this is considerably leaner. We now have 20,000 columns of weighted scores that better reflect each word&#8217;s actual importance to the document it appears in. It is still relatively sparse, but we can see from both the feature matrix and the vocabulary list that it is much more focused on semantically rich words.</p>



<h3 class="wp-block-heading">Fitting the revised model</h3>



<p>With our improved features in hand, we can now retrain the model. The call is identical to before, except we pass in the TF-IDF feature matrices instead of the raw count vectors, and the input size is now 20,000 rather than 59,544:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">baseline_model = train_text_classification_model(
    ag_news_train_tfidf,
    ag_news_train["label"].to_numpy(),
    ag_news_val_tfidf,
    ag_news_val["label"].to_numpy(),
    ag_news_train_tfidf.shape[1],
    2,
    5000
)

predictions = generate_predictions(
    baseline_model,
    ag_news_val_tfidf,
    ag_news_val["label"].to_numpy()
)</pre>



<pre class="EnlighterJSRAW" data-enlighter-language="raw" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">Epoch [1/2], Train Loss: 0.3183, Train Acc: 0.8932, Val Loss: 0.2301, Val Acc: 0.9225
Epoch [2/2], Train Loss: 0.1512, Train Acc: 0.9475, Val Loss: 0.2332, Val Acc: 0.9243
Confusion Matrix - Raw Counts:
Predicted     1     2     3     4
Actual                           
1          2703    71   121   105
2            20  2955    13    12
3            68    19  2691   222
4            77    17   163  2743</pre>



<p>The results are actually very encouraging! Our overall validation accuracy is essentially unchanged at around 92%, but we&#8217;ve achieved this with a feature matrix about a third of the size. This suggests that the extra vocabulary in the baseline (including the stop words) was contributing noise rather than signal. Reducing the size of the feature matrix makes our model more stable, less prone to overfitting, and much more manageable to deploy.</p>



<p>Looking at the confusion matrix, the pattern of errors is similar to before: Sports (category two) is the easiest category to classify, with 98.5% accuracy, while Business (category three) and Science/Technology (category four) remain the hardest to separate, with 5&#8211;7% of the articles in each category being misclassified as the other. This is consistent with what we saw in the baseline, so it seems that the preprocessing improvements have tightened things up at the margins, but the fundamental difficulty of the Business/Technology boundary is a property of the data rather than the feature representation.</p>



<h3 class="wp-block-heading">Applying our model to the test set</h3>



<p>Finally, we need to validate that our model performs as well on the test set as it does on the validation set. Up to this point, we&#8217;ve deliberately kept the test set locked away. As mentioned earlier, if we had been making modeling decisions based on test set performance, we&#8217;d risk inadvertently overfitting our choices to it, and our final accuracy estimate would be overly optimistic.</p>



<p>The preprocessing steps must be applied in exactly the same order as for the training and validation data: lemmatization, string cleaning, concatenation of title and description, and stop word removal. Crucially, we also call <code>.transform()</code> rather than <code>.fit_transform()</code> on the test text, using the vocabulary learned from the training data:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">ag_news_test["title_clean"] = apply_string_cleaning(lemmatise_text(ag_news_test["title"]))
ag_news_test["description_clean"] = apply_string_cleaning(lemmatise_text(ag_news_test["description"]))
ag_news_test["text_clean"] = ag_news_test["title_clean"] + " " + ag_news_test["description_clean"]
ag_news_test["text_no_stopwords"] = remove_stopwords(ag_news_test["text_clean"])

ag_news_test_tfidf = TfidfVectorizerNews.transform(ag_news_test["text_no_stopwords"]).toarray()</pre>



<p>We can then generate predictions and evaluate accuracy on the test set:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">test_predictions = generate_predictions(
    baseline_model,
    ag_news_test_tfidf,
    ag_news_test["label"].to_numpy()
)

test_accuracy = accuracy_score(ag_news_test["label"].to_numpy(), test_predictions)
print(f"Test Accuracy: {test_accuracy:.4f}")</pre>



<pre class="EnlighterJSRAW" data-enlighter-language="raw" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">Test Accuracy: 0.9183

Confusion Matrix - Raw Counts:
Predicted     1     2     3     4
Actual                           
1          1710    54    78    58
2            13  1870    10     7
3            51    12  1676   161
4            53     9   115  1723</pre>



<p>The test accuracy of 91.8% is very close to the 92.4% we saw on the validation set, which is a reassuring sign that our model has generalized well rather than overfitting to the validation data. The confusion matrix tells the same story as before: Sports (category two) remains the easiest category to classify, with only 30 misclassified articles out of 1,900, while the Business/Technology boundary continues to be the main source of errors, with 6&#8211;8.5% of the articles in each category being misclassified as the other. The consistency between validation and test results gives us confidence that these error patterns reflect genuine properties of the data rather than artifacts of any particular split.</p>



<h2 class="wp-block-heading">Limitations and alternatives</h2>



<h3 class="wp-block-heading">Loses word order information</h3>



<p>The most fundamental limitation of the bag-of-words model is right there in the name: it treats text as an unordered collection of words, discarding all sequence information. This means &#8220;the dog bit the man&#8221; and &#8220;the man bit the dog&#8221; produce identical vectors, even though they describe very different events. For many classification tasks, this doesn&#8217;t matter much, but for tasks that require understanding the relationship between words, such as question answering or natural language inference, the loss of word order is a serious handicap.</p>
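<p>You can verify this for yourself with a few lines of scikit-learn (a standalone sketch, separate from the project above):</p>

<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">from sklearn.feature_extraction.text import CountVectorizer

sentences = ["the dog bit the man", "the man bit the dog"]
vectorizer = CountVectorizer()
vectors = vectorizer.fit_transform(sentences).toarray()

print(vectorizer.get_feature_names_out())  # ['bit' 'dog' 'man' 'the']
print(vectors)  # both rows are [1 1 1 2]: the two sentences are indistinguishable</pre>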



<h3 class="wp-block-heading">Ignores semantics and context</h3>



<p>BoW has no notion of word meaning or context. Each word is simply a column in a matrix, entirely independent of every other word. This creates two related problems. First, synonyms are treated as completely distinct features: &#8220;cheap&#8221; and &#8220;inexpensive&#8221; contribute nothing to each other&#8217;s signal, even though they mean the same thing. Second, words with multiple meanings are treated as a single feature regardless of context: &#8220;bank&#8221; means the same thing whether it appears in a sentence about rivers or finance. Both of these issues limit how well BoW representations can capture the actual semantics of a text.</p>



<h3 class="wp-block-heading">Can result in large, sparse vectors</h3>



<p>As we saw in our own example, even a moderately sized corpus of news headlines can produce a vocabulary of nearly 60,000 unique terms. The resulting feature matrix has one column per vocabulary word, but any individual document only uses a tiny fraction of them, leaving the vast majority of values at zero. This sparsity creates two practical problems: The matrices can consume a large amount of memory if stored densely, and the high dimensionality can make it harder for models to find meaningful patterns, a phenomenon sometimes called the curse of dimensionality.</p>
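<p>One practical mitigation is to keep the matrix in the sparse format that scikit-learn&#8217;s vectorizers return, rather than densifying it with <code>.toarray()</code> as we did above for easy inspection. A minimal sketch, reusing the fitted <code>TfidfVectorizerNews</code> from earlier, that also quantifies the sparsity:</p>

<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># Keep the scipy sparse matrix instead of calling .toarray()
X_sparse = TfidfVectorizerNews.transform(ag_news_train["text_no_stopwords"])

# Only the non-zero entries are stored, which is a tiny fraction of the matrix
n_total = X_sparse.shape[0] * X_sparse.shape[1]
print(f"Non-zero entries: {X_sparse.nnz:,} of {n_total:,}")
print(f"Sparsity: {1 - X_sparse.nnz / n_total:.2%}")</pre>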



<h3 class="wp-block-heading">Alternatives</h3>



<p>If BoW&#8217;s limitations are a bottleneck for your task, there are several well-established alternatives worth considering.</p>



<ul>
<li><strong>Word embeddings (Word2Vec and GloVe)</strong> address the semantics problem by representing each word as a dense vector in a continuous space, where similar words are geometrically close to each other. They capture distributional meaning far more richly than BoW, and are a natural next step when synonym handling or word similarity matters (see the short sketch after this list). Doc2Vec extends this idea to produce embeddings for entire documents rather than individual words.</li>



<li><strong>Transformer-based models (BERT and GPT)</strong> go further still, generating contextual representations where the same word receives a different vector depending on the surrounding text. This handles polysemy directly and captures complex long-range dependencies between words. The trade-off is substantially higher computational cost and complexity compared to BoW.</li>



<li><strong>Topic models like latent Dirichlet allocation (LDA)</strong> take a different angle entirely. Rather than encoding documents for downstream classification, they are generative models that discover latent thematic structure in a corpus. This is useful when your goal is exploration and interpretation rather than prediction.</li>
</ul>
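<p>As a taste of the word embeddings mentioned above, here is a minimal sketch using pretrained GloVe vectors through gensim&#8217;s downloader API (this assumes the <code>gensim</code> package is installed; the vectors are downloaded on first use, and none of this is part of the AG News project):</p>

<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import gensim.downloader as api

# Pretrained 50-dimensional GloVe vectors, downloaded on first use
word_vectors = api.load("glove-wiki-gigaword-50")

# Unlike bag-of-words features, similar words end up close together in the space
print(word_vectors.most_similar("cheap", topn=3))</pre>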



<p>For tasks where BoW already performs well, as we saw here with AG News, the added complexity of these approaches may not be worth the cost. BoW remains a strong baseline, and it&#8217;s always worth establishing how far it can take you before reaching for heavier machinery.</p>



<h2 class="wp-block-heading">Get started with PyCharm today</h2>



<p>In this post, we&#8217;ve covered a lot of ground: from the fundamentals of the bag-of-words model and how it converts text into numerical vectors, through to building and iteratively improving a real text classification pipeline on the AG News dataset. Along the way, we&#8217;ve seen how preprocessing steps like lemmatization, stop word removal, vocabulary capping, and TF-IDF weighting can meaningfully improve the efficiency of your feature representation, and how PyCharm&#8217;s DataFrame viewer, column statistics, chart view, and AI Assistant make each of these steps faster and easier to inspect and debug.</p>



<p>If you&#8217;d like to try this yourself, <a href="https://www.jetbrains.com/pycharm/download/?section=windows" target="_blank" rel="noopener">PyCharm Pro</a> comes with a 30-day trial. As we saw in this tutorial, its built-in support for Jupyter notebooks, virtual environments, and scientific libraries means you can go from a blank project to a working NLP pipeline with minimal setup friction, leaving you free to focus on the fun parts.&nbsp;</p>



<p>You can find the <a href="https://github.com/t-redactyl/ag-news-bag-of-words-classification" target="_blank" rel="noopener">full code</a> for this project on GitHub. If you&#8217;re interested in exploring more NLP topics, check out our recent blogs <a href="https://blog.jetbrains.com/pycharm/">here</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>PyCharm for Django Fundraiser: Why Django Matters in the AI Era – And Why We’re Supporting It</title>
		<link>https://blog.jetbrains.com/pycharm/2026/04/pycharm-for-django-fundraiser-why-django-matters-in-the-ai-era-and-why-we-re-supporting-it/</link>
		
		<dc:creator><![CDATA[Valeria Letusheva]]></dc:creator>
		<pubDate>Wed, 22 Apr 2026 12:15:34 +0000</pubDate>
		<featuredImage>https://blog.jetbrains.com/wp-content/uploads/2026/04/Blog_1280x720-4.png</featuredImage>		<category><![CDATA[pycharm]]></category>
		<category><![CDATA[django]]></category>
		<category><![CDATA[oss]]></category>
		<guid isPermaLink="false">https://blog.jetbrains.com/?post_type=pycharm&#038;p=702538</guid>

					<description><![CDATA[Spend a few minutes around developer content, and it’s easy to come away with the impression that web apps now appear to almost write themselves. Everything that follows – review, verification, refactoring, debugging, and the open-source frameworks that make those apps dependable – gets less attention. AI can speed up code generation, but it does [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p>Spend a few minutes around developer content, and it’s easy to come away with the impression that web apps now almost write themselves.</p>



<p>Everything that follows – review, verification, refactoring, debugging, and the open-source frameworks that make those apps dependable – gets less attention. AI can speed up code generation, but it does not remove the need for stable foundations. A lot of AI-generated code works because it’s built on top of mature open-source frameworks, libraries, and documentation.&nbsp;</p>



<blockquote class="wp-block-quote">
<p>AI can scaffold a web app in thirty seconds. Django is what keeps it running for ten years. That gap is only getting more valuable.</p>
<cite>Will Vincent, former Django Board Member, co-host of the Django Chat podcast and co-writer of the weekly Django News newsletter</cite></blockquote>



<p>As AI makes OSS easier to consume, it can also make the work behind it easier to overlook. But OSS still needs support – perhaps more than ever.</p>



<h2 class="wp-block-heading">PyCharm for Django Fundraiser</h2>



<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/04/Blog_1280x720-4.png" alt="" class="wp-image-702548"/></figure>



<p>PyCharm has supported Django through fundraising campaigns and ongoing collaboration with the Django Software Foundation (DSF). This year, we’re doing it again.</p>



<p>Together with the Django community, we raised $350,000 for Django through this campaign between 2016 and 2025. That support helps keep Django secure, stable, relevant, and sustainable, while also supporting community programs such as Django Girls and official events. Previous PyCharm fundraisers accounted for approximately 25% of the DSF budget, according to Django’s official blog.</p>



<blockquote class="wp-block-quote">
<p>Django is the rare framework that rewards you the longer you use it: mature, dependable, and still innovating. Best-in-class software, matched by one of the most welcoming communities in open source.</p>
<cite>Will Vincent, former Django Board Member, co-host of the Django Chat podcast and co-writer of the weekly Django News newsletter</cite></blockquote>



<p>If Django has helped you learn, ship, or maintain real web products, this is a direct way to give back.</p>



<p>You can donate to the Django Software Foundation <a href="https://www.djangoproject.com/fundraising/" target="_blank" rel="noopener">directly</a>, or you can support Django through this fundraiser and get a tool you&#8217;ll rely on every day.</p>



<blockquote class="wp-block-quote">
<p>Django&#8217;s ‘batteries included’ philosophy was built for humans who wanted to ship fast. Turns out it&#8217;s perfect for AI agents too — fewer decisions, fewer dependencies, and fewer ways to go wrong. </p>
<cite>Will Vincent, former Django Board Member, co-host of the Django Chat podcast and co-writer of the weekly Django News newsletter</cite></blockquote>



<h2 class="wp-block-heading">The offer</h2>



<p>During this campaign, get 30% off PyCharm Pro, with 100% of the proceeds going to the DSF. Or bundle it with the JetBrains AI Pro plan to get 40% off PyCharm Pro.</p>



<p>This campaign ends in less than two weeks, so act now!</p>


    <div class="buttons">
        <div class="buttons__row">
                                                <a href="https://www.jetbrains.com/pycharm/promo/support-django/" class="btn " target="_blank" rel="noopener">Get the offer</a>
                                    </div>
    </div>







<h2 class="wp-block-heading">Why PyCharm Pro</h2>



<h3 class="wp-block-heading">Perfect for your workflow</h3>



<p>The hard part of modern development is often not writing code from scratch – it’s understanding the whole project well enough to change it safely.</p>



<p>That’s where PyCharm Pro proves its value:</p>



<ul>
<li>Navigate and refactor across your entire Django project, from templates to databases.</li>



<li>Work with databases without leaving the IDE.</li>



<li><a href="https://www.jetbrains.com/pycharm/web-development/django/" target="_blank" rel="noopener">Build and debug Django templates</a> with full awareness of your context.&nbsp;</li>



<li>Develop frontend code with built-in support for JavaScript, TypeScript, and major frameworks.</li>



<li>Run and debug remote and Docker-based environments with ease.</li>
</ul>



<blockquote class="wp-block-quote">
<p>No editor understands Django like PyCharm does — from template tags to ORM queries to migrations, it sees the whole stack the way you do. </p>
<cite>Will Vincent, former Django Board Member, co-host of the Django Chat podcast and co-writer of the weekly Django News newsletter</cite></blockquote>



<blockquote class="wp-block-quote">
<p>For Django work, I think PyCharm is one of the best tools available. I use it every day. If you haven’t given it a try, this campaign is a great opportunity – AND it supports the Django Software Foundation! </p>
<cite>Sarah Boyce, Django Fellow and Djangonaut Space co-organizer</cite></blockquote>



<h3 class="wp-block-heading">AI on your terms</h3>



<p>If you want AI in PyCharm, you can start with <a href="https://www.jetbrains.com/ai-ides/" target="_blank" rel="noopener">JetBrains AI</a> directly in the IDE. You can also shape it to fit your workflow. Bring your own key, sign in with a supported provider, use third-party or local models, or connect compatible agents such as Claude Code and Codex via ACP.</p>



<p>That gives you more control over how you work with AI, instead of locking you into a single workflow, model, or provider. And if AI isn’t what you need, you can simply turn it off.</p>



<h3 class="wp-block-heading">Support the framework you use every day</h3>



<p>If Django is part of how you build, this purchase can improve your workflow while also investing in the framework behind it.</p>


    <div class="buttons">
        <div class="buttons__row">
                                                <a href="https://www.jetbrains.com/pycharm/promo/support-django/" class="btn " target="_blank" rel="noopener">Get the offer</a>
                                    </div>
    </div>







<p>Happy coding!</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>How (Not) to Learn Python</title>
		<link>https://blog.jetbrains.com/pycharm/2026/04/how-not-to-learn-python/</link>
		
		<dc:creator><![CDATA[Cheuk Ting Ho]]></dc:creator>
		<pubDate>Fri, 10 Apr 2026 14:21:21 +0000</pubDate>
		<featuredImage>https://blog.jetbrains.com/wp-content/uploads/2026/04/PC-social-BlogFeatured-1280x720-1.png</featuredImage>		<guid isPermaLink="false">https://blog.jetbrains.com/?post_type=pycharm&#038;p=696301</guid>

					<description><![CDATA[While listening to Mark Smith’s inspirational talk for Python Unplugged on PyTV about How to Learn Python, what caught my attention was that Mark suggested turning off some of PyCharm’s AI features to help you learn Python more effectively. As a PyCharm user myself, I’ve found the AI-powered features beneficial in my day-to-day work; however, [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p>While listening to <a href="https://www.youtube.com/watch?v=i6SdsSj96ys" target="_blank" rel="noopener">Mark Smith’s inspirational talk</a> for <em>Python Unplugged on PyTV </em>about <em>How to Learn Python</em>, what caught my attention was that Mark suggested turning off some of PyCharm’s AI features to help you learn Python more effectively.</p>



<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/04/image-2.png" alt="" class="wp-image-696367"/></figure>



<p>As a PyCharm user myself, I’ve found the AI-powered features beneficial in my day-to-day work; however, I never considered that I could turn certain features on or off to customize my experience. This can be done from the settings menu under <em>Editor</em> | <em>General</em> | <em>Code Completion</em> | <em>Inline</em>.</p>



<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/04/image1-1.png" alt="" class="wp-image-700369"/></figure>



<p>While we are at it, let’s have a look at these features and investigate in more detail why they are great for professional developers but may not be ideal for learners.</p>



<h2 class="wp-block-heading">Local full line code completion suggestions</h2>



<p>JetBrains AI credits are not consumed when you use local full line completion. The completion prediction is performed by a built-in local deep learning model. To use this feature, make sure the box for <em>Enable inline completion using language models</em> is checked, and choose either <em>Local</em> or <em>Cloud and local</em> from the options. To show what the local model produces on its own, we will look at the predictions when only <em>Local</em> is selected.</p>



<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/04/image2-1.png" alt="" class="wp-image-700380"/></figure>



<p>With <em>Local</em> selected, the only code completion available out of the box in PyCharm is for Python. To make suggestions available for CSS or HTML, you need to download additional models.</p>



<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/04/image3-1.png" alt="" class="wp-image-700391"/></figure>



<p>When you are writing code, you will see suggestions pop up in grey with a hint for you to use <em>Tab</em> to complete the line.&nbsp;</p>



<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/04/image4.png" alt="" class="wp-image-700402"/></figure>



<p>After completing that line, you can press <em>Enter</em> to go to the next one, where there may be a new suggestion that you can again accept with <em>Tab</em>. As you can see, this can be very convenient for developers in their daily coding, as it saves time that would otherwise be spent typing obvious lines of code that follow the flow naturally.&nbsp;</p>



<p>However, for beginners, mindlessly hitting <em>Tab</em> and letting the model complete lines may discourage them from learning how to use the functions correctly. An alternative is to use the hint provided by PyCharm to help you choose an appropriate method from the available list, determine which parameters are needed, check the documentation if necessary, and write the code yourself. Here is what the hint looks like when code completion is turned off:</p>



<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/04/image5.png" alt="" class="wp-image-700413"/></figure>



<h2 class="wp-block-heading">Cloud-based completion suggestions</h2>



<p>Let’s have a look at cloud-based completion in contrast to local completion. When using cloud-based completion, next-edit suggestions are also available (which we will look at in more detail in the next section).</p>



<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/04/image6.png" alt="" class="wp-image-700424"/></figure>



<p>Cloud-based completion comes with support for multiple languages by default, and you can switch it on or off for each language individually.</p>



<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" src="https://blog.jetbrains.com/wp-content/uploads/2026/04/image-1.png" alt="" class="wp-image-696348"/></figure>



<p>Cloud-based completion provides more functionality than local model completion, but you need a JetBrains AI subscription to use it.</p>



<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/04/image8.png" alt="" class="wp-image-700446"/></figure>



<p>You may also connect to a third-party AI provider for your cloud-based completion. Since this support is still in Beta in PyCharm 2026.1, it is highly recommended to keep your JetBrains AI subscription active as a backup to ensure all features are available.</p>



<p>After switching to cloud-based completion, one of the differences I noticed was that it is better at multiple-line completion, which can be more convenient. However, I have also encountered situations where the completion provided too much for me, and I had to jump in to make my own modifications after accepting the suggestions.</p>



<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/04/image9.png" alt="" class="wp-image-700457"/></figure>



<p>For learners of Python, again, you may want to disable this functionality or else commit to auditing every suggestion in detail yourself. In addition to the danger of relying too heavily on code completion, which removes opportunities to learn, cloud code completion poses another risk for learners: Because larger suggestions require active review from the developer, learners may not be equipped to fully audit the wholesale suggestions they are accepting. Disabling this feature not only encourages learning, but it can also help prevent mistakes.</p>



<h2 class="wp-block-heading">Next edit suggestions</h2>



<p>In addition to cloud-based completion, JetBrains AI Pro, Ultimate, and Enterprise users are able to take advantage of next edit suggestions.</p>



<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/04/image10.png" alt="" class="wp-image-700468"/></figure>



<p>When they are enabled, every time you make a change to your code, such as renaming a variable, you will see suggestions for other places that need to be updated.</p>



<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/04/nes.gif" alt="" class="wp-image-700347"/></figure>



<p>And when you press <em>Tab</em>, the changes will be made automatically. You can also customize this behavior so you can see previews of the changes and jump continuously to the next edit until no more are suggested.</p>



<p>This is, no doubt, a very handy feature. It can help you avoid some careless mistakes, like forgetting to refactor your code when you make changes. However, for learners, thinking about what needs to be done is a valuable thought exercise, and using this feature can deprive them of some good learning opportunities.</p>



<h2 class="wp-block-heading">Conclusion</h2>



<p>PyCharm offers a lot of useful features to smooth out your day-to-day development workflow. However, these features may be too powerful, and even too convenient, for those who have just started working with Python and need to learn by making mistakes. It is good to use AI features to improve our work, but we also need to double-check the results and make sure that we want what the AI suggests.</p>



<p>To learn more about how to level up your Python skills, I highly recommend watching <a href="https://www.youtube.com/watch?v=i6SdsSj96ys" target="_blank" rel="noopener">Mark’s talk on PyTV</a> and checking out all the <a href="https://www.jetbrains.com/help/ai-assistant/getting-started-with-ai-assistant.html" target="_blank" rel="noopener">AI features</a> that JetBrains AI has to offer. I hope you will find the perfect way to integrate them into your work while remaining ready to turn them off when you plan to learn something new.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>How to Train Your First TensorFlow Model in PyCharm</title>
		<link>https://blog.jetbrains.com/pycharm/2026/04/how-to-train-your-first-tensorflow-model/</link>
		
		<dc:creator><![CDATA[Evgenia Verbina]]></dc:creator>
		<pubDate>Tue, 07 Apr 2026 10:36:35 +0000</pubDate>
		<featuredImage>https://blog.jetbrains.com/wp-content/uploads/2026/04/PC-social-BlogSocialShare-1280x720-1-1.png</featuredImage>		<category><![CDATA[data-science]]></category>
		<category><![CDATA[tutorials]]></category>
		<category><![CDATA[tensorflow]]></category>
		<category><![CDATA[tensors]]></category>
		<guid isPermaLink="false">https://blog.jetbrains.com/?post_type=pycharm&#038;p=697464</guid>

					<description><![CDATA[This is a guest post from Iulia Feroli, founder of the Back To Engineering community on YouTube. TensorFlow is a powerful open-source framework for building machine learning and deep learning systems. At its core, it works with tensors (a.k.a multi‑dimensional arrays) and provides high‑level libraries (like Keras) that make it easy to transform raw data [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p><em>This is a guest post from </em><strong><em><a href="https://blog.jetbrains.com/pycharm/2026/04/how-to-train-your-first-tensorflow-model/#author" data-type="link" data-id="https://blog.jetbrains.com/pycharm/2026/04/how-to-train-your-first-tensorflow-model/#author">Iulia Feroli</a></em></strong><em>, founder of the Back To Engineering community on YouTube.</em></p>



<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/04/PC-social-BlogFeatured-1280x720-1-1.png" alt="How to Train Your First TensorFlow Model in PyCharm" class="wp-image-697465"/></figure>



<p><a href="https://www.tensorflow.org/" target="_blank" rel="noopener">TensorFlow</a> is a powerful open-source framework for building machine learning and deep learning systems. At its core, it works with tensors (a.k.a multi‑dimensional arrays) and provides high‑level libraries (like Keras) that make it easy to transform raw data into models you can train, evaluate, and deploy.</p>



<p>TensorFlow helps you handle the full pipeline: loading and preprocessing data, assembling models from layers and activations, training with optimizers and loss functions, and exporting for serving or even running on edge devices (including lightweight TensorFlow Lite models on Raspberry Pi and other microcontrollers).&nbsp;</p>



<p>If you want to build data-driven applications, prototype neural networks, or ship models to production or to devices, learning TensorFlow gives you a consistent, well-supported toolkit to go from idea to deployment.</p>



<p>If you’re brand new to TensorFlow, start by watching the <strong><a href="https://www.youtube.com/watch?v=hm07b8ETaso" data-type="link" data-id="https://www.youtube.com/watch?v=hm07b8ETaso" target="_blank" rel="noopener">short overview video</a></strong>, where I explain tensors, neural networks, and layers; why TensorFlow is great for taking data → model → deployment; and how all of this can be illustrated with a LEGO-style piece-sorting example.&nbsp;</p>



<p>In this blog post, I’ll walk you through a first, stripped-down TensorFlow implementation notebook so we can get started with some practical experience. You can also watch the walkthrough video to follow along.</p>



<figure class="wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<iframe loading="lazy" title="Build Your First TensorFlow Model in Python (A Step-by-Step Tutorial)" src="https://www.youtube.com/embed/nswGrvOhaOY?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
</div></figure>



<p>We&#8217;ll be exploring a very simple use case today: loading the Fashion MNIST dataset, building two small Keras models, training and comparing them, then digging into visualizations (predictions, confidence bars, confusion matrix). I kept the code minimal and readable so you can focus on the ideas – and you’ll see how <a href="https://www.jetbrains.com/pycharm/data-science/" target="_blank" rel="noopener">PyCharm</a> helps along the way.</p>



<h2 class="wp-block-heading">Training TensorFlow models step by step</h2>



<h3 class="wp-block-heading">Getting started in PyCharm</h3>



<p>We&#8217;ll be leveraging PyCharm&#8217;s native Notebook integration to build out <a href="https://github.com/iuliaferoli/TensorFlow_with_pycharm" target="_blank" rel="noopener">our project</a>. This way, we can inspect each step of the pipeline and use some supporting visualization along the way. We&#8217;ll <a href="https://www.jetbrains.com/help/pycharm/creating-empty-project.html" target="_blank" rel="noopener">create a new project</a> and <a href="https://www.jetbrains.com/help/pycharm/creating-virtual-environment.html" target="_blank" rel="noopener">generate a virtual environment</a> to manage our dependencies.&nbsp;</p>



<p>If you&#8217;re running the code from the attached repo, you can install directly from the requirements file. If you wish to expand this example with additional visualizations for further models, you can easily add more packages to your requirements as you go by using the PyCharm package manager helpers for <a href="https://www.jetbrains.com/guide/python/tips/install-and-import/)%20and" target="_blank" rel="noopener">installing</a> and <a href="https://www.jetbrains.com/help/pycharm/installing-uninstalling-and-upgrading-packages.html" target="_blank" rel="noopener">upgrading</a>.</p>



<h3 class="wp-block-heading">Load <code>Fashion MNIST</code> and inspect the data</h3>



<p><code>Fashion MNIST</code> is a great starter dataset because the images are small (28×28 pixels), visually meaningful, and easy to interpret. They represent various garment types as pixelated grayscale images, and provide the relevant labels for a well-contained classification task. We can first take a look at a sample of our data by plotting some of these images with various matplotlib functions:</p>



<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/04/image1.png" alt="" class="wp-image-699830"/></figure>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">```
fig, axes = plt.subplots(2, 5, figsize=(10, 4))
for i, ax in enumerate(axes.flat):
    ax.imshow(x_train[i], cmap='gray')
    ax.set_title(class_names[y_train[i]])
    ax.axis('off')
plt.show()
```
# Two simple models (a quick experiment)
```
model1 = models.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(128, activation='relu'),
    layers.Dense(10, activation='softmax')
])
model2 = models.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(128, activation='relu'),
    layers.Dense(128, activation='relu'),
    layers.Dense(10, activation='softmax')
])
```</pre>
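<p>The snippet above assumes the dataset and class names are already loaded. If you’re not working from the linked repo, a standard way to set these up with the Keras datasets API looks like this (a sketch; the exact preprocessing in the repo may differ):</p>

<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import tensorflow as tf

# Fashion MNIST ships with Keras: 60,000 training and 10,000 test images
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()

# Scale pixel values from [0, 255] to [0, 1] before training
x_train, x_test = x_train / 255.0, x_test / 255.0

# The ten garment categories, in label order
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']</pre>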



<h3 class="wp-block-heading">Compile and train your first model</h3>



<p>From here, we can compile and train our first TensorFlow model(s). With PyCharm’s code completion features and documentation access, you can get instant suggestions for building out these simple code blocks.</p>



<p>For a first try at TensorFlow, this allows us to spin up a working model with just a few presses of <em>Tab</em> in our IDE. We&#8217;re using the recommended standard optimizer and loss function, and we&#8217;re tracking for accuracy. We can choose to build multiple models by playing around with the number or type of layers, along with the other parameters.&nbsp;</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">```
model1.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
model1.fit(x_train, y_train, epochs=10)
model2.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
model2.fit(x_train, y_train, epochs=15)
```</pre>



<h3 class="wp-block-heading">Evaluate and compare your TensorFlow model performance</h3>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">```
loss1, accuracy1 = model1.evaluate(x_test, y_test)
print(f'Accuracy of model1: {accuracy1:.2f}')
loss2, accuracy2 = model2.evaluate(x_test, y_test)
print(f'Accuracy of model2: {accuracy2:.2f}')
```</pre>



<p>Once the models are trained (and you can see the epochs progressing visually as each cell is run), we can immediately evaluate the performance of the models.</p>



<p>In my experiment, <code>model1</code> sits around ~0.88 accuracy, and while <code>model2</code> is a little higher than that, it took 50% longer to train. That’s the kind of trade‑off you should be thinking about: Is a tiny accuracy gain worth the additional compute and complexity?&nbsp;</p>



<p>We can dive further into the results of the model run by generating a DataFrame of our new prediction dataset. Here we can also leverage built-in functions like <code>describe()</code> to quickly get some initial statistical impressions:</p>



<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/04/image2.png" alt="" class="wp-image-699841"/></figure>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">```
predictions = model1.predict(x_test)
import pandas as pd
df_pred = pd.DataFrame(predictions, columns=class_names)
df_pred.describe()
```</pre>



<p>However, the most useful statistics will compare our model&#8217;s predictions with the ground truth &#8220;real&#8221; labels of our dataset. We can also break this down by item category:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">```
y_pred = model1.predict(x_test).argmax(axis=1)
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(8,6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=class_names, yticklabels=class_names)
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix')
plt.show()
print('Classification report:')
print(classification_report(y_test, y_pred, target_names=class_names))
```</pre>



<p>From here, we can notice that the accuracy differs quite a bit by type of garment. A possible interpretation of this is that trousers are quite a distinct type of clothing from, say, t-shirts and shirts, which can be more commonly confused.&nbsp;</p>



<p>This is, of course, the type of nuance that, as humans, we can pick up by looking at the images, but the model only has access to a matrix of pixel values. The data does seem, however, to confirm our intuition. We can further build a more comprehensive visualization to test this hypothesis.&nbsp;</p>



<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/04/image-4.png" alt="" class="wp-image-697493"/></figure>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">```
import numpy as np
import matplotlib.pyplot as plt
# pick 8 wrong examples
y_pred = predictions.argmax(axis=1)
wrong_idx = np.where(y_pred != y_test)[0][:8]  # first 8 mistakes
n = len(wrong_idx)
fig, axes = plt.subplots(n, 2, figsize=(10, 2.2 * n), constrained_layout=True)
for row, idx in enumerate(wrong_idx):
    p = predictions[idx]
    pred = int(np.argmax(p))
    true = int(y_test[idx])
    axes[row, 0].imshow(x_test[idx], cmap="gray")
    axes[row, 0].axis("off")
    axes[row, 0].set_title(
        f"WRONG  P:{class_names[pred]} ({p[pred]:.2f})  T:{class_names[true]}",
        color="red",
        fontsize=10
    )
    bars = axes[row, 1].bar(range(len(class_names)), p, color="lightgray")
    bars[pred].set_color("red")
    axes[row, 1].set_ylim(0, 1)
    axes[row, 1].set_xticks(range(len(class_names)))
    axes[row, 1].set_xticklabels(class_names, rotation=90, fontsize=8)
    axes[row, 1].set_ylabel("conf", fontsize=9)
plt.show()
```</pre>



<p>This figure gives us a view where we can explore the confidence our model had in each prediction: By looking at the weight each class was given, we can see where there was doubt (i.e. multiple classes with relatively high weights) versus where the model was certain (a single dominant guess). These examples further confirm our intuition: tops appear to be the garment types the model confuses most often.&nbsp;</p>



<h2 class="wp-block-heading">Conclusion</h2>



<p>And there we have it! We were able to set up and train our first model and already derive some data science insights from our data and model results. Using some of the PyCharm functionalities at this point can speed up the experimentation process by providing access to our documentation and applying code completion directly in the cells. We can even use AI Assistant to help generate some of the graphs we&#8217;ll need to further evaluate the TensorFlow model performance and investigate our results.</p>



<p>You can <a href="https://github.com/iuliaferoli/TensorFlow_with_pycharm" target="_blank" rel="noopener">try out this notebook yourself</a>, or better yet, try to generate it with these same tools for a more hands-on learning experience.</p>



<h2 class="wp-block-heading">Where to go next</h2>



<p><a href="https://github.com/iuliaferoli/TensorFlow_with_pycharm" target="_blank" rel="noopener">This notebook</a> is a minimal, teachable starting point. Here are some practical next steps to try afterwards:</p>



<ul>
<li>Replace the dense baseline with a small CNN (Conv2D → MaxPooling → Dense).</li>



<li>Add dropout or batch normalization to reduce overfitting.</li>



<li>Apply data augmentation (random shifts/rotations) to improve generalization.</li>



<li>Use callbacks like <code>EarlyStopping</code> and <code>ModelCheckpoint</code> so training is efficient and you keep the best weights (see the sketch after this list).</li>



<li>Export a <code>SavedModel</code> for server use or convert to TensorFlow Lite for edge devices (Raspberry Pi, microcontrollers).</li>
</ul>
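<p>To make the callbacks suggestion concrete, here’s a minimal sketch (the monitored metric, patience, and file name are illustrative choices, not values from the notebook):</p>

<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

callbacks = [
    # Stop training once validation loss hasn't improved for 3 epochs,
    # and roll back to the best weights seen so far
    EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True),
    # Save the best model to disk as training progresses
    ModelCheckpoint('best_model.keras', monitor='val_loss', save_best_only=True),
]

model1.fit(x_train, y_train, epochs=50, validation_split=0.1, callbacks=callbacks)</pre>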



<h2 class="wp-block-heading">Frequently asked questions</h2>



<h3 class="wp-block-heading">When should I use TensorFlow?</h3>



<p>TensorFlow is best used when building machine learning or deep learning models that need to scale, go into production, or run across different environments (cloud, mobile, edge devices).&nbsp;</p>



<p>TensorFlow is particularly well-suited for large-scale models and neural networks, including scenarios where you need strong deployment support (TensorFlow Serving, TensorFlow Lite). For research prototypes, TensorFlow is viable, but it’s more commonplace to use lightweight frameworks for easier experimentation.</p>



<h3 class="wp-block-heading">Can TensorFlow run on a GPU?</h3>



<p>Yes, TensorFlow can run on GPUs and TPUs. Using a GPU can significantly speed up training, especially for deep learning models with large datasets. The best part is that TensorFlow will automatically use an available GPU if it’s properly configured.</p>
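<p>A quick way to check what hardware TensorFlow can see (a one-line sanity check):</p>

<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import tensorflow as tf

# An empty list means TensorFlow will fall back to the CPU
print(tf.config.list_physical_devices('GPU'))</pre>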



<h3 class="wp-block-heading">What is loss in TensorFlow?</h3>



<p>Loss, computed by a loss function, is a numerical value measuring how far a model’s predictions are from the actual target values; training aims to minimize it. A few common examples include:&nbsp;</p>



<ul>
<li>MSE (mean squared error), used in regression tasks.</li>



<li>Cross-entropy loss, often used in classification tasks.</li>
</ul>



<h3 class="wp-block-heading">How many epochs should I use?</h3>



<p>There’s no set number of epochs to use, as it depends on your dataset and model. Typical approaches cover:&nbsp;</p>



<ul>
<li>Starting with a conservative number (10–50 epochs).</li>



<li>Monitoring validation loss/accuracy and adjusting based on the results you see.</li>



<li>Using early stopping to halt training when improvement stalls.</li>
</ul>



<p>An epoch is one full pass through your training data. Too few passes lead to underfitting, and too many can cause overfitting. The sweet spot is where your model generalizes best to unseen data.&nbsp;</p>



<h2 class="wp-block-heading" id="author">About the author</h2>


    <div class="about-author ">
        <div class="about-author__box">
            <div class="row">
                                                            <div class="about-author__box-img">
                            <img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" src="https://blog.jetbrains.com/wp-content/uploads/2026/04/Iulia-Feroli-e1775558363746.png" alt="" loading="lazy">
                        </div>
                                        <div class="about-author__box-text">
                                                    <h4>Iulia Feroli</h4>
                                                <p><span style="font-weight: 400;">Iulia’s mission is to make tech exciting, understandable, and accessible to the new generation.</span></p>
<p><span style="font-weight: 400;">With a background spanning data science, AI, cloud architecture, and open source, she brings a unique perspective on bridging technical depth with approachability.</span></p>
<p><span style="font-weight: 400;">She’s building her own brand, Back To Engineering, through which she creates a community for tech enthusiasts, engineers, and makers. From YouTube videos on building robots from scratch, to conference talks or keynotes about real, grounded AI, and technical blogs and tutorials </span><span style="font-weight: 400;">–</span><span style="font-weight: 400;"> Iulia shares her message worldwide on how to turn complex concepts into tools developers can use every day.</span></p>
                    </div>
                            </div>
        </div>
    </div>



<p></p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>What’s New in PyCharm 2026.1</title>
		<link>https://blog.jetbrains.com/pycharm/2026/03/what-s-new-in-pycharm-2026-1/</link>
		
		<dc:creator><![CDATA[Ilia Afanasiev]]></dc:creator>
		<pubDate>Mon, 30 Mar 2026 15:31:08 +0000</pubDate>
		<featuredImage>https://blog.jetbrains.com/wp-content/uploads/2026/03/PC-releases-BlogFeatured-1280x720-1.png</featuredImage>		<category><![CDATA[releases]]></category>
		<guid isPermaLink="false">https://blog.jetbrains.com/?post_type=pycharm&#038;p=687025</guid>

					<description><![CDATA[Welcome to PyCharm 2026.1. This release doesn’t just add features – it rethinks how you build, debug, and scale Python projects. From a brand-new debugging engine powered by debugpy to first-class uv support on remote targets and expanded JavaScript support in the free tier, this version is all about removing friction and letting you focus [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p>Welcome to PyCharm 2026.1. This release doesn’t just add features – it rethinks how you build, debug, and scale Python projects. From a brand-new debugging engine powered by debugpy to first-class uv support on remote targets and expanded JavaScript support in the free tier, this version is all about removing friction and letting you focus on your code. Whether you’re working locally, over SSH, or inside Docker, PyCharm now adapts to your setup instead of the other way around.</p>



<p>In this post, we’ll explore the highlights of this update and show you how these improvements can streamline your daily workflow.</p>



<h2 class="wp-block-heading">Standardizing the future of debugging with debugpy</h2>



<p>PyCharm now offers the option to use debugpy as the default debugger backend, providing the industry-standard Debug Adapter Protocol (DAP) that aligns the IDE with the broader Python ecosystem. By replacing complex, legacy socket-waiting logic with a more stable connection model, race conditions and timing edge cases will no longer interfere with your debugging experience.</p>



<h3 class="wp-block-heading">A modern foundation for Python development</h3>



<p>The new engine provides full native support for <a href="https://peps.python.org/pep-0669/" target="_blank" rel="noopener">PEP 669</a>, utilizing Python 3.12’s low-impact monitoring API to significantly reduce debugger overhead compared to the legacy <code>sys.settrace()</code> approach. This ensures that your debugging sessions are faster and less intrusive. Furthermore, the migration introduces comprehensive <code>asyncio</code> support. You can now use the full suite of debugger tools, such as the debug console and expression evaluation, directly within async contexts for modern frameworks like FastAPI and aiohttp.&nbsp;</p>



<h3 class="wp-block-heading">Reliability across environments</h3>



<p>Beyond performance improvements, debugpy simplifies the <em>Attach to Process</em> experience by providing a standardized approach for Docker containers, remote servers on AWS, Azure, or GCP, and local running processes. For specialized workflows, we have introduced a new <em>Attach to DAP</em> run configuration. This allows you to connect to targets using the <code>debugpy.listen()</code> command, eliminating the friction of manual connection management and allowing you to focus on your code instead of debugging infrastructure.</p>
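<p>For example, making a script wait for a DAP client takes just a couple of lines of debugpy’s public API (the port number here is only an example):</p>

<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import debugpy

# Listen for a DAP client, such as PyCharm's Attach to DAP run configuration
debugpy.listen(("localhost", 5678))
debugpy.wait_for_client()  # optionally block until the debugger attaches

print("Debugger attached, continuing")</pre>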



<figure class="wp-block-video"><video controls src="https://blog.jetbrains.com/wp-content/uploads/2026/03/debugpy.webm"></video></figure>



<h2 class="wp-block-heading">Support for uv as a remote interpreter</h2>



<p>Many developers work on projects where the code and dependencies live on a remote server – whether via SSH, in WSL, or inside Docker. By connecting PyCharm to a remote machine and using uv as the interpreter, you can keep the environment fully synchronized, ensure package management works as expected, and run projects smoothly – just as if everything were local.</p>



<figure class="wp-block-video"><video controls src="https://blog.jetbrains.com/wp-content/uploads/2026/03/uv_on_wsl.webm"></video></figure>



<h2 class="wp-block-heading">Free professional web development for everyone</h2>



<p>With PyCharm 2026.1, the core IDE experience continues to evolve as we bring a broader set of professional-grade web tools to all users for free. Everyone, from beginners to backend-first developers, now has access to a substantial set of JavaScript, TypeScript, and CSS features, as well as advanced navigation and code intelligence previously available only with a Pro subscription.</p>



<figure class="wp-block-video"><video controls src="https://blog.jetbrains.com/wp-content/uploads/2026/03/Webstorm_Free_JS.webm"></video></figure>



<p>For a complete breakdown of all new features, check out this <a href="https://blog.jetbrains.com/pycharm/2026/03/expanding-our-core-web-development-support-in-pycharm-2026-1/">blog post</a>. </p>



<h2 class="wp-block-heading">Advancements in AI integration</h2>



<p>PyCharm is evolving into an open platform that gives you the freedom to bring the AI tools of your choice directly into your professional development workflow. This release focuses on providing a flexible ecosystem where you can orchestrate the best models and agents available today.</p>



<h3 class="wp-block-heading">The ACP Registry: Your gateway to new agents</h3>



<p>Keeping up with the rapid pace of AI development can be a challenge, with new coding agents appearing almost daily. To help you navigate this dynamic landscape, we’ve launched the <a href="https://blog.jetbrains.com/ai/2026/01/acp-agent-registry/">ACP Registry</a> – a built-in directory of AI coding agents integrated directly into your IDE via the Agent Client Protocol.</p>



<p>Whether you want to experiment with open-source agents like OpenCode or specialized tools like Gemini CLI, you can now discover and install them in just a few clicks. If you have a custom setup or an agent that isn’t listed yet, you can easily add it via the <code>acp.json</code> configuration, giving you the flexibility to use your favorite tools, with no strings attached.</p>



<figure class="wp-block-video"><video controls src="https://blog.jetbrains.com/wp-content/uploads/2026/03/ACP.webm"></video></figure>



<h3 class="wp-block-heading">Native OpenAI Codex integration and BYOK</h3>



<p>OpenAI Codex is now natively integrated into the JetBrains AI chat. This means you can tackle complex development tasks without switching to a browser or copy-pasting code between windows.</p>



<p>We’ve also introduced Bring Your Own Key (BYOK) support. You can now connect your own API keys from OpenAI, Anthropic, or other compatible providers – including local models – directly in the IDE settings. This allows you to choose the setup that fits your workflow and budget best, while keeping all your AI-powered development inside PyCharm.</p>



<h3 class="wp-block-heading">Stay in the flow with next edit suggestions</h3>



<p>Small changes in your code often trigger a cascade of mechanical follow-up edits. Adding a parameter to a function or renaming a symbol can lead to errors popping up across your entire file.</p>



<p>Next edit suggestions (NES) offer a smarter, lightweight alternative to asking an AI agent for a full rewrite. As you modify your code, PyCharm proactively predicts the most likely next changes and suggests them inline.</p>



<ul>
<li><strong>Effortless consistency:</strong> Update all call sites across a file with a simple <em>Tab Tab</em> experience.</li>



<li><strong>Stay in control:</strong> Move step by step through changes rather than reviewing large, automated diffs.</li>



<li><strong>No quota required:</strong> Use NES without consuming AI credits – available without consuming the AI quota of your JetBrains AI Pro subscription.</li>
</ul>



<p>This natural evolution of code completion keeps you in the flow, making those small cascading fixes feel almost effortless.</p>



<figure class="wp-block-video"><video controls src="https://blog.jetbrains.com/wp-content/uploads/2026/03/NES.webm"></video></figure>



<p>All of the updates mentioned above are just a glimpse of what’s new in PyCharm 2026.1.</p>



<p>There is even more under the hood, including performance improvements, stability upgrades, and thoughtful refinements across the IDE that make everyday development smoother and faster.</p>



<p>To explore the full list of updates, check out our <a href="https://www.jetbrains.com/pycharm/whatsnew/" target="_blank" rel="noopener">What’s New</a> page.&nbsp;</p>



<p>As always, we would love to hear your feedback. Your insights help us shape the future of PyCharm – and we cannot wait to see what you build next.</p>
]]></content:encoded>
					
		
		
		                    <language>
                        <code><![CDATA[zh-hans]]></code>
                        <url>https://blog.jetbrains.com/zh-hans/pycharm/2026/03/what-s-new-in-pycharm-2026-1/</url>
                    </language>
                                    <language>
                        <code><![CDATA[pt-br]]></code>
                        <url>https://blog.jetbrains.com/pt-br/pycharm/2026/03/what-s-new-in-pycharm-2026-1/</url>
                    </language>
                                    <language>
                        <code><![CDATA[ko]]></code>
                        <url>https://blog.jetbrains.com/ko/pycharm/2026/03/what-s-new-in-pycharm-2026-1/</url>
                    </language>
                                    <language>
                        <code><![CDATA[ja]]></code>
                        <url>https://blog.jetbrains.com/ja/pycharm/2026/03/what-s-new-in-pycharm-2026-1/</url>
                    </language>
                                    <language>
                        <code><![CDATA[fr]]></code>
                        <url>https://blog.jetbrains.com/fr/pycharm/2026/03/what-s-new-in-pycharm-2026-1/</url>
                    </language>
                                    <language>
                        <code><![CDATA[es]]></code>
                        <url>https://blog.jetbrains.com/es/pycharm/2026/03/what-s-new-in-pycharm-2026-1/</url>
                    </language>
                                    <language>
                        <code><![CDATA[de]]></code>
                        <url>https://blog.jetbrains.com/de/pycharm/2026/03/what-s-new-in-pycharm-2026-1/</url>
                    </language>
                	</item>
		<item>
		<title>Expanding Our Core Web Development Support in PyCharm 2026.1</title>
		<link>https://blog.jetbrains.com/pycharm/2026/03/expanding-our-core-web-development-support-in-pycharm-2026-1/</link>
		
		<dc:creator><![CDATA[Ilia Afanasiev]]></dc:creator>
		<pubDate>Wed, 25 Mar 2026 15:01:54 +0000</pubDate>
		<featuredImage>https://blog.jetbrains.com/wp-content/uploads/2026/03/PC-social-BlogFeatured-1280x720-1.png</featuredImage>		<category><![CDATA[web-development]]></category>
		<guid isPermaLink="false">https://blog.jetbrains.com/?post_type=pycharm&#038;p=687028</guid>

					<description><![CDATA[With PyCharm 2026.1, our core IDE experience continues to evolve as we’re bringing a broader set of professional-grade web tools to all users for free. Everyone, from beginners to backend-first developers, is getting access to a substantial set of JavaScript, TypeScript, and CSS features that were previously only available with a Pro subscription. React, JavaScript, [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p>With PyCharm 2026.1, our core IDE experience continues to evolve as we’re bringing a broader set of professional-grade web tools to all users for free. Everyone, from beginners to backend-first developers, is getting access to a substantial set of JavaScript, TypeScript, and CSS features that were previously only available with a Pro subscription.</p>



<h3 class="wp-block-heading">React, JavaScript, TypeScript, and CSS support</h3>



<p>Leverage a comprehensive set of editing and formatting tools for modern web languages within PyCharm, including:</p>



<ul>
<li><strong>Basic React support </strong>with code completion, component and attribute navigation, and React component and prop rename refactorings.</li>



<li><strong>Advanced import management</strong>:
<ul>
<li>Enjoy automatic JavaScript and TypeScript imports as you work.</li>



<li>Merge or remove unnecessary references via the <em>Optimize imports</em> feature.</li>



<li>Get required imports automatically when you paste code into the editor.</li>
</ul>
</li>
</ul>



<ul>
<li><strong>Enhanced styling</strong>: Access CSS-tailored code completion, inspections, and quick-fixes, and view any changes in real time via the built-in web preview.</li>



<li><strong>Smart editor behavior:</strong> Utilize smart keys, code vision inlay hints, and postfix code completions designed for web development.</li>
</ul>



<h3 class="wp-block-heading">Navigation and code intelligence</h3>



<p>Finding your way around web projects is now even more efficient with tools that allow for:</p>



<ul>
<li><strong>Pro-grade navigation:</strong> Use dedicated gutter icons for <em>Jump to&#8230;</em> actions, recursive calls, and TypeScript source mapping.</li>



<li><strong>Core web refactorings:</strong> Perform essential code changes with reliable <em>Rename</em> refactorings and actions (<em>Introduce variable</em>, <em>Change signature</em>, <em>Move members</em>, and more<em>)</em>.</li>



<li><strong>Quality control</strong>: Maintain high code standards with professional-grade inspections, intentions, and quick-fixes.</li>



<li><strong>Code cleanup</strong>: Identify redundant code blocks through JavaScript and TypeScript duplicate detection.</li>
</ul>



<h3 class="wp-block-heading">Frameworks and integrated tools</h3>



<p>With the added essential support for some of the most popular frontend frameworks and tools, you will have access to:</p>



<ul>
<li><strong>Project initialization</strong>: Create new web projects quickly using the built-in Vite generator.</li>



<li><strong>Standard tooling</strong>: Standardize code quality with integrated support for Prettier, ESLint, TSLint, and StyleLint.</li>



<li><strong>Script management</strong>: Discover and execute NPM scripts directly from your <code>package.json</code>.</li>



<li><strong>Security</strong>: Check project dependencies for security vulnerabilities.</li>
</ul>



<p>We’re excited to bring these tried and true features to the core PyCharm experience for free! We’re certain these tools will help beginners, students, and hobbyists tackle real-world tasks within a single, powerful IDE. Best of all, core PyCharm can be used for both commercial and non-commercial projects, so it will grow with you as you move from learning to professional development.</p>
]]></content:encoded>
					
		
		
		                    <language>
                        <code><![CDATA[zh-hans]]></code>
                        <url>https://blog.jetbrains.com/zh-hans/pycharm/2026/03/expanding-our-core-web-development-support-in-pycharm-2026-1/</url>
                    </language>
                                    <language>
                        <code><![CDATA[pt-br]]></code>
                        <url>https://blog.jetbrains.com/pt-br/pycharm/2026/03/expanding-our-core-web-development-support-in-pycharm-2026-1/</url>
                    </language>
                                    <language>
                        <code><![CDATA[ko]]></code>
                        <url>https://blog.jetbrains.com/ko/pycharm/2026/03/expanding-our-core-web-development-support-in-pycharm-2026-1/</url>
                    </language>
                                    <language>
                        <code><![CDATA[ja]]></code>
                        <url>https://blog.jetbrains.com/ja/pycharm/2026/03/expanding-our-core-web-development-support-in-pycharm-2026-1/</url>
                    </language>
                                    <language>
                        <code><![CDATA[fr]]></code>
                        <url>https://blog.jetbrains.com/fr/pycharm/2026/03/expanding-our-core-web-development-support-in-pycharm-2026-1/</url>
                    </language>
                                    <language>
                        <code><![CDATA[es]]></code>
                        <url>https://blog.jetbrains.com/es/pycharm/2026/03/expanding-our-core-web-development-support-in-pycharm-2026-1/</url>
                    </language>
                	</item>
		<item>
		<title>OpenAI Acquires Astral: What It Means for PyCharm Users</title>
		<link>https://blog.jetbrains.com/pycharm/2026/03/openai-acquires-astral-what-it-means-for-pycharm-users/</link>
		
		<dc:creator><![CDATA[Ilia Afanasiev]]></dc:creator>
		<pubDate>Mon, 23 Mar 2026 16:04:34 +0000</pubDate>
		<featuredImage>https://blog.jetbrains.com/wp-content/uploads/2026/03/PC-social-BlogFeatured-1280x720-1-2.png</featuredImage>		<guid isPermaLink="false">https://blog.jetbrains.com/?post_type=pycharm&#038;p=691821</guid>

					<description><![CDATA[On March 19, OpenAI announced that it would acquire Astral, the company behind uv, Ruff, and ty. The Astral team, led by founder Charlie Marsh, will join OpenAI&#8217;s Codex team. The deal is subject to regulatory approval. First and foremost: congratulations to Charlie Marsh and the entire Astral team. They shipped some of the most [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p>On March 19, OpenAI announced that it would <a href="https://openai.com/index/openai-to-acquire-astral/" target="_blank" rel="noopener">acquire Astral</a>, the company behind uv, Ruff, and ty. The Astral team, led by founder Charlie Marsh, will <a href="https://astral.sh/blog/openai" target="_blank" rel="noopener">join OpenAI&#8217;s Codex team</a>. The deal is subject to regulatory approval.</p>



<p>First and foremost: congratulations to Charlie Marsh and the entire Astral team. They shipped some of the most beloved tools in the Python ecosystem and raised the bar for what developer tooling can be. This acquisition is a reflection of the impact they&#8217;ve had.</p>



<p>This is big news for the Python ecosystem, and it matters to us at JetBrains. Here&#8217;s our perspective.</p>



<h2 class="wp-block-heading">What Astral built</h2>



<p>In just two years, Astral transformed Python tooling. Their tools now see hundreds of millions of downloads every month, and for good reason:</p>



<ul>
<li><strong>uv</strong> is a blazing-fast package and environment manager that unifies functionality from pip, venv, pyenv, pipx, and more into a single tool. With around 124 million monthly downloads, it has quickly become the default choice for many Python developers.</li>



<li><strong>Ruff</strong> is an extremely fast linter and formatter, written in Rust. For many teams it has replaced flake8, isort, and black entirely.</li>



<li><strong>ty</strong> is a new type checker for Python. It&#8217;s still early, but we&#8217;re already integrating it with PyCharm, and it&#8217;s showing promise.</li>
</ul>



<p>This is foundational infrastructure that millions of developers rely on every day. We&#8217;ve integrated both Ruff and uv into PyCharm because they make Python development substantially better.</p>



<h2 class="wp-block-heading">The risks are real, but manageable</h2>



<p>Change always carries risk, and acquisitions are no exception. The main concern here is straightforward: if Astral&#8217;s engineers get reassigned to OpenAI&#8217;s more commercial priorities, these tools could stagnate over time.</p>



<p>The good news is that Astral&#8217;s tools are open-source under permissive licenses. The community can fork them if it ever comes to that. As Armin Ronacher <a href="https://lucumr.pocoo.org/2024/8/21/harvest-season/" target="_blank" rel="noopener">has noted</a>, uv is &#8220;very forkable and maintainable.&#8221; There’s no possible future where these tools go <em>backwards.</em></p>



<p>Both OpenAI and Astral have committed to continued open-source development. We take them at their word, and we hope for the best.</p>



<h2 class="wp-block-heading">Our commitment hasn&#8217;t changed</h2>



<p>JetBrains already has great working relationships with both the Astral and the Codex teams. We&#8217;ve been integrating Ruff and uv into PyCharm, and we will continue to do so. We’ve submitted some upstream improvements to ty. Regardless of who owns these tools, our commitment to supporting the best Python tooling for our users stays the same. We&#8217;ll keep working with whoever maintains them.</p>



<p>The Python ecosystem is stronger because of the work Astral has done. We hope this acquisition amplifies that work, not diminishes it. We&#8217;ll be watching closely, and we&#8217;ll keep building the best possible experience for Python developers in PyCharm.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Python Unplugged on PyTV Recap</title>
		<link>https://blog.jetbrains.com/pycharm/2026/03/python-unplugged-on-pytv-recap/</link>
		
		<dc:creator><![CDATA[Will Vincent]]></dc:creator>
		<pubDate>Fri, 13 Mar 2026 13:05:10 +0000</pubDate>
		<featuredImage>https://blog.jetbrains.com/wp-content/uploads/2026/03/TechRadar_Email_Half-page_600-300-1.png</featuredImage>		<product ><![CDATA[pycharm]]></product>
		<category><![CDATA[livestreams]]></category>
		<category><![CDATA[news]]></category>
		<category><![CDATA[amsterdam]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[python]]></category>
		<guid isPermaLink="false">https://blog.jetbrains.com/?post_type=pycharm&#038;p=687433</guid>

					<description><![CDATA[Last week marked the fruition of almost a year of hard work by the entire PyCharm team. On March 4th, 2026, we hosted Python Unplugged on PyTV, our first-ever community conference featuring a 90s music-inspired online conference for the Python community. The PyCharm team is a fixture at Python conferences globally, such as PyCon US [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p>Last week marked the fruition of almost a year of hard work by the entire <a href="https://www.jetbrains.com/pycharm/" target="_blank" rel="noopener">PyCharm</a> team. On March 4th, 2026, we hosted <a href="https://www.youtube.com/live/qKkyBhXIJJU?si=ilEeq1iRXquQhssj" target="_blank" rel="noopener">Python Unplugged on PyTV</a>, our first-ever community conference: a 90s music-inspired online event for the Python community.</p>



<figure class="wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<iframe loading="lazy" title="Python Unplugged on PyTV – Free Online Python Conference" src="https://www.youtube.com/embed/qKkyBhXIJJU?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
</div><figcaption class="wp-element-caption"><span style="font-family: Roboto, Arial, sans-serif;font-weight: 700">Python Unplugged on PyTV – Free Online Python Conference</span></figcaption></figure>



<p>The PyCharm team is a fixture at Python conferences globally, such as <a href="https://us.pycon.org/2026/" target="_blank" rel="noopener">PyCon US</a> and <a href="https://ep2026.europython.eu/" target="_blank" rel="noopener">EuroPython</a>, but we recognize that while attending a conference can be life-changing, the costs involved put it out of reach for many Pythonistas.</p>



<p>We wanted to recreate the entire Python conference experience in a digital format, complete with live talks, hallway tracks, and Q&amp;A sessions, so anyone, anywhere in the world, could join in and participate.</p>



<p>And we did it! Superstar speakers from across the Python community joined us in our studio in Amsterdam, Netherlands &#8211; the country where Python was born. Some of them traveled for over 10 hours, and one even joined with their newborn baby! Travis Oliphant, of NumPy and SciPy fame, was ultimately unable to join us in person, but he kindly pre-recorded a wonderful talk and participated in a live Q&amp;A after it, despite it being very early morning in his time zone.</p>



<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/03/IMG_7830-1.jpg" alt="" class="wp-image-687545"/><figcaption class="wp-element-caption"><em>Cheuk Ting Ho, Jodie Burchell,&nbsp; Valerie Andrianova</em></figcaption></figure>



<p>The PyCharm team is extremely grateful for the community&#8217;s support in making this happen.</p>



<h2 class="wp-block-heading">The event</h2>



<p>We <a href="https://www.youtube.com/live/qKkyBhXIJJU?si=fd5uQvLnpEL2P9lU" target="_blank" rel="noopener">livestreamed the entire event</a> from 11am to 6:30pm CET/CEST, almost seven and a half hours of content, featuring 15 speakers, a PyLadies panel, and an ongoing quiz with prizes. Topics covered the future of Python, AI, data science, web development, and more.</p>



<p>Here is the complete list of speakers and timestamped links to their talks:</p>



<ul>
<li><a href="https://www.youtube.com/watch?v=qKkyBhXIJJU&amp;t=24031s" target="_blank" rel="noopener">Carol Willing </a>&#8211; JupyterLab Core Developer</li>



<li><a href="https://www.youtube.com/watch?v=qKkyBhXIJJU&amp;t=673s" target="_blank" rel="noopener">Deb Nicholson</a> &#8211; Executive Director, Python Software Foundation</li>



<li><a href="https://www.youtube.com/watch?v=qKkyBhXIJJU&amp;t=6604s" target="_blank" rel="noopener">Ritchie Vink</a> &#8211; Creator of Polars</li>



<li><a href="https://www.youtube.com/watch?v=qKkyBhXIJJU&amp;t=16430s" target="_blank" rel="noopener">Travis Oliphant</a> &#8211; Creator of NumPy</li>



<li><a href="https://www.youtube.com/watch?v=qKkyBhXIJJU&amp;t=12480s" target="_blank" rel="noopener">Sarah Boyce</a> &#8211; Django Fellow&nbsp;</li>



<li><a href="https://www.youtube.com/watch?v=qKkyBhXIJJU&amp;t=14389s" target="_blank" rel="noopener">Sheena O’Connell</a> &#8211; Python Software Foundation Board Member&nbsp;</li>



<li><a href="https://www.youtube.com/watch?v=qKkyBhXIJJU&amp;t=18360s" target="_blank" rel="noopener">Marlene Mhangami</a> &#8211; Senior Developer Advocate at Microsoft&nbsp;</li>



<li><a href="https://www.youtube.com/watch?v=qKkyBhXIJJU&amp;t=10591s" target="_blank" rel="noopener">Carlton Gibson</a> &#8211; Creator of multiple open-source projects in the Django ecosystem</li>



<li><a href="https://www.youtube.com/watch?v=qKkyBhXIJJU&amp;t=22179s" target="_blank" rel="noopener">Tuana Çelik </a>&#8211; Developer Relations Engineer at LlamaIndex</li>



<li><a href="https://www.youtube.com/watch?v=qKkyBhXIJJU&amp;t=20289s" target="_blank" rel="noopener">Merve Noyan</a> &#8211; Machine Learning Engineer at Hugging Face&nbsp;</li>



<li><a href="https://www.youtube.com/watch?v=qKkyBhXIJJU&amp;t=4750s" target="_blank" rel="noopener">Paul Everitt</a> &#8211; Developer Advocate at JetBrains</li>



<li><a href="https://www.youtube.com/watch?v=qKkyBhXIJJU&amp;t=2568s" target="_blank" rel="noopener">Mark Smith</a> &#8211; Head of Python Ecosystem at JetBrains</li>



<li><a href="https://www.youtube.com/watch?v=qKkyBhXIJJU&amp;t=8630s" target="_blank" rel="noopener">Georgi Ker</a> &#8211; Director and Fellow of the Python Software Foundation</li>



<li><a href="https://www.youtube.com/watch?v=qKkyBhXIJJU&amp;t=8630s" target="_blank" rel="noopener">Una Galyeva</a> &#8211; Head of AI at Geobear Global and PyLadies Amsterdam organizer&nbsp;</li>
<li><a href="https://www.youtube.com/watch?v=qKkyBhXIJJU&amp;t=8630s" target="_blank" rel="noopener">Jessica Greene</a> &#8211; Senior Machine Learning Engineer at Ecosia</li>
</ul>






<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/03/IMG_7801.jpg" alt="" class="wp-image-687457"/><figcaption class="wp-element-caption">The studio room with presenter&#8217;s desk and Q&amp;A table.</figcaption></figure>



<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/03/IMG_7779.jpg" alt="" class="wp-image-687468"/><figcaption class="wp-element-caption"><em>Production meeting the day before the event</em></figcaption></figure>



<p>We spent the afternoon doing final checks and a run-through with the studio team at <a href="https://vixylive.com/" target="_blank" rel="noopener">Vixy Live</a>. They were very professional and patient with us as we were working in a studio for the first time. With their help, we were confident that the event the next day would go smoothly.</p>






<h2 class="wp-block-heading alignwide" id="we-re-a-studio-in-berlin-with-an-international-practice-in-architecture-urban-planning-and-interior-design-we-believe-in-sharing-knowledge-and-promoting-dialogue-to-increase-the-creative-potential-of-collaboration" style="font-size:48px;line-height:1.1">Livestream day</h2>



<p>On the day of the livestream, we arrived early to get our makeup done. The makeup artists were absolute pros, and we all looked great on camera. One of our speakers, Carol, joked that she now looks 20 years younger! The hosts, Jodie, Will, and Cheuk, were decked out in &#8216;90s fashion and vibes.</p>



<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/03/image5.jpg" alt="" class="wp-image-687479"/><figcaption class="wp-element-caption"><em>Python Team Lead Jodie Burchell bringing the 90s back</em></figcaption></figure>



<p>We also had swag designed by our incredible marketing team, including t-shirts, stickers, posters, and tote bags.</p>



<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/03/IMG_7814.jpg" alt="" class="wp-image-687490"/><figcaption class="wp-element-caption"><em>PyTV Stickers for all participants</em></figcaption></figure>



<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/03/IMG_7815.jpg" alt="" class="wp-image-687501"/><figcaption class="wp-element-caption"><em>PyTV Totebags</em></figcaption></figure>






<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/03/IMG_7820.jpg" alt="" class="wp-image-687512"/><figcaption class="wp-element-caption"><em>PyTV posters</em></figcaption></figure>






<h2 class="wp-block-heading">Python content for everyone</h2>



<p>After a brief opening introducing the conference and the event Discord, we began with a series of talks focused on the community, learning Python, and other hot Python topics. We also had two panels, both absolutely inspiring: one on the role of AI in open source and another featuring prominent members of PyLadies.</p>



<p>Following our first block of speakers, we moved on to web development-focused talks from key people involved with the Django framework. We then had a series of talks from experts across the data science and AI world, including speakers from Microsoft, Hugging Face, and LlamaIndex, who gave us up-to-date insights into open-source AI and agent-based approaches. We ended with a talk by Carol Willing, one of the most respected figures in the Python community.</p>



<p>Throughout the day, we ran a quiz to test the audience&#8217;s knowledge of Python and its community. Since many audience members were still learning Python, we hope the quiz taught them some fun facts about the language along the way.</p>



<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/03/image7-2.png" alt="" class="wp-image-687523"/><figcaption class="wp-element-caption"><em>First of 8 questions on the Python ecosystem</em></figcaption></figure>






<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/03/image10.jpg" alt="" class="wp-image-687534"/><figcaption class="wp-element-caption"><em>Sarah Boyce, Will Vincent, Sheena O’Connell, Carlton Gibson, Marlene Mhangami</em></figcaption></figure>



<h2 class="wp-block-heading">Next year?</h2>



<p>Looking at the numbers, more than 5,500 people joined us during the livestream, most of them watching at least one talk. As of this writing, another 8,000 people have watched the event recording.</p>



<p>We&#8217;d love to do this event again next year. If you have suggestions for speakers, topics, swag, or anything else, please leave them in the comments!</p>


    <div class="buttons">
        <div class="buttons__row">
                                                <a href="https://www.youtube.com/live/qKkyBhXIJJU" class="btn" target="" rel="noopener">Watch now</a>
                                                    </div>
    </div>







]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Cursor Joined the ACP Registry and Is Now Live in Your JetBrains IDE</title>
		<link>https://blog.jetbrains.com/ai/2026/03/cursor-joined-the-acp-registry-and-is-now-live-in-your-jetbrains-ide/</link>
		
		<dc:creator><![CDATA[Jan-Niklas Wortmann]]></dc:creator>
		<pubDate>Wed, 04 Mar 2026 15:28:01 +0000</pubDate>
		<featuredImage>https://blog.jetbrains.com/wp-content/uploads/2026/03/IDEs-social-BlogFeatured-1280x720-1-2.png</featuredImage>		<product ><![CDATA[idea]]></product>
		<product ><![CDATA[pycharm]]></product>
		<product ><![CDATA[rust]]></product>
		<category><![CDATA[ai-assistant]]></category>
		<category><![CDATA[news]]></category>
		<category><![CDATA[acp]]></category>
		<guid isPermaLink="false">https://blog.jetbrains.com/?post_type=ai&#038;p=685326</guid>

					<description><![CDATA[Cursor is now available as an AI agent inside JetBrains IDEs through the Agent Client Protocol. Select it from the agent picker, and it has full access to your project. If you&#8217;ve spent any time in the AI coding space, you already know Cursor. It has been one of the most requested additions to the [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p>Cursor is now available as an AI agent inside JetBrains IDEs through the <a href="https://www.jetbrains.com/acp/" target="_blank" rel="noopener">Agent Client Protocol</a>. Select it from the agent picker, and it has full access to your project.</p>



<p>If you&#8217;ve spent any time in the AI coding space, you already know <a href="https://cursor.com/" target="_blank" rel="noopener">Cursor</a>. It has been one of the most requested additions to the ACP Registry.</p>



<h2 class="wp-block-heading">What you get</h2>



<p>Cursor is known for its AI-native, agentic workflows. JetBrains IDEs are valued for deep code intelligence – refactoring, debugging, code quality checks, and the tooling professionals rely on at scale. ACP brings the two together.</p>



<p>You can now use Cursor&#8217;s agentic capabilities directly inside your JetBrains IDE – within the workflows and features you already use.&nbsp;</p>



<h2 class="wp-block-heading">A growing open ecosystem</h2>



<p>Cursor joins a growing list of agents available through ACP in JetBrains IDEs. Every new addition to the ACP Registry means you have more choice – while still working inside the IDE you already rely on. You get access to frontier models from major providers, including OpenAI, Anthropic, Google, and now also Cursor.</p>



<p>This is part of our open ecosystem strategy. Plug in the agents you want and work in the IDE you love – without getting locked into a single solution.</p>



<blockquote class="wp-block-quote">
<p></p>
<cite>Cursor is focused on building the best way to build software with AI. By integrating Cursor with JetBrains IDEs, we&#8217;re excited to provide teams with powerful agentic capabilities in the environments where they&#8217;re already working.<br><br>– Jordan Topoleski, COO at Cursor</cite></blockquote>



<h2 class="wp-block-heading">Get started</h2>



<p>You need version 2025.3.2 or later of your JetBrains IDE with the <em>AI Assistant</em> plugin enabled. From there, open the agent selector, select <em>Install from ACP Registry…</em>, install Cursor, and start working. You <strong>don’t need a JetBrains AI subscription</strong> to use Cursor as an AI agent.</p>



<figure class="wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<iframe loading="lazy" title="Cursor is now live in your JetBrains IDE through ACP" src="https://www.youtube.com/embed/-AFODqVoe8s?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
</div></figure>



<p>The ACP Registry keeps growing, and many agents have already joined it – with more on the way. Try it today with Cursor and experience agent-driven development inside your JetBrains IDE. For more information about the Agent Client Protocol, see <a href="https://blog.jetbrains.com/ai/2025/12/bring-your-own-ai-agent-to-jetbrains-ides/">our original announcement</a> and the <a href="https://blog.jetbrains.com/ai/2026/01/acp-agent-registry/">blog post on the ACP Agent Registry support</a>.</p>
]]></content:encoded>
					
		
		
		                    <language>
                        <code><![CDATA[ko]]></code>
                        <url>https://blog.jetbrains.com/ko/ai/2026/03/cursor-joined-the-acp-registry-and-is-now-live-in-your-jetbrains-ide/</url>
                    </language>
                                    <language>
                        <code><![CDATA[fr]]></code>
                        <url>https://blog.jetbrains.com/fr/ai/2026/03/cursor-joined-the-acp-registry-and-is-now-live-in-your-jetbrains-ide/</url>
                    </language>
                	</item>
		<item>
		<title>LangChain Python Tutorial: A Complete Guide for 2026</title>
		<link>https://blog.jetbrains.com/pycharm/2026/02/langchain-tutorial-2026/</link>
		
		<dc:creator><![CDATA[Cheuk Ting Ho]]></dc:creator>
		<pubDate>Thu, 19 Feb 2026 10:40:15 +0000</pubDate>
		<featuredImage>https://blog.jetbrains.com/wp-content/uploads/2026/02/PC-social-BlogFeatured-1280x720-1.png</featuredImage>		<category><![CDATA[data-science]]></category>
		<category><![CDATA[tutorials]]></category>
		<category><![CDATA[ai]]></category>
		<category><![CDATA[ai-agents]]></category>
		<category><![CDATA[chatbots]]></category>
		<category><![CDATA[langchain]]></category>
		<guid isPermaLink="false">https://blog.jetbrains.com/?post_type=pycharm&#038;p=681664</guid>

					<description><![CDATA[If you’ve read the blog post How to Build Chatbots With LangChain, you may want to know more about LangChain. This blog post will dive deeper into what LangChain offers and guide you through a few more real-world use cases. And even if you haven’t read the first post, you might still find the info [&#8230;]]]></description>
										<content:encoded><![CDATA[
<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/02/PC-social-BlogFeatured-1280x720-1.png" alt="LangChain Python Tutorial" class="wp-image-682317"/></figure>



<p>If you’ve read the blog post <a href="https://blog.jetbrains.com/pycharm/2024/08/how-to-build-chatbots-with-langchain/"><em>How to Build Chatbots With LangChain</em></a>, you may want to know more about LangChain. This blog post will dive deeper into what LangChain offers and guide you through a few more real-world use cases. And even if you haven’t read the first post, you might still find the info in this one helpful for building your next AI agent.</p>



<h2 class="wp-block-heading">LangChain fundamentals</h2>



<p>Let’s have a look at what LangChain is. LangChain provides a standard framework for building AI agents powered by LLMs, like those offered by OpenAI, Anthropic, and Google, making it one of the easiest ways to get started. LangChain supports most of the commonly used LLMs on the market today.</p>



<p>LangChain is a high-level tool built on LangGraph, which provides a low-level framework for orchestrating the agent and runtime and is suited to more advanced users. Beginners, and anyone who only needs a simple agent, are better off with LangChain.</p>



<p>We’ll start by taking a look at several important components in a LangChain agent build.</p>



<h3 class="wp-block-heading">Agents</h3>



<p>Agents are what we are building. They combine LLMs with tools to create systems that can reason about tasks, decide which tools to use for which steps, analyze intermediate results, and work towards solutions iteratively.</p>



<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/02/image-27.png" alt="" class="wp-image-681665"/></figure>



<p>Creating an agent is as simple as using the `create_agent` function with a few parameters:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">from langchain.agents import create_agent

agent = create_agent(

   "gpt-5",

   tools=tools

)</pre>



<p>In this example, the LLM used is GPT-5 by OpenAI. In most cases, the provider of the LLM can be inferred. To see a list of all supported providers, head over <a href="https://reference.langchain.com/python/langchain/models/#langchain.chat_models.init_chat_model(model)" target="_blank" rel="noopener">here</a>.</p>
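<p>If the provider can’t be inferred, or you want to be explicit, you can name it yourself. A quick sketch, assuming the matching provider package (e.g. langchain-anthropic) is installed:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">from langchain.chat_models import init_chat_model

# Either prefix the model name with the provider...
model = init_chat_model("anthropic:claude-3-5-sonnet-20241022")

# ...or pass the provider separately.
model = init_chat_model("claude-3-5-sonnet-20241022", model_provider="anthropic")</pre>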



<h3 class="wp-block-heading">LangChain Models: Static and Dynamic</h3>



<p>There are two types of agent models that you can build: static and dynamic. Static models, as the name suggests, are straightforward and more common. The model is configured in advance, when the agent is created, and remains unchanged during execution.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import os

from langchain.chat_models import init_chat_model

os.environ["OPENAI_API_KEY"] = "sk-..."

model = init_chat_model("gpt-5")

print(model.invoke("What is PyCharm?"))</pre>



<p>Dynamic models allow you to build an agent that can switch models during runtime based on customized logic. Different models can then be picked based on the current state and context. For example, we can use ModelFallbackMiddleware (described in the <em>Middleware</em> section below) to have a backup model in case the default one fails.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">from langchain.agents import create_agent

from langchain.agents.middleware import ModelFallbackMiddleware

agent = create_agent(

   model="gpt-4o",

   tools=[],

   middleware=[

       ModelFallbackMiddleware(

           "gpt-4o-mini",

           "claude-3-5-sonnet-20241022",

       ),

   ],

)</pre>



<h3 class="wp-block-heading">Tools</h3>



<p>Tools are important parts of AI agents. They make AI agents effective at carrying out tasks that involve more than just text as output, which is a fundamental difference between an agent and an LLM. Tools allow agents to interact with external systems – such as APIs, databases, or file systems. Without tools, agents would only be able to provide text output, with no way of performing actions or iteratively working their way toward a result.</p>



<p>LangChain provides decorators for systematically creating tools for your agent, making the whole process more organized and easier to maintain. Here are a couple of examples:</p>



<p>Basic tool</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">@tool

def search_db(query: str, limit: int = 10) -> str:

   """Search the customer database for records matching the query.

   """

...

   return f"Found {limit} results for '{query}'"</pre>



<p>Tool with a custom name</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">@tool("pycharm_docs_search", return_direct=False)

def pycharm_docs_search(q: str) -> str:

   """Search the local FAISS index of JetBrains PyCharm documentation and return relevant passages."""

...

   docs = retriever.get_relevant_documents(q)

   return format_docs(docs)</pre>



<h3 class="wp-block-heading">Middleware</h3>



<p>Middleware provides ways to define the logic of your agent and customize its behavior. For example, there is middleware that can monitor the agent during runtime, assist with prompting and tool selection, or even help with advanced use cases like guardrails.</p>



<p>Here are a few examples of built-in middleware. For the full list, please refer to the <a href="https://docs.langchain.com/oss/python/langchain/middleware/built-in#provider-agnostic-middleware" target="_blank" rel="noopener">LangChain middleware documentation</a>.</p>



<figure class="wp-block-table"><table><tbody><tr><td><strong>Middleware</strong></td><td><strong>Description</strong></td></tr><tr><td>Summarization</td><td>Automatically summarize the conversation history when approaching token limits.</td></tr><tr><td>Human-in-the-loop</td><td>Pause execution for human approval of tool calls.</td></tr><tr><td>Context editing</td><td>Manage conversation context by trimming or clearing tool uses.</td></tr><tr><td>PII detection</td><td>Detect and handle personally identifiable information (PII).</td></tr></tbody></table></figure>



<h2 class="wp-block-heading">Real-world LangChain use cases</h2>



<p>LangChain use cases span a wide range of fields; common examples include:</p>



<ol>
<li><a href="https://blog.jetbrains.com/pycharm/2026/02/langchain-tutorial-2026/#ai-powered-chatbots" data-type="link" data-id="https://blog.jetbrains.com/pycharm/2026/02/langchain-tutorial-2026/#ai-powered-chatbots">AI-powered chatbots</a></li>



<li><a href="https://blog.jetbrains.com/pycharm/2026/02/langchain-tutorial-2026/#document-question-answering-systems" data-type="link" data-id="https://blog.jetbrains.com/pycharm/2026/02/langchain-tutorial-2026/#document-question-answering-systems">Document question answering systems</a></li>



<li><a href="https://blog.jetbrains.com/pycharm/2026/02/langchain-tutorial-2026/#content-generation-tools" data-type="link" data-id="https://blog.jetbrains.com/pycharm/2026/02/langchain-tutorial-2026/#content-generation-tools">Content generation tools</a></li>
</ol>



<h3 class="wp-block-heading" id="ai-powered-chatbots">AI-powered chatbots</h3>



<p>When we think of AI agents, we often think of chatbots first. If you’ve read the <a href="https://blog.jetbrains.com/pycharm/2024/08/how-to-build-chatbots-with-langchain/"><em>How to Build Chatbots With LangChain</em></a> blog post, then you’re already up to speed about this use case. If not, I highly recommend checking it out.</p>



<h3 class="wp-block-heading" id="document-question-answering-systems">Document question answering systems</h3>



<p>Another real-world use case for LangChain is a document question answering system. For example, companies often have internal documents and manuals that are rather long and unwieldy. A document question answering system provides a quick way for employees to find the info they need within the documents, without having to manually read through each one.</p>



<p>To demonstrate, we’ll create a <a href="https://github.com/Cheukting/langchain-example1/blob/main/src/langchainexample/ingest_pycharm_docs.py" data-type="link" data-id="https://github.com/Cheukting/langchain-example1/blob/main/src/langchainexample/ingest_pycharm_docs.py" target="_blank" rel="noopener">script</a> to index the <a href="https://www.jetbrains.com/help/pycharm/" target="_blank" rel="noopener">PyCharm documentation</a>. Then we’ll create an AI agent that can answer questions based on the documents we indexed. First let’s take a look at our tool:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">@tool("pycharm_docs_search")

def pycharm_docs_search(q: str) -> str:

   """Search the local FAISS index of JetBrains PyCharm documentation and return relevant passages."""

   # Load vector store and create retriever

   embeddings = OpenAIEmbeddings(

       model=settings.openai_embedding_model, api_key=settings.openai_api_key

   )

   vector_store = FAISS.load_local(

       settings.index_dir, embeddings, allow_dangerous_deserialization=True

   )

   k = 4

   retriever = vector_store.as_retriever(

       search_type="mmr", search_kwargs={"k": k, "fetch_k": max(k * 3, 12)}

   )

   docs = retriever.invoke(q)</pre>



<p>We are using a <a href="https://docs.langchain.com/oss/python/integrations/vectorstores" target="_blank" rel="noopener">vector store</a> to perform a similarity search with embeddings provided by OpenAI. Documents are embedded so the doc search tool can perform similarity searches to fetch the relevant documents when called.&nbsp;</p>
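<p>For context, the indexing side (handled once by the ingestion script linked above) looks roughly like this. This is a simplified sketch, with the URL, chunk sizes, and index path chosen arbitrarily:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load a docs page, split it into overlapping chunks, embed, and save.
docs = WebBaseLoader("https://www.jetbrains.com/help/pycharm/").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=150
).split_documents(docs)

# Requires OPENAI_API_KEY in the environment; the path is a placeholder.
vector_store = FAISS.from_documents(chunks, OpenAIEmbeddings())
vector_store.save_local("pycharm_docs_index")</pre>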



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">def main():

   parser = argparse.ArgumentParser(

       description="Ask PyCharm docs via an Agent (FAISS + GPT-5)"

   )

   parser.add_argument("question", type=str, nargs="+", help="Your question")

   parser.add_argument(

       "--k", type=int, default=6, help="Number of documents to retrieve"

   )

   args = parser.parse_args()

   question = " ".join(args.question)

   system_prompt = """You are a helpful assistant that answers questions about JetBrains PyCharm using the provided tools.

   Always consult the 'pycharm_docs_search' tool to find relevant documentation before answering.

   Cite sources by including the 'Source:' lines from the tool output when useful. If information isn't found, say you don't know."""

   agent = create_agent(

       model=settings.openai_chat_model,

       tools=[pycharm_docs_search],

       system_prompt=system_prompt,

       response_format=ToolStrategy(ResponseFormat),

   )

   result = agent.invoke({"messages": [{"role": "user", "content": question}]})

   print(result["structured_response"].content)</pre>






<p>System prompts are provided to the LLM together with the user’s input prompt. We are using OpenAI as the LLM provider in this example, so we’ll need an API key from them. Head to <a href="https://docs.langchain.com/oss/python/integrations/chat/openai" target="_blank" rel="noopener">this page</a> to check out OpenAI’s integration documentation. When creating the agent, we configure the `model`, `tools`, and `system_prompt` parameters.</p>



<p>For the full scripts and project, see <a href="https://github.com/Cheukting/langchain-example1" target="_blank" rel="noopener">here</a>.</p>



<h3 class="wp-block-heading" id="content-generation-tools">Content generation tools</h3>



<p>Another example is an agent that generates text based on content fetched from other sources. For instance, we might use this when we want to generate marketing content with info taken from documentation. In this example, we’ll pretend we’re doing marketing for Python and creating a newsletter for the latest Python release.</p>



<p>In <a href="https://github.com/Cheukting/langchain-example2/blob/main/app/tools.py" target="_blank" rel="noopener">tools.py</a>, a tool is set up to fetch the relevant information, parse it into a structured format, and extract the necessary information.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">@tool("fetch_python_whatsnew", return_direct=False)

def fetch_python_whatsnew() -> str:

   """

   Fetch the latest "What's New in Python" article and return a concise, cleaned

   text payload including the URL and extracted section highlights.

   The tool ignores the input argument.

   """

   index_html = _fetch(BASE_URL)

   latest = _find_latest_entry(index_html)

   if not latest:

       return "Could not determine latest What's New entry from the index page."

   article_html = _fetch(latest.url)

   highlights = _extract_highlights(article_html)

   return f"URL: {latest.url}\nVERSION: {latest.version}\n\n{highlights}"</pre>



<p>The agent itself is defined in <a href="https://github.com/Cheukting/langchain-example2/blob/main/app/agent.py" target="_blank" rel="noopener">agent.py</a>:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">SYSTEM_PROMPT = (

   "You are a senior Product Marketing Manager at the Python Software Foundation. "

   "Task: Draft a clear, engaging release marketing newsletter for end users and developers, "

   "highlighting the most compelling new features, performance improvements, and quality-of-life "

   "changes in the latest Python release.\n\n"

   "Process: Use the tool to fetch the latest 'What's New in Python' page. Read the highlights and craft "

   "a concise newsletter with: (1) an attention-grabbing subject line, (2) a short intro paragraph, "

   "(3) 4–8 bullet points of key features with user benefits, (4) short code snippets only if they add clarity, "

   "(5) a 'How to upgrade' section, and (6) links to official docs/changelog. Keep it accurate and avoid speculation."

)

...

def run_newsletter() -> str:

   load_dotenv()

   agent = create_agent(

       model=os.getenv("OPENAI_MODEL", "gpt-4o"),

       tools=[fetch_python_whatsnew],

       system_prompt=SYSTEM_PROMPT,

       # response_format=ToolStrategy(ResponseFormat),

   )

...</pre>



<p>As before, we provide a system prompt and the API key for OpenAI to the agent.</p>



<p>For the full scripts and project, see <a href="https://github.com/Cheukting/langchain-example2" target="_blank" rel="noopener">here</a>.</p>



<h2 class="wp-block-heading">Advanced LangChain concepts</h2>



<p>LangChain’s more advanced features can be extremely useful when you’re building a more sophisticated AI agent. Not all AI agents require these extra elements, but they are commonly used in production. Let’s look at some of them.</p>



<h3 class="wp-block-heading">MCP adapter</h3>



<p>The MCP (Model Context Protocol) is a standard way to add extra tools and functionality to an AI agent, and it has become increasingly popular among AI agent builders and enthusiasts alike.</p>



<p>LangChain’s MCP adapters package provides a <a href="https://reference.langchain.com/python/langchain_mcp_adapters/" target="_blank" rel="noopener">MultiServerMCPClient</a> class that lets an AI agent connect to one or more MCP servers. For example:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">from langchain_mcp_adapters.client import MultiServerMCPClient

client = MultiServerMCPClient(

   {

       "postman-server": {

          "type": "http",

          "url": "https://mcp.eu.postman.com",

           "headers": {

               "Authorization": "Bearer ${input:postman-api-key}"

           }

       }

   }

)

all_tools = await client.get_tools()</pre>



<p>The above connects to the <a href="https://www.postman.com/postman/postman-public-workspace/collection/681dc649440b35935978b8b7" target="_blank" rel="noopener">Postman MCP server in the EU</a> with an API key.</p>
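<p>The fetched tools behave like any other LangChain tools, so they can be handed straight to `create_agent`. A minimal sketch continuing the snippet above (inside the same async context):</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">from langchain.agents import create_agent

# The MCP tools plug in exactly like locally defined @tool functions.
agent = create_agent("gpt-4o", tools=all_tools)

result = await agent.ainvoke(
    {"messages": [{"role": "user", "content": "List my Postman collections."}]}
)</pre>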



<h3 class="wp-block-heading">Guardrails</h3>



<p>As with many AI technologies, the logic of an AI agent is not pre-determined, so its behavior is non-deterministic. Guardrails are necessary for managing that behavior and ensuring it stays policy-compliant.</p>



<p>LangChain middleware can be used to set up specific guardrails. For example, you can use PII detection middleware to protect personal information or human-in-the-loop middleware for human verification. You can even create custom middleware for more specific guardrail policies.&nbsp;</p>



<p>For instance, you can use the `<a href="https://docs.langchain.com/oss/python/langchain/guardrails#before-agent-guardrails" target="_blank" rel="noopener">@before_agent</a>` or `<a href="https://docs.langchain.com/oss/python/langchain/guardrails#after-agent-guardrails" target="_blank" rel="noopener">@after_agent</a>` decorators to declare guardrails for the agent’s input or output. Below is an example of a code snippet that checks for banned keywords:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">from typing import Any

from langchain.agents.middleware import before_agent

banned_keywords = ["kill", "shoot", "genocide", "bomb"]

@before_agent(can_jump_to=["end"])

def content_filter() -> dict[str, Any] | None:

  """Block requests containing banned keywords."""

  content = first_message.content.lower()

# Check for banned keywords

  for keyword in banned_keywords:

      if keyword in content:

          return {

              "messages": [{

                  "role": "assistant",

                  "content": "I cannot process your requests due to inappropriate content."

              }],

              "jump_to": "end"

          }

  return None

from langchain.agents import create_agent

agent = create_agent(

  model="gpt-4o",

  tools=[search_tool],

  middleware=[content_filter],

)

# This request will be blocked

result = agent.invoke({

  "messages": [{"role": "user", "content": "How to make a bomb?"}]

})</pre>



<p>For more details, check out the documentation <a href="https://docs.langchain.com/oss/python/langchain/guardrails" target="_blank" rel="noopener">here</a>.</p>



<h3 class="wp-block-heading">Testing</h3>



<p>Just like in other software development cycles, testing needs to be performed before we can start rolling out AI agent products. LangChain provides testing tools for both unit tests and integration tests.&nbsp;</p>



<h4 class="wp-block-heading">Unit tests</h4>



<p>As in any other application, unit tests check that each part of the AI agent works individually. The most helpful tools here are mock objects and mock responses, which isolate the specific part of the application you’re testing.&nbsp;</p>



<p>LangChain provides <a href="https://python.langchain.com/api_reference/core/language_models/langchain_core.language_models.fake_chat_models.GenericFakeChatModel.html" target="_blank" rel="noopener">GenericFakeChatModel</a>, which mimics response texts. A response iterator is set in the mock object, and when invoked, it returns the set of responses one by one. For example:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">from langchain_core.language_models.fake_chat_models import GenericFakeChatModel

def respond(msgs, **kwargs):

   text = msgs[-1].content if msgs else ""

   examples = {"Hello": "Hi there!", "Ping": "Pong.", "Bye": "Goodbye!"}

   return examples.get(text, "OK.")

model = GenericFakeChatModel(respond=respond)

print(model.invoke("Hello").content)</pre>



<h4 class="wp-block-heading">Integration tests</h4>



<p>Once we’re sure that all parts of the agent work individually, we have to test whether they work together. For an AI agent, this means testing the trajectory of its actions. To do so, LangChain provides another package: <a href="https://github.com/langchain-ai/agentevals" target="_blank" rel="noopener">AgentEvals</a>.</p>



<p>AgentEvals provides two main evaluators to choose from (a minimal sketch of the first follows the list):</p>



<ol>
<li>Trajectory match – A reference trajectory is required and is compared against the trajectory the agent actually produced. For this comparison, you have <a href="https://docs.langchain.com/oss/python/langchain/test#trajectory-match-evaluator" target="_blank" rel="noopener">four different match modes</a> to choose from.</li>



<li>LLM judge – An <a href="https://docs.langchain.com/oss/python/langchain/test#llm-as-judge-evaluator" target="_blank" rel="noopener">LLM judge</a> evaluates whether the resulting trajectory stays on the right path, and it can be used with or without a reference trajectory.</li>
</ol>
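

<p>To make the first option concrete, here’s a minimal sketch of a strict trajectory match (assuming AgentEvals’ <code>create_trajectory_match_evaluator</code> helper and a hypothetical <code>get_weather</code> tool), comparing the agent’s OpenAI-style message trajectory against a reference:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import json

from agentevals.trajectory.match import create_trajectory_match_evaluator

# The trajectory the agent actually produced, as OpenAI-style messages
outputs = [
    {"role": "user", "content": "What is the weather in Amsterdam?"},
    {
        "role": "assistant",
        "tool_calls": [{
            "function": {
                "name": "get_weather",
                "arguments": json.dumps({"city": "Amsterdam"}),
            }
        }],
    },
    {"role": "tool", "content": "Rainy, 12 degrees."},
    {"role": "assistant", "content": "It is rainy and 12 degrees in Amsterdam."},
]

# The reference trajectory we expect; here it matches the output exactly
reference_outputs = list(outputs)

evaluator = create_trajectory_match_evaluator(trajectory_match_mode="strict")

result = evaluator(outputs=outputs, reference_outputs=reference_outputs)
print(result)  # e.g. {"key": "trajectory_strict_match", "score": True, ...}</pre>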



<h2 class="wp-block-heading">LangChain support in PyCharm</h2>



<p>With LangChain, you can develop an AI agent that suits your needs in no time. However, to use LangChain effectively in your application, you need a capable debugger. In PyCharm, the <a href="https://plugins.jetbrains.com/plugin/26921-ai-agents-debugger" target="_blank" rel="noopener">AI Agents Debugger plugin</a> lets you inspect your LangChain agents as they run.</p>



<p>If you don’t yet have PyCharm, <a href="https://www.jetbrains.com/pycharm/download/" target="_blank" rel="noopener">you can download it here</a>.</p>



<p>Using the AI Agents Debugger is straightforward. Once you install the plugin, it appears as an icon on the right-hand side of the IDE.</p>



<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/02/image-27.png" alt="" class="wp-image-681666"/></figure>



<p>When you click this icon, a tool window opens explaining that no extra code is needed – just run your agent, and traces will be shown automatically.</p>



<p>As an example, we will run the <a href="https://github.com/Cheukting/langchain-example2" target="_blank" rel="noopener">content generation agent</a> that we built above. If you need a custom run configuration, you will have to set it up now by following this guide on <a href="https://www.jetbrains.com/help/pycharm/run-debug-configuration.html" target="_blank" rel="noopener">custom run configurations in PyCharm</a>.</p>



<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/02/image-27.png" alt="" class="wp-image-681676"/></figure>



<p>Once the run finishes, you can review all the input prompts and output responses at a glance. To inspect the LangGraph, click the <em>Graph</em> button in the top-right corner.</p>



<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/02/image-27.png" alt="" class="wp-image-681673"/></figure>



<p>The <em>LangGraph</em> view is especially useful if your agent has complicated steps or a customized workflow.</p>



<h2 class="wp-block-heading">Summing up</h2>



<p>LangChain is a powerful tool for building AI agents that fit a wide range of use cases. It’s built on <a href="https://docs.langchain.com/oss/python/langgraph/overview" target="_blank" rel="noopener">LangGraph</a>, which provides low-level orchestration and runtime customization, as well as compatibility with a wide variety of LLMs on the market. Together, LangChain and LangGraph set a new industry standard for developing AI agents.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Python Unplugged on PyTV – A Free Online Python Conference for Everyone </title>
		<link>https://blog.jetbrains.com/pycharm/2026/02/python-unplugged-on-pytv/</link>
		
		<dc:creator><![CDATA[Cheuk Ting Ho]]></dc:creator>
		<pubDate>Wed, 11 Feb 2026 16:37:44 +0000</pubDate>
		<featuredImage>https://blog.jetbrains.com/wp-content/uploads/2026/02/Blog-Social-Share-image-1280x720-1.png</featuredImage>		<product ><![CDATA[education]]></product>
		<category><![CDATA[ai]]></category>
		<category><![CDATA[conference]]></category>
		<category><![CDATA[ml]]></category>
		<category><![CDATA[python]]></category>
		<guid isPermaLink="false">https://blog.jetbrains.com/?post_type=pycharm&#038;p=679943</guid>

					<description><![CDATA[The PyCharm team loves being part of the global Python community. From PyCon US to EuroPython to every PyCon in between, we enjoy the atmosphere at conferences, as well as meeting people who are as passionate about Python as we are. This includes everyone: professional Python developers, data scientists, Python hobbyists and students. However, we [&#8230;]]]></description>
										<content:encoded><![CDATA[
<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/02/Blog-Featured-1280x720-1.png" alt="" class="wp-image-680011"/></figure>



<p>The <a href="https://www.jetbrains.com/pycharm/" data-type="link" data-id="https://www.jetbrains.com/pycharm/" target="_blank" rel="noopener">PyCharm</a> team loves being part of the global Python community. From PyCon US to EuroPython to every PyCon in between, we enjoy the atmosphere at conferences, as well as meeting people who are as passionate about Python as we are. This includes everyone: professional Python developers, data scientists, Python hobbyists and students.</p>



<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/02/image-1.jpeg" alt="" class="wp-image-679944"/></figure>



<p>However, we know that not everyone can attend a Python conference in person, whether because there’s no conference nearby or because they can’t travel to one. So within the PyCharm team we started thinking: what if we could bring the five-star experience of Python conferences to everyone? What if everyone could learn from professional speakers, access great networking opportunities, hear from voices across the community, and &#8211; most importantly &#8211; have fun, no matter where they are in the world?</p>



<h2 class="wp-block-heading">Python is for Everyone &#8211; Announcing Python Unplugged on PyTV!</h2>



<figure class="wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<iframe loading="lazy" title="Announcing &quot;Python Unplugged on PyTV&quot; – March 4, 2026" src="https://www.youtube.com/embed/8KblH4leUVA?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
</div></figure>



<p>After almost a year of planning, we’re proud to announce we’ll be hosting the first ever PyTV &#8211; a free online conference for everyone!</p>



<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/02/image-8.png" alt="" class="wp-image-679954"/></figure>



<p>Join us on <strong>March 4, 2026</strong>, for an unforgettable, non-stop event, streamed from our studio in Amsterdam. We’ll be joined live by 15 well-known and beloved <a href="https://lp.jetbrains.com/python-unplugged/" target="_blank" rel="noopener">speakers</a> from Python communities around the globe, including Carol Willing, Deb Nicholson, Sheena O’Connell, Paul Everitt, Marlene Mhangami, and Carlton Gibson. They’ll be speaking about topics such as core Python, AI, community, web development, and data science.&nbsp;</p>



<figure class="wp-block-image size-full"><img style="width:100% !important; height:auto !important; max-width:100% !important;" decoding="async" loading="lazy" src="https://blog.jetbrains.com/wp-content/uploads/2026/02/image-9.png" alt="" class="wp-image-679964"/></figure>



<p>You can get involved in the fun as well! Throughout the livestream, you can join our chat on Discord, where you can interact with other participants and our speakers. We’ve also prepared games and quizzes, with fabulous prizes up for grabs! You might even be able to get your hands on some of the super cool conference swag that we designed specifically for this event.</p>



<p><strong>What are you waiting for? <a href="https://lp.jetbrains.com/python-unplugged/" target="_blank" rel="noopener">Sign up here.</a>&nbsp;</strong></p>



<p>If you are local to Amsterdam, you can also sign up for the <a href="https://www.meetup.com/pyladiesams/" target="_blank" rel="noopener">PyLadies Amsterdam meetup</a>. It will be held on the same day as the conference, and will give you a chance to meet some of the PyTV speakers in person.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Google Colab Support Is Now Available in PyCharm 2025.3.2</title>
		<link>https://blog.jetbrains.com/pycharm/2026/01/google-colab-support-is-now-available-in-pycharm-2025-3-2/</link>
		
		<dc:creator><![CDATA[Ilia Afanasiev]]></dc:creator>
		<pubDate>Wed, 28 Jan 2026 09:33:49 +0000</pubDate>
		<featuredImage>https://blog.jetbrains.com/wp-content/uploads/2026/01/Colab-PC-social-BlogFeatured-1280x720-1-1.png</featuredImage>		<product ><![CDATA[jetbrains-for-data]]></product>
		<category><![CDATA[releases]]></category>
		<category><![CDATA[jupyter-notebooks]]></category>
		<guid isPermaLink="false">https://blog.jetbrains.com/?post_type=pycharm&#038;p=677006</guid>

					<description><![CDATA[PyCharm is designed to support the full range of modern Python workflows, from web development to data and ML/AI work, in a single IDE. An essential part of these workflows is Jupyter notebooks, which are widely used for experimentation, data exploration, and prototyping across many roles. PyCharm provides first-class support for Jupyter notebooks, both locally [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p>PyCharm is designed to support the full range of modern Python workflows, from web development to data and ML/AI work, in a single IDE. An essential part of these workflows is Jupyter notebooks, which are widely used for experimentation, data exploration, and prototyping across many roles.</p>



<p>PyCharm provides first-class support for Jupyter notebooks, both locally and when connecting to <a href="https://www.jetbrains.com/help/pycharm/configuring-jupyter-notebook.html#add-external-server" target="_blank" rel="noopener">external Jupyter servers</a>, with IDE features such as refactoring and navigation available directly in notebooks. Meanwhile, Google Colab has become a key tool for running notebook-based experiments in the cloud, especially when local resources are insufficient.</p>



<p>With PyCharm 2025.3.2, we’re bringing local IDE workflows and Colab-hosted notebooks together. Google Colab support is now available for free in PyCharm as a core feature, along with basic Jupyter notebook support. If you already use Google Colab, you can now bring your notebooks into PyCharm and work with them using IDE features designed for larger projects and longer development sessions.</p>


    <div class="buttons">
        <div class="buttons__row">
                                                <a href="https://www.jetbrains.com/pycharm/download" class="btn" target="" rel="noopener">Download PyCharm</a>
                                                    </div>
    </div>







<h3 class="wp-block-heading">Getting started with Google Colab in PyCharm</h3>



<p>Connecting PyCharm to Colab is quick and straightforward:</p>



<ol>
<li>Open a Jupyter notebook in PyCharm.</li>



<li>Select <em>Google Colab (Beta)</em> from the Jupyter server menu in the top-right corner.</li>



<li>Sign in to your Google account.</li>



<li>Create and use a Colab-backed server for the notebook.</li>
</ol>



<p>Once connected, your notebook behaves as usual, with navigation, inline outputs, tables, and visualizations rendered directly in the editor.</p>



<figure class="wp-block-video"><video controls src="https://blog.jetbrains.com/wp-content/uploads/2026/01/pycharm-colab.mp4"></video></figure>



<h3 class="wp-block-heading">Working with data and files&nbsp;</h3>



<p>When your Jupyter notebook depends on files that are not yet available on the Colab machine, PyCharm helps you handle this without interrupting your workflow. If a file is missing, you can upload it directly from your local environment. The remote file structure is also visible in the <em>Project</em> tool window, so you can browse directories and inspect files as you work.</p>



<p>Whether you’re experimenting with data, prototyping models, or working with notebooks that outgrow local resources, this integration makes it easier to move between local work, remote execution, and cloud resources without changing how you work in PyCharm.</p>



<p>If you’d like to try it out:</p>



<ul>
<li><a href="https://www.jetbrains.com/pycharm/download/" target="_blank" rel="noopener">Download PyCharm 2025.3.2</a></li>



<li>Learn more about <a href="https://developers.google.com/colab" target="_blank" rel="noopener">Google Colab</a></li>
</ul>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
