<?xml version="1.0" encoding="utf-8" standalone="no"?><rss xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" version="2.0"><channel><title>Cristian</title><description>I have experience utilizing data science tools to create and maintain robust data processing pipelines. I am familiar with building web applications using modern front-end JavaScript frameworks. I also have a background in managing IT infrastructure for large compute servers. In my spare time, I like to build Internet of Things (IoT) applications.
</description><managingEditor>noemail@noemail.org (Cristian Brokate)</managingEditor><pubDate>Thu, 19 Feb 2026 20:22:45 GMT</pubDate><generator>Jekyll https://jekyllrb.com/</generator><link>https://cristianpb.github.io/</link><language>en-us</language><itunes:explicit>no</itunes:explicit><itunes:summary>I have experience utilizing data science tools to create and maintain robust data processing pipelines. I am familiar with building web applications using modern front-end JavaScript frameworks. I also have a background in managing IT infrastructure for large compute servers. In my spare time, I like to build Internet of Things (IoT) applications. </itunes:summary><itunes:subtitle>I have experience utilizing data science tools to create and maintain robust data processing pipelines. I am familiar with building web applications using modern front-end JavaScript frameworks. I also have a background in managing IT infrastructure for l</itunes:subtitle><itunes:owner><itunes:email>noemail@noemail.org</itunes:email></itunes:owner><item><title>Artificial Content Generation</title><link>https://cristianpb.github.io/blog/automatic-content-generation</link><category>data science</category><category>python</category><category>llm</category><author>noemail@noemail.org (Cristian Brokate)</author><pubDate>Sat, 7 Dec 2024 00:00:00 GMT</pubDate><guid isPermaLink="false">https://cristianpb.github.io/blog/automatic-content-generation</guid><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>In today’s digital age, creating artificial content has become simpler than ever before, thanks to advancements in technology and AI tools. This kind of content can be an effective way to capture the attention of users and convey a particular message or idea.</p>

<p>For example, businesses and marketers can use AI-generated images or videos to create engaging content that stands out on social media platforms or other digital channels. These visuals can be tailored to specific audiences or trends, making them more likely to resonate with users and generate engagement.</p>

<p>However, it’s important to note that while artificial content can be effective in capturing attention, it should always be used ethically and responsibly. Businesses should ensure that their use of AI tools is transparent and clearly disclosed to users, and that they are not attempting to deceive or mislead their audience in any way.</p>

<p>Furthermore, while artificial content can be an effective tool for capturing attention, it’s only one piece of the puzzle when it comes to creating a successful digital marketing strategy. Businesses should also focus on creating high-quality content that provides value and relevance to their audience, as well as optimizing their site’s user experience and overall performance.</p>

<h2 id="text-generation">Text Generation</h2>

<p>When it comes to utilizing large language models (LLMs), there are typically two routes you can take: self-hosted models or hosted APIs. Self-hosting allows for greater customization and control, but also requires more technical expertise and resources. Using a hosted API, such as OpenAI’s, is often more convenient: it exposes many different models behind a single, pre-built interface.</p>

<p>Another option for those looking to utilize LLMs without self-hosting or paying for access is Groq, which offers a free tier for hosted open models. This can be an attractive option for anyone who wants to experiment with LLMs without incurring any costs.</p>

<p>Once you have selected your preferred method of accessing an LLM, the following Python example demonstrates how to query the model and obtain a text response. Since Groq and local servers such as Ollama expose OpenAI-compatible endpoints, the same client can target any of them by changing <code class="language-plaintext highlighter-rouge">base_url</code>:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">openai</span> <span class="kn">import</span> <span class="n">OpenAI</span>

<span class="n">client</span> <span class="o">=</span> <span class="n">OpenAI</span><span class="p">(</span>
    <span class="c1"># Groq API
</span>    <span class="n">base_url</span><span class="o">=</span><span class="s">"https://api.groq.com/openai/v1"</span><span class="p">,</span>
    <span class="n">api_key</span><span class="o">=</span><span class="n">GROQ_API_KEY</span>
    <span class="c1"># local instance
</span>    <span class="c1"># base_url = 'http://localhost:11434/v1',
</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">get_message</span><span class="p">(</span><span class="n">publication_date</span><span class="p">):</span>
    <span class="n">content</span> <span class="o">=</span> <span class="p">(</span>
        <span class="s">"Generate content for a web post about NERF; the output has to be in Markdown format, "</span>
        <span class="s">"the text starts with a title, a number sign (#), then a newline (</span><span class="se">\n</span><span class="s">), for example </span><span class="se">\'</span><span class="s"># Celebrate Winter with Nerf</span><span class="se">\n\'</span><span class="s">, "</span>
        <span class="sa">f</span><span class="s">"take into account important events that happen on </span><span class="si">{</span><span class="n">publication_date</span><span class="si">}</span><span class="s"> or during that week or month"</span>
    <span class="p">)</span>
    <span class="n">response</span> <span class="o">=</span> <span class="n">client</span><span class="p">.</span><span class="n">chat</span><span class="p">.</span><span class="n">completions</span><span class="p">.</span><span class="n">create</span><span class="p">(</span>
        <span class="n">model</span><span class="o">=</span><span class="s">"llama-3.3-70b-versatile"</span><span class="p">,</span>
        <span class="n">messages</span><span class="o">=</span><span class="p">[</span>
        <span class="p">{</span><span class="s">"role"</span><span class="p">:</span> <span class="s">"system"</span><span class="p">,</span> <span class="s">"content"</span><span class="p">:</span> <span class="s">"You are a helpful assistant that writes content."</span><span class="p">},</span>
        <span class="p">{</span><span class="s">"role"</span><span class="p">:</span> <span class="s">"user"</span><span class="p">,</span> <span class="s">"content"</span><span class="p">:</span> <span class="n">content</span><span class="p">},</span>
      <span class="p">]</span>
    <span class="p">)</span>
    <span class="k">return</span> <span class="n">response</span><span class="p">.</span><span class="n">choices</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">message</span><span class="p">.</span><span class="n">content</span>
</code></pre></div></div>

<p>If you want to generate one post per date across the year, you can query the model once for each publication date so that every input produces a unique output. Here’s an example that uses Python and the Pandas library to build a schedule of roughly two randomly jittered posts per week, then generate a message for each date:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">random</span>
<span class="kn">from</span> <span class="nn">datetime</span> <span class="kn">import</span> <span class="n">datetime</span><span class="p">,</span> <span class="n">date</span><span class="p">,</span> <span class="n">time</span><span class="p">,</span> <span class="n">timedelta</span>

<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>
<span class="kn">from</span> <span class="nn">tqdm</span> <span class="kn">import</span> <span class="n">tqdm</span>

<span class="n">tqdm</span><span class="p">.</span><span class="n">pandas</span><span class="p">()</span>  <span class="c1"># registers the progress_apply method used below</span>

<span class="n">n_posts</span> <span class="o">=</span> <span class="mi">100</span>
<span class="n">df</span> <span class="o">=</span> <span class="p">(</span>
  <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">()</span>
  <span class="p">.</span><span class="n">assign</span><span class="p">(</span>
    <span class="n">date</span> <span class="o">=</span> <span class="p">(</span>
      <span class="n">pd</span><span class="p">.</span><span class="n">to_datetime</span><span class="p">(</span>
          <span class="n">pd</span><span class="p">.</span><span class="n">Series</span><span class="p">([</span>
            <span class="n">datetime</span><span class="p">.</span><span class="n">combine</span><span class="p">(</span><span class="n">date</span><span class="p">(</span><span class="mi">2024</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">),</span> <span class="n">time</span><span class="p">(</span><span class="mi">0</span><span class="p">))</span> \
            <span class="o">+</span> <span class="n">timedelta</span><span class="p">(</span><span class="n">days</span><span class="o">=</span><span class="n">i</span><span class="o">*</span><span class="mf">3.5</span> <span class="o">+</span> <span class="n">random</span><span class="p">.</span><span class="n">random</span><span class="p">()</span> <span class="o">*</span> <span class="mi">1</span><span class="p">,</span> <span class="n">hours</span><span class="o">=</span><span class="n">random</span><span class="p">.</span><span class="n">random</span><span class="p">()</span><span class="o">*</span><span class="mi">24</span><span class="p">)</span> \
            <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">n_posts</span><span class="p">)</span>
          <span class="p">])</span>
      <span class="p">)</span>
      <span class="p">.</span><span class="n">dt</span><span class="p">.</span><span class="n">strftime</span><span class="p">(</span><span class="s">'%Y-%m-%d %H:%M'</span><span class="p">)</span>
    <span class="p">),</span>
    <span class="n">message</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="p">(</span>
      <span class="n">x</span><span class="p">[</span><span class="s">'date'</span><span class="p">].</span><span class="n">progress_apply</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> \
        <span class="n">get_message</span><span class="p">(</span><span class="n">pd</span><span class="p">.</span><span class="n">to_datetime</span><span class="p">(</span><span class="n">x</span><span class="p">).</span><span class="n">strftime</span><span class="p">(</span><span class="s">'%B %-dth'</span><span class="p">))</span>
        <span class="p">)</span>
    <span class="p">)</span>
  <span class="p">)</span>
<span class="p">)</span>
</code></pre></div></div>
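<p>One caveat with the snippet above: the <code class="language-plaintext highlighter-rouge">'%B %-dth'</code> format blindly appends “th” to every day (yielding “May 1th”), and the <code class="language-plaintext highlighter-rouge">%-d</code> flag is platform-specific. A small helper, not part of the original snippet, produces correct English ordinals portably:</p>

```python
from datetime import date

def ordinal(n):
    # 11th, 12th and 13th are exceptions to the 1st/2nd/3rd rule
    if 11 <= n % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

def human_date(d):
    # Portable alternative to strftime('%B %-dth'), e.g. "December 7th"
    return f"{d.strftime('%B')} {ordinal(d.day)}"
```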

<p>Below are some results of the generated text.
It is interesting to notice that the models are able to relate dates to events such as winter, Women’s Day and Halloween, among others.</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">january</span><span class="pi">:</span> <span class="s1">'</span><span class="s">Get</span><span class="nv"> </span><span class="s">Ready</span><span class="nv"> </span><span class="s">to</span><span class="nv"> </span><span class="s">NERF</span><span class="nv"> </span><span class="s">This</span><span class="nv"> </span><span class="s">Winter</span><span class="nv"> </span><span class="s">Season!'</span><span class="err">,</span>
<span class="na">february</span><span class="pi">:</span> <span class="s1">'</span><span class="s">Celebrate</span><span class="nv"> </span><span class="s">Love</span><span class="nv"> </span><span class="s">and</span><span class="nv"> </span><span class="s">NERF</span><span class="nv"> </span><span class="s">this</span><span class="nv"> </span><span class="s">February'</span><span class="err">,</span>
<span class="na">march</span><span class="pi">:</span> <span class="s2">"</span><span class="s">Celebrate</span><span class="nv"> </span><span class="s">International</span><span class="nv"> </span><span class="s">Women's</span><span class="nv"> </span><span class="s">Day</span><span class="nv"> </span><span class="s">with</span><span class="nv"> </span><span class="s">NERF"</span><span class="err">,</span>
<span class="na">april</span><span class="pi">:</span> <span class="s1">'</span><span class="s">Celebrate</span><span class="nv"> </span><span class="s">Earth</span><span class="nv"> </span><span class="s">Day</span><span class="nv"> </span><span class="s">with</span><span class="nv"> </span><span class="s">NERF'</span><span class="err">,</span>
<span class="na">may</span><span class="pi">:</span> <span class="s1">'</span><span class="s">NERF:</span><span class="nv"> </span><span class="s">Celebrating</span><span class="nv"> </span><span class="s">Community</span><span class="nv"> </span><span class="s">and</span><span class="nv"> </span><span class="s">Creativity</span><span class="nv"> </span><span class="s">in</span><span class="nv"> </span><span class="s">May'</span><span class="err">,</span>
<span class="na">june</span><span class="pi">:</span> <span class="s1">'</span><span class="s">Celebrate</span><span class="nv"> </span><span class="s">Summer</span><span class="nv"> </span><span class="s">with</span><span class="nv"> </span><span class="s">NERF:</span><span class="nv"> </span><span class="s">Fun</span><span class="nv"> </span><span class="s">and</span><span class="nv"> </span><span class="s">Games</span><span class="nv"> </span><span class="s">in</span><span class="nv"> </span><span class="s">June!'</span><span class="err">,</span>
<span class="na">july</span><span class="pi">:</span> <span class="s1">'</span><span class="s">7</span><span class="nv"> </span><span class="s">Epic</span><span class="nv"> </span><span class="s">Ways</span><span class="nv"> </span><span class="s">to</span><span class="nv"> </span><span class="s">NERF</span><span class="nv"> </span><span class="s">Your</span><span class="nv"> </span><span class="s">Way</span><span class="nv"> </span><span class="s">Through</span><span class="nv"> </span><span class="s">Summer</span><span class="nv"> </span><span class="s">Fun'</span><span class="err">,</span>
<span class="na">august</span><span class="pi">:</span> <span class="s1">'</span><span class="s">Celebrate</span><span class="nv"> </span><span class="s">National</span><span class="nv"> </span><span class="s">Watermelon</span><span class="nv"> </span><span class="s">Day</span><span class="nv"> </span><span class="s">with</span><span class="nv"> </span><span class="s">NERF</span><span class="nv"> </span><span class="s">Blasters!'</span><span class="err">,</span>
<span class="na">september</span><span class="pi">:</span> <span class="s1">'</span><span class="s">Celebrate</span><span class="nv"> </span><span class="s">World</span><span class="nv"> </span><span class="s">Noodle</span><span class="nv"> </span><span class="s">Day</span><span class="nv"> </span><span class="s">with</span><span class="nv"> </span><span class="s">NERF'</span><span class="err">,</span>
<span class="na">october</span><span class="pi">:</span> <span class="s1">'</span><span class="s">Make</span><span class="nv"> </span><span class="s">your</span><span class="nv"> </span><span class="s">Halloween</span><span class="nv"> </span><span class="s">Party</span><span class="nv"> </span><span class="s">Spook-tacular</span><span class="nv"> </span><span class="s">with</span><span class="nv"> </span><span class="s">NERF!'</span><span class="err">,</span>
<span class="na">november</span><span class="pi">:</span> <span class="s1">'</span><span class="s">Experience</span><span class="nv"> </span><span class="s">Winter</span><span class="nv"> </span><span class="s">Magic</span><span class="nv"> </span><span class="s">with</span><span class="nv"> </span><span class="s">NERF</span><span class="nv"> </span><span class="s">this</span><span class="nv"> </span><span class="s">November!'</span><span class="err">,</span>
<span class="na">december</span><span class="pi">:</span> <span class="s1">'</span><span class="s">Celebrate</span><span class="nv"> </span><span class="s">Winter</span><span class="nv"> </span><span class="s">with</span><span class="nv"> </span><span class="s">NERF:</span><span class="nv"> </span><span class="s">Fun</span><span class="nv"> </span><span class="s">Activities</span><span class="nv"> </span><span class="s">to</span><span class="nv"> </span><span class="s">Enjoy</span><span class="nv"> </span><span class="s">During</span><span class="nv"> </span><span class="s">the</span><span class="nv"> </span><span class="s">Holiday</span><span class="nv"> </span><span class="s">Season!'</span>
</code></pre></div></div>

<h2 id="image-generation">Image Generation</h2>

<p>There are various API-based and self-hosted solutions available for generating images from text inputs. These tools offer powerful capabilities for content creation and can be used for a wide range of applications, such as generating product images, creating art, and visualizing data.</p>

<p>One popular option is Midjourney, which generates images from text prompts. There are also developer-friendly APIs like <a href="https://stability.ai/">StabilityAI</a>, which allow generating images with a few lines of Python code. Users can input text and receive an image in response, making it easy to create custom graphics for websites, social media, and other digital platforms. Additionally, solutions like <a href="https://github.com/AUTOMATIC1111/stable-diffusion-webui">Automatic 1111</a> or <a href="https://github.com/mcmonkeyprojects/SwarmUI">SwarmUI</a> allow you to self-host a model, giving more advanced control over the model checkpoints and generation parameters.</p>

<p>OpenAI also offers a powerful API for generating images from text inputs. Its image generation endpoint takes a text prompt as input and generates a corresponding image using state-of-the-art models. OpenAI’s API is widely used in research and industry and has been applied to a variety of applications, such as generating realistic product images for e-commerce sites and creating custom artwork for digital marketing campaigns.</p>

<p>Another interesting application of these tools is image2image translation, where the input is an image instead of text. By providing some instructions or guidance to the model, users can determine the output image result. For example, a user might provide an image of a landscape and ask the model to add a sunset or change the season. This capability has many potential applications in fields such as graphic design, gaming, and virtual reality.</p>
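<p>As a sketch of what such an image2image request can look like, the snippet below builds a payload in the style of AUTOMATIC1111’s <code class="language-plaintext highlighter-rouge">/sdapi/v1/img2img</code> endpoint, which accepts the source image base64-encoded in an <code class="language-plaintext highlighter-rouge">init_images</code> list; the field names and defaults here are assumptions to check against the documentation of whichever tool you use:</p>

```python
import base64

def build_img2img_payload(image_bytes, prompt, denoising_strength=0.6):
    # The source image travels base64-encoded in `init_images`;
    # denoising_strength controls how far the output may drift from
    # the input (0 keeps it identical, 1 ignores it entirely).
    return {
        "init_images": [base64.b64encode(image_bytes).decode("ascii")],
        "prompt": prompt,
        "denoising_strength": denoising_strength,
        "steps": 50,
    }
```

<p>The resulting dictionary can then be POSTed with <code class="language-plaintext highlighter-rouge">requests.post</code>, just like the text-to-image payload shown below.</p>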

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">base64</span>

<span class="kn">import</span> <span class="nn">requests</span>

<span class="k">def</span> <span class="nf">gen_image</span><span class="p">(</span><span class="n">prompt</span><span class="p">,</span> <span class="n">output_name</span><span class="p">,</span> <span class="n">seed</span><span class="p">):</span>
    <span class="n">prompt</span> <span class="o">=</span> <span class="s">"(best quality:1.2), (masterpiece:1.2) (realistic:1.2), (intricately detailed:1.1) "</span> <span class="o">+</span>  <span class="n">prompt</span>
    <span class="n">payload</span> <span class="o">=</span> <span class="p">{</span>
        <span class="s">"prompt"</span><span class="p">:</span> <span class="n">prompt</span><span class="p">,</span>
        <span class="s">"seed"</span><span class="p">:</span> <span class="n">seed</span><span class="p">,</span>
        <span class="s">"batch_size"</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
        <span class="s">"width"</span><span class="p">:</span> <span class="mi">1024</span><span class="p">,</span>
        <span class="s">"height"</span><span class="p">:</span> <span class="mi">1024</span><span class="p">,</span>
        <span class="s">"steps"</span> <span class="p">:</span> <span class="mi">50</span><span class="p">,</span>
        <span class="s">"hr_scale"</span><span class="p">:</span> <span class="mi">2</span><span class="p">,</span>
        <span class="s">"refiner_switch_at"</span><span class="p">:</span> <span class="mf">0.8</span><span class="p">,</span>
        <span class="s">"refiner_checkpoint"</span><span class="p">:</span> <span class="s">"sd_xl_refiner_1.0.safetensors"</span><span class="p">,</span>
        <span class="s">"negative_prompt"</span><span class="p">:</span> <span class="s">"bad quality, blur, anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured"</span><span class="p">,</span>
    <span class="p">}</span>

    <span class="c1"># For StabilityAI:
</span>    <span class="c1"># url = f"https://api.stability.ai/v1/generation/{engine_id}/text-to-image"
</span>    <span class="c1"># For self-hosted AUTOMATIC1111:
</span>    <span class="n">url</span> <span class="o">=</span> <span class="s">"http://localhost:7860/sdapi/v1/txt2img"</span>
    <span class="n">response</span> <span class="o">=</span> <span class="n">requests</span><span class="p">.</span><span class="n">post</span><span class="p">(</span><span class="n">url</span><span class="o">=</span><span class="n">url</span><span class="p">,</span> <span class="n">json</span><span class="o">=</span><span class="n">payload</span><span class="p">)</span>
    <span class="n">r</span> <span class="o">=</span> <span class="n">response</span><span class="p">.</span><span class="n">json</span><span class="p">()</span>

    <span class="k">if</span> <span class="s">'images'</span> <span class="ow">in</span> <span class="n">r</span><span class="p">:</span>
        <span class="c1"># Decode and save the image.
</span>        <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="sa">f</span><span class="s">"images/raw/</span><span class="si">{</span><span class="n">output_name</span><span class="si">}</span><span class="s">.png"</span><span class="p">,</span> <span class="s">'wb'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
            <span class="n">f</span><span class="p">.</span><span class="n">write</span><span class="p">(</span><span class="n">base64</span><span class="p">.</span><span class="n">b64decode</span><span class="p">(</span><span class="n">r</span><span class="p">[</span><span class="s">'images'</span><span class="p">][</span><span class="mi">0</span><span class="p">]))</span>
</code></pre></div></div>

<p>When generating images from text inputs, it’s important to keep in mind that the results may not always match your expectations. While generative models can capture relevant information and use it to produce coherent and contextually appropriate images, they are not perfect and may sometimes produce unexpected output.</p>

<p>For example, when attempting to generate cute content containing little puppies and Nerf toys using a simple prompt, the results may vary depending on the specific model used and the input provided. While some models may capture the desired concept and generate images that include both puppies and Nerf toys, others may only incorporate one or the other, or produce output that is unrelated to the prompt altogether.</p>

<p>To increase the likelihood of generating images that match your desired concept, it’s important to provide clear and specific instructions to the model. This might involve breaking down the concept into smaller components or providing multiple prompts to ensure that all relevant elements are included. Additionally, experimenting with different models and input parameters can help improve the quality and consistency of the generated images.</p>
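<p>Decomposing a concept into components can also be done programmatically. The sketch below (the component lists are made up for illustration) enumerates prompt variants so that every required element appears explicitly in at least one prompt:</p>

```python
import itertools

# Hypothetical component lists for the puppies-and-Nerf concept
subjects = ["a little puppy", "two playful puppies"]
props = ["a Nerf blaster", "foam Nerf darts"]
settings = ["in a snowy park", "in a sunny garden"]

# Enumerate every combination so each required element appears
# explicitly in at least one prompt
prompts = [
    f"{subject} playing with {prop} {setting}"
    for subject, prop, setting in itertools.product(subjects, props, settings)
]
```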

<p>It’s also worth noting that generative models may not always capture the nuances of human creativity and imagination. While they can produce coherent and contextually appropriate text or images from a given prompt, they may struggle to produce truly unique or unexpected content. Nonetheless, these tools can be a powerful resource for generating creative and engaging content, particularly when used in conjunction with other design and editing tools.</p>

<div class="columns is-mobile is-multiline is-horizontal-center">
  <div class="column is-6-desktop is-12-mobile">
    <amp-image-lightbox id="lightbox1" layout="nodisplay"></amp-image-lightbox>
    <amp-img on="tap:lightbox1" role="button" tabindex="0" aria-describedby="puppynonerf" alt="Prompt: On Nov 22nd a puppy dodges Nerf blasts in the cool autumn air" title="Prompt: On Nov 22nd a puppy dodges Nerf blasts in the cool autumn air" src="/assets/img/automatic-content-generation/20241122OnNov.22ndasprypupnimblydodgesNerfblastsinthecoolautumnair._0.jpg" layout="responsive" width="737" height="697"></amp-img>
    <div id="puppynonerf">
      <p>Prompt: On Nov 22nd a puppy dodges Nerf blasts in the cool autumn air</p>
    </div>
  </div>
  <div class="column is-6-desktop is-12-mobile">
    <amp-img on="tap:lightbox1" role="button" tabindex="0" aria-describedby="puppycollar" alt="Prompt: On Feb 19th a mischievous puppy evades multiple Nerf blaster shots in the snow covered park" title="Prompt: On Feb 19th a mischievous puppy evades multiple Nerf blaster shots in the snow covered park" src="/assets/img/automatic-content-generation/20240219OnFeb19thamischievouspuppyevadesmultipleNerfblastershotsinthesnowcoveredparkleaving_1.jpg" layout="responsive" width="737" height="697"></amp-img>
    <div id="puppycollar">
      <p>Prompt: On Feb 19th a mischievous puppy evades multiple Nerf blaster shots in the snow covered park</p>
    </div>
  </div>
</div>

<p>To improve the quality of generated images, attention emphasis can be employed. This technique involves adjusting the input prompt to emphasize or de-emphasize certain elements, resulting in output that better matches the desired concept.</p>

<p>One approach is to reorder the words in the input prompt, with those appearing first having the greatest impact on the generated image. For example, if generating an image of a dog playing with a ball, placing “dog” before “ball” in the prompt may result in an image that features the dog more prominently than the ball. This method is highly flexible and can be used to emphasize any element of the prompt, but it does not lend itself to algorithmic modification, as changing the order of words requires manual input.</p>

<p>Another approach is to use parenthetical tokens to adjust the attention by a given factor. For example, “(dog:2)” might result in an image that features the dog more prominently than other elements of the prompt, while “(ball:0.5)” might de-emphasize the ball, since factors below 1 reduce attention. This method allows for a great deal of nuance and fine-tuning, but it comes with some caveats. Specifically, using too many parenthetical tokens or values that are too large can introduce artifacts in the generated image, making it less coherent or realistic.</p>

<p>It’s also possible to use extra parentheses to strengthen a subject, or square brackets to weaken it, instead of providing an explicit factor. For example, “((dog))” might make the dog more prominent in the image, while “[ball]” might de-emphasize the ball. However, this method can also introduce artifacts if the modifiers are stacked excessively.</p>
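<p>Because this emphasis syntax is plain text, it is easy to apply programmatically. The helper below is a hypothetical utility, not part of the original workflow, that wraps a token in AUTOMATIC1111-style attention markers:</p>

```python
def emphasize(token, weight=None):
    # (token:1.5) scales attention by an explicit factor; a bare
    # (token) multiplies it by ~1.1, while [token] divides by ~1.1
    if weight is None:
        return f"({token})"
    return f"({token}:{weight})"

prompt = f"A playful puppy dodges colourful {emphasize('Nerf', 1.3)} darts"
# prompt == "A playful puppy dodges colourful (Nerf:1.3) darts"
```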

<div class="columns is-mobile is-multiline is-horizontal-center">
  <div class="column is-6-desktop is-12-mobile">
    <amp-img on="tap:lightbox1" role="button" tabindex="0" aria-describedby="nerfpuppy" alt="Prompt: A playful puppy dodges colourful Nerf darts in the back of a garden" title="Prompt: A playful puppy dodges colourful Nerf darts in the back of a garden" src="/assets/img/automatic-content-generation/2025-10-24-AplayfulpuppydeftlydodgescolorfulNerfdartsinthebac_0.jpg" layout="responsive" width="737" height="697"></amp-img>
    <div id="nerfpuppy">
      <p>Prompt: A playful puppy dodges colourful ((Nerf)) darts in the back of a garden</p>
    </div>
  </div>
  <div class="column is-6-desktop is-12-mobile">
    <amp-img on="tap:lightbox1" role="button" tabindex="0" aria-describedby="nerfaccent" alt="Prompt: On an afternoon of September a puppy plays with a Nerf blaster" title="Prompt: On an afternoon of September a puppy plays with a Nerf blaster" src="/assets/img/automatic-content-generation/20240826CelebrateSummerwithNERF_1.jpg" layout="responsive" width="737" height="697"></amp-img>
    <div id="nerfaccent">
      <p>Prompt: On an afternoon of September a puppy plays with a ((Nerf)) blaster</p>
    </div>
  </div>
</div>

<h2 id="writing-markdown-text">Writing Markdown Text</h2>

<p>When generating content using language models, it’s often desirable to format that content for inclusion on a static website generator. One way to do this is by converting the content to Markdown format, which allows for easy formatting and customization.</p>

<p>Markdown is a lightweight markup language that enables users to add formatting such as headers, lists, and links using simple syntax. By converting generated content to Markdown format, users can ensure that it displays consistently across different platforms and devices, making it ideal for use on static website generators.</p>

<p>To convert content to Markdown format, users can simply wrap the text in Markdown syntax, such as using “#” for headers or “-” for bullet points. This enables easy formatting and customization of the generated content, allowing users to add links, images, and other elements as needed.</p>

<p>Additionally, many language models offer built-in support for generating Markdown-formatted text directly, eliminating the need for manual conversion. By specifying the desired output format as Markdown, users can generate content that is ready for inclusion on a static website generator with minimal additional formatting required.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">get_header</span><span class="p">(</span><span class="n">title</span><span class="p">,</span> <span class="n">post_date</span><span class="p">,</span> <span class="n">coverimage</span><span class="p">):</span>
    <span class="n">header</span> <span class="o">=</span> <span class="sa">f</span><span class="s">"""---
title: "</span><span class="si">{</span><span class="n">title</span><span class="si">}</span><span class="s">"
date: "</span><span class="si">{</span><span class="n">post_date</span><span class="si">}</span><span class="s">"
updated: "</span><span class="si">{</span><span class="n">post_date</span><span class="si">}</span><span class="s">"
categories:
  - "nerfs"
coverImage: "/images/posts/</span><span class="si">{</span><span class="n">coverimage</span><span class="si">}</span><span class="s">"
coverWidth: 16
coverHeight: 16
excerpt: Check out how heading links work with this starter in this post.
---
"""</span>
    <span class="n">base_dep</span> <span class="o">=</span> <span class="s">"""
&lt;script&gt;
  import { base } from '$app/paths';
&lt;/script&gt;
"""</span>
    <span class="k">return</span> <span class="n">header</span> <span class="o">+</span> <span class="n">base_dep</span>

<span class="k">def</span> <span class="nf">write_markdown</span><span class="p">(</span><span class="n">idx</span><span class="p">,</span> <span class="n">title</span><span class="p">,</span> <span class="n">post_date</span><span class="p">,</span> <span class="n">message</span><span class="p">,</span> <span class="n">coverimage</span><span class="p">):</span>
    <span class="n">markdown_text</span> <span class="o">=</span> <span class="n">get_header</span><span class="p">(</span><span class="n">title</span><span class="p">,</span> <span class="n">post_date</span><span class="p">,</span> <span class="n">coverimage</span><span class="p">.</span><span class="n">replace</span><span class="p">(</span><span class="s">'-'</span><span class="p">,</span> <span class="s">''</span><span class="p">)</span> <span class="o">+</span> <span class="s">"_1.jpg"</span><span class="p">)</span> <span class="o">+</span> <span class="n">message</span>
    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="sa">f</span><span class="s">"../src/lib/posts/</span><span class="si">{</span><span class="n">coverimage</span><span class="si">}</span><span class="s">.md"</span><span class="p">,</span> <span class="s">'w'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
        <span class="n">f</span><span class="p">.</span><span class="n">write</span><span class="p">(</span><span class="n">markdown_text</span><span class="p">)</span>
</code></pre></div></div>

<p>Once content has been generated with language models and formatted for display, it can be rendered as HTML for use with static site generators or other platforms. This involves converting the content into HTML code with elements such as headings, paragraphs, and images to create a visually appealing layout.</p>

<p>To capture the viewer’s attention and provide context for the content, it’s common to include a main image at the beginning of the generated content. This might involve selecting an image that is both relevant to the topic and visually engaging, as well as ensuring that it displays correctly on different devices and screen sizes.</p>

<p>Following the main image, text content can be developed with additional images included throughout the text to break up the content and provide visual interest. Including multiple images in this way helps to keep the viewer engaged and interested in the generated content, while also providing context and information through accompanying text.</p>

<p>When creating HTML code from generated content, it’s important to consider factors such as responsive design, which ensures that the layout adapts to different screen sizes and devices. Additionally, using semantic markup can help search engines understand the structure and meaning of the content, improving its visibility and discoverability.</p>
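<p>To make the markdown-to-HTML step above concrete, here is a deliberately naive, illustrative sketch (not the converter used for this site, which relies on a full markdown renderer); the function name and the <code class="language-plaintext highlighter-rouge">amp-img</code> output shape are assumptions for this example:</p>

```python
import html
import re

def md_to_html(md_text: str) -> str:
    """Very naive markdown-to-HTML sketch: handles only headings,
    standalone images and plain paragraphs. Illustrative only."""
    out = []
    # split the markdown into blocks separated by blank lines
    for block in re.split(r"\n\s*\n", md_text.strip()):
        line = block.strip()
        heading = re.match(r"^(#{1,6})\s+(.*)$", line)
        image = re.match(r"^!\[(.*?)\]\((.*?)\)$", line)
        if heading:
            level = len(heading.group(1))
            out.append(f"<h{level}>{html.escape(heading.group(2))}</h{level}>")
        elif image:
            # emit a responsive AMP image, as used throughout this site
            out.append(
                f'<amp-img src="{image.group(2)}" alt="{html.escape(image.group(1))}" '
                f'layout="responsive"></amp-img>'
            )
        else:
            out.append(f"<p>{html.escape(line)}</p>")
    return "\n".join(out)
```

<p>A real pipeline would hand this job to the static site generator's own markdown renderer; the sketch only shows why the conversion is mechanical once the content is generated.</p>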

<center>
<amp-img src="/assets/img/automatic-content-generation/Screenshot_nerf_website.jpg" alt="nerf application webpage" height="562" width="323" layout="intrinsic"></amp-img>
<br /><i>Nerf application webpage; the website is 👉</i>  <a href="https://cristianpb.github.io/nerf">here</a>
</center>

<h2 id="posting-content-in-social-networks">Posting content in social networks</h2>

<p>Twitter offers a free API (Application Programming Interface) that enables users to programmatically post messages to the platform. This API is a powerful tool for developers, as it provides access to various features and functionalities of Twitter’s platform.</p>

<p>One key feature of the Twitter API is its rate limit, which specifies the number of requests that can be made within a given time period. For the free version of the API, the posting limit is 17 requests per 24 hours. This means that developers must carefully manage their use of the API to avoid exceeding the limit and facing restrictions or penalties.</p>

<p>To make the most of the Twitter API’s rate limit, it’s important to optimize requests by combining multiple actions into a single request where possible. Additionally, scheduling requests during off-peak hours can help ensure that the rate limit is not exceeded and that messages are posted successfully.</p>
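<p>A simple way to stay under such a quota is a client-side sliding-window throttle. The following <code class="language-plaintext highlighter-rouge">RateLimiter</code> helper is a hypothetical sketch (not part of the original script); the injectable clock exists only to make it testable:</p>

```python
import time
from collections import deque

class RateLimiter:
    """Allow at most `max_calls` within a sliding window of `period` seconds.

    The defaults mirror the free posting quota of 17 requests per 24 hours
    discussed above. Illustrative sketch, not an official API client feature.
    """

    def __init__(self, max_calls=17, period=24 * 3600, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock  # injectable for testing
        self.calls = deque()

    def try_acquire(self) -> bool:
        """Return True (and record the call) if a request may be made now."""
        now = self.clock()
        # drop timestamps that have fallen out of the sliding window
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False
```

<p>Before each post, call <code class="language-plaintext highlighter-rouge">try_acquire()</code> and skip or postpone the post when it returns <code class="language-plaintext highlighter-rouge">False</code>.</p>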

<p>In some cases, developers may need to upgrade to a paid version of the Twitter API to access higher rate limits and more advanced features. However, for many use cases, the free version of the API provides sufficient functionality and flexibility.</p>

<p>Remember to configure write permissions in order to use the posting endpoints of the API.</p>

<center>
<amp-img src="/assets/img/automatic-content-generation/twitter_app_permissions.jpg" alt="twitter developer app permissions" height="407" width="548" layout="intrinsic"></amp-img>
<br /><i>X developer app permissions</i>
</center>

<p>To increase visibility and reach on social media platforms, it’s important to stay up-to-date with current trends and topics that are popular among users. One strategy for doing this is to use trending subjects as inspiration for creating content, such as short messages or posts that incorporate relevant keywords and hashtags.</p>

<p>For example, the following code demonstrates how to create short messages using Google Trends subjects:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">pytrends.request</span> <span class="kn">import</span> <span class="n">TrendReq</span>

<span class="n">pytrend</span> <span class="o">=</span> <span class="n">TrendReq</span><span class="p">()</span>
<span class="n">df</span> <span class="o">=</span> <span class="n">pytrend</span><span class="p">.</span><span class="n">trending_searches</span><span class="p">(</span><span class="n">pn</span><span class="o">=</span><span class="s">'france'</span><span class="p">).</span><span class="n">rename</span><span class="p">(</span><span class="n">columns</span><span class="o">=</span><span class="p">{</span><span class="mi">0</span><span class="p">:</span> <span class="s">'daily trends'</span><span class="p">})</span>
<span class="n">df</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>

<span class="k">def</span> <span class="nf">gen_prompt</span><span class="p">(</span><span class="n">subject</span><span class="p">):</span>
    <span class="k">return</span> <span class="p">(</span>
        <span class="sa">f</span><span class="s">"Describe a controversial situation about </span><span class="si">{</span><span class="n">subject</span><span class="si">}</span><span class="s">, involving nerf blasters, shooting nerf darts.  "</span>
    <span class="p">)</span>
<span class="c1"># progress_apply assumes tqdm has been registered: from tqdm import tqdm; tqdm.pandas()
</span><span class="n">df</span><span class="p">[</span><span class="s">'message'</span><span class="p">]</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="s">'daily trends'</span><span class="p">].</span><span class="n">progress_apply</span><span class="p">(</span>
    <span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">get_message</span><span class="p">(</span>
        <span class="n">prompt</span><span class="o">=</span><span class="n">gen_prompt</span><span class="p">(</span><span class="n">x</span><span class="p">),</span>
        <span class="n">assistant_instructions</span><span class="o">=</span><span class="p">(</span>
            <span class="s">"You are a helpful assistant that writes prompts to generate realistic images. "</span>
            <span class="s">"Use only simple words, no hashtags. Detailed description of the situation but keep it short, no more than 150 characters."</span>
        <span class="p">)</span>
    <span class="p">)</span>
<span class="p">)</span>
</code></pre></div></div>

<p>Once content has been generated using trending subjects or other sources, it’s important to distribute and promote that content on relevant social media platforms. This can help increase visibility, engagement, and reach among target audiences.</p>

<p>The following code demonstrates how to use the Twitter API to post a message with an image:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">tweepy</span>

<span class="k">def</span> <span class="nf">post_on_twitter</span><span class="p">(</span><span class="n">tweet</span><span class="p">,</span> <span class="n">image</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
    <span class="n">auth</span> <span class="o">=</span> <span class="n">tweepy</span><span class="p">.</span><span class="n">OAuthHandler</span><span class="p">(</span><span class="n">api_key</span><span class="p">,</span> <span class="n">api_key_secret</span><span class="p">)</span>
    <span class="n">auth</span><span class="p">.</span><span class="n">set_access_token</span><span class="p">(</span><span class="n">access_token</span><span class="p">,</span> <span class="n">access_token_secret</span><span class="p">)</span>
    <span class="n">api</span> <span class="o">=</span> <span class="n">tweepy</span><span class="p">.</span><span class="n">API</span><span class="p">(</span><span class="n">auth</span><span class="p">,</span> <span class="n">wait_on_rate_limit</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
    <span class="n">client</span> <span class="o">=</span> <span class="n">tweepy</span><span class="p">.</span><span class="n">Client</span><span class="p">(</span>
            <span class="n">bearer_token</span><span class="o">=</span><span class="n">bearer_token</span><span class="p">,</span>
            <span class="n">consumer_key</span><span class="o">=</span><span class="n">api_key</span><span class="p">,</span>
            <span class="n">consumer_secret</span><span class="o">=</span><span class="n">api_key_secret</span><span class="p">,</span>
            <span class="n">access_token</span><span class="o">=</span><span class="n">access_token</span><span class="p">,</span>
            <span class="n">access_token_secret</span><span class="o">=</span><span class="n">access_token_secret</span>
            <span class="p">)</span>

    <span class="c1"># Upload image
</span>    <span class="k">if</span> <span class="n">image</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span>
        <span class="n">media</span> <span class="o">=</span> <span class="n">api</span><span class="p">.</span><span class="n">media_upload</span><span class="p">(</span><span class="n">image</span><span class="p">)</span>
        <span class="c1"># Create a tweet with the v2 endpoint (v1.1 update_status is restricted on the free tier)
</span>        <span class="n">post_result</span> <span class="o">=</span> <span class="n">client</span><span class="p">.</span><span class="n">create_tweet</span><span class="p">(</span><span class="n">text</span><span class="o">=</span><span class="n">tweet</span><span class="p">,</span> <span class="n">media_ids</span><span class="o">=</span><span class="p">[</span><span class="n">media</span><span class="p">.</span><span class="n">media_id</span><span class="p">])</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="n">post_result</span> <span class="o">=</span> <span class="n">client</span><span class="p">.</span><span class="n">create_tweet</span><span class="p">(</span><span class="n">text</span><span class="o">=</span><span class="n">tweet</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"Tweet posted successfully."</span><span class="p">)</span>
</code></pre></div></div>

<p>Here are some examples of the content that can be produced.
While AI-generated images may not be perfect, they can still be effective in capturing attention and conveying a message. However, there may be some small artifacts or imperfections that suggest the image is not entirely real, such as excessive numbers of limbs or unusual body positions.</p>

<p>Despite these minor imperfections, AI-generated images can still be an effective tool for content creation and distribution on social media platforms. By leveraging tools like DALL-E, brands and marketers can quickly generate high-quality visuals that help capture attention and convey a message in a unique and engaging way.</p>

<div class="columns is-mobile is-multiline is-horizontal-center">
  <div class="column is-6-desktop is-12-mobile">
    <amp-image-lightbox id="lightbox1" layout="nodisplay"></amp-image-lightbox>
    <amp-img on="tap:lightbox1" role="button" tabindex="0" alt="Prompt: On nov 22nd a puppy dodge Nerf blast in the cool autumn air" title="Prompt: On nov 22nd a puppy dodge Nerf blast in the cool autumn air" src="/assets/img/automatic-content-generation/post_x1.jpg" layout="responsive" width="535" height="598"></amp-img>
  </div>

  <div class="column is-6-desktop is-12-mobile">
    <amp-img on="tap:lightbox1" role="button" tabindex="0" alt="Prompt: On nov 22nd a puppy dodge Nerf blast in the cool autumn air" title="Prompt: On nov 22nd a puppy dodge Nerf blast in the cool autumn air" src="/assets/img/automatic-content-generation/post_x3.jpg" layout="responsive" width="535" height="598"></amp-img>
  </div>

</div>
<div class="columns is-mobile is-multiline is-horizontal-center">

  <div class="column is-6-desktop is-12-mobile">
    <amp-img on="tap:lightbox1" role="button" tabindex="0" alt="markdown html output" title="markdown html output" src="/assets/img/automatic-content-generation/post_x2.jpg" layout="responsive" width="535" height="598"></amp-img>
  </div>

  <div class="column is-6-desktop is-12-mobile">
    <amp-img on="tap:lightbox1" role="button" tabindex="0" alt="markdown html output" title="markdown html output" src="/assets/img/automatic-content-generation/post_x4.jpg" layout="responsive" width="535" height="598"></amp-img>
  </div>

</div>

<h2 id="artifical-traffic-augmentation">Artificial Traffic Augmentation</h2>

<p>To improve website referencing and increase search engine rankings, there are several strategies that businesses and marketers can employ. One of the most effective is to focus on driving traffic to the website using reputable search engines like Google or Bing. By increasing the volume and quality of traffic to the site, businesses can signal to search engines that their content is valuable and relevant to users.</p>

<p>One strategy for simulating traffic to a website is to use Python tools like Selenium. Selenium is an open-source web automation framework that allows developers to control web browsers programmatically and automate tasks. With Selenium, businesses can execute analytics code like Matomo or Google Analytics, which can help track user behavior and provide insights into how to improve the site’s performance and user experience.</p>

<p>One of the key benefits of using Selenium for web automation is its ability to execute client-side content like JavaScript. This can be particularly useful for websites that rely heavily on JavaScript for functionality or dynamic content. By controlling a web browser programmatically, Selenium can simulate real user behavior and help ensure that all aspects of the site are functioning properly.</p>

<p>In addition to improving website referencing and analytics, Selenium can also be used for a variety of other tasks related to web automation. For example, businesses can use Selenium to automate repetitive tasks like form filling or data entry, which can help save time and reduce errors. They can also use Selenium to test their website’s functionality across different browsers and devices, ensuring that it is accessible and user-friendly for all visitors.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">time</span>
<span class="kn">from</span> <span class="nn">selenium</span> <span class="kn">import</span> <span class="n">webdriver</span>
<span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
    <span class="n">driver</span> <span class="o">=</span> <span class="n">webdriver</span><span class="p">.</span><span class="n">Firefox</span><span class="p">()</span>
    <span class="n">driver</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="s">"https://url.com"</span><span class="p">)</span>
    <span class="n">scheight</span> <span class="o">=</span> <span class="p">.</span><span class="mi">1</span>
    <span class="k">while</span> <span class="n">scheight</span> <span class="o">&lt;</span> <span class="mf">9.9</span><span class="p">:</span>
        <span class="n">driver</span><span class="p">.</span><span class="n">execute_script</span><span class="p">(</span><span class="s">"window.scrollTo(0, document.body.scrollHeight/%s);"</span> <span class="o">%</span> <span class="n">scheight</span><span class="p">)</span>
        <span class="n">scheight</span> <span class="o">+=</span> <span class="p">.</span><span class="mi">01</span>
    <span class="n">time</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span>
    <span class="n">driver</span><span class="p">.</span><span class="n">close</span><span class="p">()</span>
    <span class="n">time</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
</code></pre></div></div>
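<p>The fixed-step scroll loop above is easy to make less mechanical. The following helper is an illustrative sketch (not part of the original script): it generates a randomized, monotonically descending scroll/pause schedule in pure Python, which can then be fed to <code class="language-plaintext highlighter-rouge">driver.execute_script</code> and <code class="language-plaintext highlighter-rouge">time.sleep</code>:</p>

```python
import random

def scroll_schedule(steps=20, min_pause=0.5, max_pause=3.0, seed=None):
    """Return a list of (scroll_fraction, pause_seconds) pairs that move
    monotonically down the page with randomized, human-like pauses.
    Illustrative helper; names and defaults are assumptions."""
    rng = random.Random(seed)
    # random increments, normalised so the fractions end at 1.0 (page bottom)
    increments = [rng.random() for _ in range(steps)]
    total = sum(increments)
    schedule, position = [], 0.0
    for inc in increments:
        position += inc / total
        schedule.append((min(position, 1.0), rng.uniform(min_pause, max_pause)))
    return schedule
```

<p>Each fraction can then be applied with <code class="language-plaintext highlighter-rouge">driver.execute_script("window.scrollTo(0, arguments[0] * document.body.scrollHeight);", frac)</code> followed by the corresponding pause.</p>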

<p>Tracking the analytics of a website is an essential task for businesses and marketers looking to optimize their online presence and drive more traffic to their site. There are many tools available for tracking website analytics, but one of the most popular and widely used is Google Analytics. By integrating Google Analytics into their website, businesses can gain valuable insights into user behavior, demographics, and engagement metrics.</p>

<p>The following image shows an example of how simulated traffic can impact website metrics in Google Analytics. In this example, the traffic to the site increased from zero to several thousand visitors within a short period of time, thanks to the use of a web automation tool like Selenium. This type of sudden increase in traffic can be a strong signal to search engines that the site’s content is valuable and relevant to users, which can help improve its search engine rankings and visibility.</p>

<p>It’s important to note, however, that simulated traffic should be used responsibly and ethically. While it can be an effective way to improve website analytics and search engine rankings, it’s not a substitute for genuine user engagement and interaction. Businesses should focus on creating high-quality content and optimizing their site’s user experience to encourage real users to visit and engage with their site.</p>

<p>In addition to tracking traffic and user behavior, Google Analytics can also provide valuable insights into other metrics like bounce rate, conversion rate, and demographics. By analyzing these metrics over time, businesses can identify trends and patterns in user behavior and make data-driven decisions about how to improve their site’s performance and user experience.</p>

<center>
<amp-img src="/assets/img/automatic-content-generation/google_analytics_stats.jpg" alt="google analytics results" height="220" width="562" layout="responsive"></amp-img>
<br /><i>Google Analytics results showing the traffic increase</i>
</center>

<h2 id="conclusion">Conclusion</h2>

<p>The strategies outlined in this article can be useful for creating artificial content, which can be employed for a variety of purposes, including commercial use or spreading awareness about a particular ideology. However, it’s important to be aware that such content exists in the digital realm and to exercise caution when engaging with it.</p>

<p>With the rise of AI tools and other technology, generating artificial content has become more accessible than ever before. While this can present opportunities for businesses and marketers looking to create engaging content, it also raises ethical concerns about transparency and honesty in digital communication.</p>

<p>It’s crucial for users to be able to distinguish between what is real and what is not when interacting with digital content. This means that businesses and marketers should be transparent about their use of AI tools and ensure that their messaging is clear and accurate.</p>

<p>Moreover, while artificial content can be an effective tool for capturing attention, it’s only one aspect of a successful digital marketing strategy. Businesses should also focus on creating high-quality content that provides value and relevance to their audience, as well as optimizing their site’s user experience and overall performance.</p>]]></content:encoded><description>This article presents how to generate content using large language and vision models. The content is then shared on websites and social networks.</description><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://cristianpb.github.io/assets/img/automatic-content-generation/main-16x9.jpg"/><media:content xmlns:media="http://search.yahoo.com/mrss/" medium="image" url="https://cristianpb.github.io/assets/img/automatic-content-generation/main-16x9.jpg"/></item><item><title>Application revenue prediction</title><link>https://cristianpb.github.io/blog/application-revenue-prediction</link><category>data science</category><category>kaggle</category><category>python</category><author>noemail@noemail.org (Cristian Brokate)</author><pubDate>Mon, 7 Oct 2024 00:00:00 GMT</pubDate><guid isPermaLink="false">https://cristianpb.github.io/blog/application-revenue-prediction</guid><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<h2 id="introduction">Introduction</h2>

<p>In the competitive landscape of the digital applications industry, maximizing revenue generation is a paramount objective. This exercise aims to develop a predictive model capable of forecasting application-generated revenue by leveraging a combination of user features and engagement metrics. By understanding user characteristics and their interactions with the application, we can gain valuable insights into revenue potential and optimize monetization strategies.</p>

<h2 id="data">Data</h2>

<p>The dataset provides information about users’ application installation features, as well as their engagement features over 120 days.
It is divided into two parts:</p>
<ul>
  <li>The train data, which includes 1.7 million rows.</li>
  <li>The test data, which includes 12.5 thousand rows.</li>
</ul>

<p>Each row contains information about a user. The train dataset covers users who installed the application from February until the end of November. The test dataset covers users who installed the application on December 1st.</p>

<h3 id="individual-user-features">Individual user features</h3>

<p>This information is gathered on the first day, when the user installs the application.</p>

<h4 id="install-date">Install date</h4>

<p>This represents the day when the user installed the application. There is an increasing trend from February until December, with some periods, such as early May or October, showing more installations than others. However, there are some outlier days where installations were extremely low, such as August 16th, September 22nd, or November 2nd, 4th and 12th.</p>

<amp-img src="/assets/img/application-revenue-prediction/install_date.png" alt="Application install date" height="289" width="862" layout="responsive"></amp-img>
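<p>Such abnormally low days can be flagged programmatically. Below is a hedged, stdlib-only sketch (the function name and the <code class="language-plaintext highlighter-rouge">k</code> cutoff are illustrative, not part of the original analysis) that uses the median absolute deviation as a robust spread measure:</p>

```python
from statistics import median

def low_outlier_days(daily_counts, k=5.0):
    """Return the indices of days whose install count sits far below the
    series median, measured in units of the median absolute deviation (MAD).
    Illustrative sketch; the threshold k is arbitrary."""
    med = median(daily_counts)
    # MAD is robust to the outliers we are trying to detect;
    # fall back to 1.0 when the series is almost constant (MAD == 0)
    mad = median(abs(x - med) for x in daily_counts) or 1.0
    return [i for i, x in enumerate(daily_counts) if (med - x) / mad > k]
```

<p>Only dips below the median are flagged, matching the "extremely low" days described above; unusually high days pass through untouched.</p>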

<p>One can also notice that the install date has a weekly seasonality: more installs occur during the weekend.</p>

<amp-img src="/assets/img/application-revenue-prediction/week_seasonality.png" alt="Weekly seasonality of application install date" height="289" width="862" layout="responsive"></amp-img>
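<p>The weekday aggregation behind this figure can be sketched with the standard library alone, assuming the install dates have been parsed into <code class="language-plaintext highlighter-rouge">date</code> objects (the function name is illustrative):</p>

```python
from collections import Counter
from datetime import date

def weekday_profile(install_dates):
    """Count installs per weekday: index 0 = Monday .. 6 = Sunday.
    Illustrative sketch of the seasonality aggregation."""
    counts = Counter(d.weekday() for d in install_dates)
    return [counts.get(i, 0) for i in range(7)]
```

<p>A higher count at indices 5 and 6 (Saturday, Sunday) is exactly the weekend effect visible in the plot.</p>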

<h4 id="platform">Platform</h4>

<p>In this dataset, applications are mainly installed on iOS systems.</p>

<amp-img src="/assets/img/application-revenue-prediction/platform.png" alt="Platforms devices where applications are installed" height="289" width="862" layout="responsive"></amp-img>

<h4 id="personalised-ads">Personalised ads</h4>

<p>This represents whether the user opted in for personalized ads or other services. The train dataset has a higher number of users who did not opt in for personalized ads, while in the test dataset the split is roughly even.</p>

<amp-img src="/assets/img/application-revenue-prediction/is_optim.png" alt="The application have been optimized or not" height="289" width="862" layout="responsive"></amp-img>

<h4 id="app-id">App Id</h4>

<p>There are two applications, both present in the train and the test sets.</p>

<amp-img src="/assets/img/application-revenue-prediction/app_id.png" alt="The id of the installed apps" height="289" width="862" layout="responsive"></amp-img>

<h4 id="country">Country</h4>

<p>This represents the country where the app was downloaded. We can see that installations come mainly from the US market.</p>

<amp-img src="/assets/img/application-revenue-prediction/country.png" alt="Country where the app have been downloaded" height="289" width="1062" layout="responsive"></amp-img>

<h4 id="ad-network">Ad Network</h4>

<p>This represents the ID of the ad network that displayed the ads to the user.
There are 3 main ad networks in both the train and the test sets.</p>

<amp-img src="/assets/img/application-revenue-prediction/ad_network.png" alt="ad network that displayed the ads to the user" height="289" width="1062" layout="responsive"></amp-img>

<h4 id="campaign-type">Campaign type</h4>

<p>In marketing terms, this is the category of the ad campaign that was used to acquire the user.</p>

<amp-img src="/assets/img/application-revenue-prediction/campaign_type.png" alt="Type of ad campaign that acquired the user" height="289" width="1062" layout="responsive"></amp-img>

<h4 id="campaign-id">Campaign id</h4>

<p>There are more than 178 distinct campaign_ids, some of which occur only once.</p>

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>index</th>
      <th>campaign_id</th>
      <th>count</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>...</td>
      <td>559244</td>
    </tr>
    <tr>
      <th>1</th>
      <td>da2...</td>
      <td>317648</td>
    </tr>
    <tr>
      <th>2</th>
      <td>99a...</td>
      <td>203638</td>
    </tr>
    <tr>
      <th>3</th>
      <td>c6d...</td>
      <td>114357</td>
    </tr>
    <tr>
      <th>4</th>
      <td>281...</td>
      <td>70904</td>
    </tr>
    <tr>
      <th>...</th>
      <td>...</td>
      <td>...</td>
    </tr>
    <tr>
      <th>174</th>
      <td>blo...</td>
      <td>1</td>
    </tr>
    <tr>
      <th>175</th>
      <td>gam...</td>
      <td>1</td>
    </tr>
    <tr>
      <th>176</th>
      <td>blo...</td>
      <td>1</td>
    </tr>
    <tr>
      <th>177</th>
      <td>dow...</td>
      <td>1</td>
    </tr>
    <tr>
      <th>178</th>
      <td>ぶろっ...</td>
      <td>1</td>
    </tr>
  </tbody>
</table>

<p>A common technique for dealing with this is to group the less common values together.
This can be achieved with the following Python function.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">get_most_popular</span><span class="p">(</span><span class="n">series</span><span class="p">,</span> <span class="n">threshold</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="n">top_values</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
    <span class="k">if</span> <span class="n">threshold</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span>
        <span class="n">popular_idx</span> <span class="o">=</span> <span class="n">series</span><span class="p">.</span><span class="n">value_counts</span><span class="p">().</span><span class="n">to_frame</span><span class="p">().</span><span class="n">query</span><span class="p">(</span><span class="s">'count &gt; @threshold'</span><span class="p">).</span><span class="n">index</span><span class="p">.</span><span class="n">tolist</span><span class="p">()</span>
        <span class="n">new_series</span> <span class="o">=</span> <span class="n">series</span><span class="p">.</span><span class="n">copy</span><span class="p">()</span>
        <span class="n">new_series</span><span class="p">.</span><span class="n">loc</span><span class="p">[:]</span> <span class="o">=</span> <span class="s">"other"</span>
        <span class="n">new_series</span><span class="p">.</span><span class="n">loc</span><span class="p">[</span><span class="n">series</span><span class="p">.</span><span class="n">isin</span><span class="p">(</span><span class="n">popular_idx</span><span class="p">)]</span> <span class="o">=</span> <span class="n">series</span>
        <span class="k">return</span> <span class="n">new_series</span>
    <span class="k">if</span> <span class="n">top_values</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span>
        <span class="n">popular_idx</span> <span class="o">=</span> <span class="n">series</span><span class="p">.</span><span class="n">value_counts</span><span class="p">().</span><span class="n">to_frame</span><span class="p">().</span><span class="n">head</span><span class="p">(</span><span class="n">top_values</span><span class="p">).</span><span class="n">index</span><span class="p">.</span><span class="n">tolist</span><span class="p">()</span>
        <span class="n">new_series</span> <span class="o">=</span> <span class="n">series</span><span class="p">.</span><span class="n">copy</span><span class="p">()</span>
        <span class="n">new_series</span><span class="p">.</span><span class="n">loc</span><span class="p">[:]</span> <span class="o">=</span> <span class="s">"other"</span>
        <span class="n">new_series</span><span class="p">.</span><span class="n">loc</span><span class="p">[</span><span class="n">series</span><span class="p">.</span><span class="n">isin</span><span class="p">(</span><span class="n">popular_idx</span><span class="p">)]</span> <span class="o">=</span> <span class="n">series</span>
        <span class="k">return</span> <span class="n">new_series</span>
</code></pre></div></div>

<p>The threshold can be tuned so that each remaining category has enough observations to be representative.</p>

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>campaign_id</th>
      <th>count</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>...</td>
      <td>559244</td>
    </tr>
    <tr>
      <th>1</th>
      <td>da...</td>
      <td>317648</td>
    </tr>
    <tr>
      <th>2</th>
      <td>99...</td>
      <td>203638</td>
    </tr>
    <tr>
      <th>3</th>
      <td>c6...</td>
      <td>114357</td>
    </tr>
    <tr>
      <th>4</th>
      <td>28...</td>
      <td>70904</td>
    </tr>
    <tr>
      <th>...</th>
      <td>...</td>
      <td>...</td>
    </tr>
    <tr>
      <th>96</th>
      <td>20...</td>
      <td>11</td>
    </tr>
    <tr>
      <th>97</th>
      <td>ブロ...</td>
      <td>11</td>
    </tr>
    <tr>
      <th>98</th>
      <td>MA...</td>
      <td>10</td>
    </tr>
    <tr>
      <th>99</th>
      <td>xx...</td>
      <td>9</td>
    </tr>
    <tr>
      <th>100</th>
      <td>17...</td>
      <td>8</td>
    </tr>
  </tbody>
</table>
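<p>For illustration, the same rare-category grouping can be expressed without pandas; this stdlib sketch (function name assumed, not from the original notebook) mirrors the <code class="language-plaintext highlighter-rouge">threshold</code> branch of <code class="language-plaintext highlighter-rouge">get_most_popular</code>:</p>

```python
from collections import Counter

def group_rare(values, threshold):
    """Replace values occurring `threshold` times or fewer with 'other',
    keeping the rest unchanged. Stdlib equivalent of the pandas grouping."""
    counts = Counter(values)
    return ["other" if counts[v] <= threshold else v for v in values]
```

<p>As with the pandas version, raising the threshold collapses more of the long tail into the single <code class="language-plaintext highlighter-rouge">other</code> bucket.</p>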

<h4 id="model">Model</h4>

<p>The device model also has many distinct values, some of which occur only once, so applying the same grouping function one can keep the top 100 models.
In line with the platform distribution, iPhones and iPads are among the top devices.
One should notice that <code class="language-plaintext highlighter-rouge">IPhoneUnkown</code> and <code class="language-plaintext highlighter-rouge">IPadUnknown</code> are not in the test set.</p>

<amp-img src="/assets/img/application-revenue-prediction/model.png" alt="Device model" height="289" width="862" layout="responsive"></amp-img>

<h4 id="manufacturer">Manufacturer</h4>

<p>This represents the manufacturer of the user’s device. The main manufacturer is Apple, followed by Samsung and Google. The manufacturer distribution is the same for the train and test datasets.</p>

<amp-img src="/assets/img/application-revenue-prediction/manufacturer.png" alt="Device manufacturer" height="289" width="862" layout="responsive"></amp-img>

<h4 id="mobile-classification">Mobile classification</h4>

<p>This classification relates to the monetary value of the device. High-end devices, such as the latest iPhones, receive the best classification, <strong>Tier 1</strong>. The cheapest phones fall into <strong>Tier 5</strong>, and some devices have no classification at all.</p>

<amp-img src="/assets/img/application-revenue-prediction/mobile_classification.png" alt="Mobile classification" height="289" width="862" layout="responsive"></amp-img>

<h4 id="city">City</h4>

<p>The city where the user downloaded the app.
The most popular cities are Tokyo, Chicago and Houston.
Some cities, such as Otemae, Tacoma or Nishikicho, don’t appear in the test dataset.</p>

<amp-img src="/assets/img/application-revenue-prediction/city.png" alt="City" height="289" width="862" layout="responsive"></amp-img>

<h4 id="other-variables">Other variables</h4>

<p>Some variables, such as <code class="language-plaintext highlighter-rouge">game_type</code> and <code class="language-plaintext highlighter-rouge">user_id</code>, are not useful for revenue prediction since their values are all unique.</p>

<h3 id="user-engagement-features">User engagement features</h3>

<p>During the life of the application, the user may interact by clicking on ads or by buying items on the platform. These features are measured as cumulative values at days 0, 3, 7, 14, 30, 60, 90 and 120.</p>

<h4 id="revenues-generated-by-the-application">Revenues generated by the application</h4>

<p>The main revenue is measured by the variable <code class="language-plaintext highlighter-rouge">dX_rev</code>, where <code class="language-plaintext highlighter-rouge">X</code> is the number of days over which it is measured. This is the variable to be predicted. The distribution is similar for the test and train datasets.
This variable is treated in <code class="language-plaintext highlighter-rouge">log</code> form since it varies mostly between 0 and 1.
70.0% of users produce less than $1 of revenue.</p>
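<p>As a sketch of this transformation (with hypothetical revenue values), <code class="language-plaintext highlighter-rouge">log1p</code> handles the zero-revenue users that a plain <code class="language-plaintext highlighter-rouge">log</code> cannot:</p>

```python
import numpy as np
import pandas as pd

# Illustrative revenue values in dollars (hypothetical data).
rev = pd.Series([0.0, 0.2, 0.5, 0.9, 3.0])

# log1p maps [0, inf) to [0, inf), spreading out the mass
# concentrated between 0 and 1 while handling zeros safely.
log_rev = np.log1p(rev)

# Share of users producing less than $1 of revenue.
share_below_1 = (rev < 1).mean()
```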

<amp-img src="/assets/img/application-revenue-prediction/d0_rev.png" alt="Total revenu at day zero" height="289" width="862" layout="responsive"></amp-img>

<p>This variable is decomposed into <code class="language-plaintext highlighter-rouge">dX_iap_rev</code> and <code class="language-plaintext highlighter-rouge">dX_ad_rev</code>, the revenues from in-app purchases and ads respectively. Notice that the total revenue is mainly driven by ads.</p>

<amp-img src="/assets/img/application-revenue-prediction/d0_ad_iap_rev.png" alt="Decomposed total revenue at day zero" height="489" width="862" layout="responsive"></amp-img>

<h4 id="correlation-of-values">Correlation of values</h4>

<p>The correlation matrix makes it possible to identify correlated features.</p>

<amp-img src="/assets/img/application-revenue-prediction/correlation_matrix.png" alt="Correlation matrix" height="489" width="862" layout="responsive"></amp-img>

<p>The following variables are correlated:</p>
<ul>
  <li>“iap_ads_rev_d0” and “iap_ads_count_d0”: the revenue from ads is correlated with the number of ads.</li>
  <li>“iap_coins_count_d0” and “iap_count_d0”: the number of coins bought by the user is correlated with the number of items bought by the user.</li>
  <li>“iap_coins_rev_d0”, “d0_iap_rev” and “d0_rev”: the revenue from coin purchases is correlated with the revenue from in-app purchases and the total revenue.</li>
</ul>

<p>Correlated features can be removed before the modelling stage.</p>
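<p>A common heuristic for this step (not necessarily the exact rule used here) is to drop one column of every highly correlated pair, scanning the upper triangle of the correlation matrix:</p>

```python
import numpy as np
import pandas as pd

def drop_correlated(df, threshold=0.9):
    """Drop one column of each pair whose absolute Pearson
    correlation exceeds `threshold` (common heuristic, sketch only)."""
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > threshold).any()]
    return df.drop(columns=to_drop)
```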

<h4 id="evolution-over-time">Evolution over time</h4>

<p>The cumulative value changes from day to day. However, the following figure shows that the revenue distribution remains the same. The distribution shrinks because there is less data with <code class="language-plaintext highlighter-rouge">d120_rev</code>.</p>

<amp-img src="/assets/img/application-revenue-prediction/d0_rev_evolution.png" alt="Evolution of total revenue over time" height="389" width="862" layout="responsive"></amp-img>

<h4 id="missing-values">Missing values</h4>

<p>The objective of the problem is to predict revenue at day 120, but this value is not available for every line in the dataset, since we don’t have information from the future. For instance, the test dataset is from 1 December, so we only have information for day zero.</p>

<amp-img src="/assets/img/application-revenue-prediction/missing_target.png" alt="Missing data for revenue at day 120" height="389" width="862" layout="responsive"></amp-img>

<h2 id="modelling">Modelling</h2>

<p>To predict revenue at day 120, a traditional approach can be employed using structured data as the training dataset and the <code class="language-plaintext highlighter-rouge">d120_rev</code> column as the target variable.
This approach leverages the existing information in the dataset to forecast future revenue.</p>

<h3 id="feature-selection">Feature selection</h3>

<p>Based on variable exploration, the following modifications were made to user installation features:</p>

<ul>
  <li>Categorical values such as campaign_id, model, manufacturer, and city were consolidated by grouping less frequent categories together.</li>
  <li>Utilizing the installation_date for each user, additional features were derived, including the number of installations per day, the day of the week, and the month.</li>
  <li>To highlight the distinct nature of Apple devices, a binary column was introduced to indicate whether the installation was made on an Apple device.</li>
</ul>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">X</span> <span class="o">=</span> <span class="p">(</span>
    <span class="n">df</span>
    <span class="p">.</span><span class="n">assign</span><span class="p">(</span>
        <span class="n">campaign_id</span><span class="o">=</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">get_most_popular</span><span class="p">(</span><span class="n">x</span><span class="p">[</span><span class="s">'campaign_id'</span><span class="p">],</span> <span class="n">threshold</span><span class="o">=</span><span class="n">THRESHOLD_POPULAR_CAMPAING</span><span class="p">),</span>
        <span class="n">model</span><span class="o">=</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">get_most_popular</span><span class="p">(</span><span class="n">x</span><span class="p">[</span><span class="s">'model'</span><span class="p">],</span> <span class="n">threshold</span><span class="o">=</span><span class="n">THRESHOLD_POPULAR_MODEL</span><span class="p">),</span>
        <span class="n">manufacturer</span><span class="o">=</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">get_most_popular</span><span class="p">(</span><span class="n">x</span><span class="p">[</span><span class="s">'manufacturer'</span><span class="p">],</span> <span class="n">top_values</span><span class="o">=</span><span class="n">TOP_MANUFACTURER</span><span class="p">),</span>
        <span class="n">city</span><span class="o">=</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">get_most_popular</span><span class="p">(</span><span class="n">x</span><span class="p">[</span><span class="s">'city'</span><span class="p">],</span> <span class="n">top_values</span><span class="o">=</span><span class="n">TOP_CITIES</span><span class="p">),</span>
        <span class="n">is_optin</span><span class="o">=</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="p">[</span><span class="s">'is_optin'</span><span class="p">].</span><span class="n">replace</span><span class="p">({</span><span class="mi">1</span><span class="p">:</span><span class="s">"optin"</span><span class="p">,</span> <span class="mi">0</span><span class="p">:</span> <span class="s">"not_optin"</span><span class="p">}),</span>
        <span class="n">installations_perday</span><span class="o">=</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="p">[</span><span class="s">'install_date'</span><span class="p">].</span><span class="n">dt</span><span class="p">.</span><span class="n">strftime</span><span class="p">(</span><span class="s">'%Y-%m-%d'</span><span class="p">).</span><span class="n">to_frame</span><span class="p">().</span><span class="n">merge</span><span class="p">(</span><span class="n">installations_perday</span><span class="p">,</span> <span class="n">how</span><span class="o">=</span><span class="s">'left'</span><span class="p">,</span> <span class="n">left_on</span><span class="o">=</span><span class="s">'install_date'</span><span class="p">,</span> <span class="n">right_index</span><span class="o">=</span><span class="bp">True</span><span class="p">)[</span><span class="mi">0</span><span class="p">],</span>
        <span class="n">installation_day</span><span class="o">=</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="p">[</span><span class="s">'install_date'</span><span class="p">].</span><span class="n">dt</span><span class="p">.</span><span class="n">day_name</span><span class="p">(),</span>
        <span class="n">installation_month</span><span class="o">=</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="p">[</span><span class="s">'install_date'</span><span class="p">].</span><span class="n">dt</span><span class="p">.</span><span class="n">month</span><span class="p">,</span>
        <span class="n">is_apple</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="p">[</span><span class="s">'manufacturer'</span><span class="p">].</span><span class="nb">str</span><span class="p">.</span><span class="n">contains</span><span class="p">(</span><span class="s">'apple'</span><span class="p">,</span> <span class="n">flags</span><span class="o">=</span><span class="n">re</span><span class="p">.</span><span class="n">IGNORECASE</span><span class="p">,</span> <span class="n">na</span><span class="o">=</span><span class="bp">False</span><span class="p">).</span><span class="n">replace</span><span class="p">({</span><span class="bp">True</span><span class="p">:</span><span class="s">"is_apple"</span><span class="p">,</span> <span class="bp">False</span><span class="p">:</span> <span class="s">"not_apple"</span><span class="p">}),</span>
        <span class="n">mobile_classification</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="p">[</span><span class="s">'mobile_classification'</span><span class="p">].</span><span class="n">replace</span><span class="p">(</span><span class="sa">r</span><span class="s">'^\s*$'</span><span class="p">,</span> <span class="s">'unkown_mobile_classification'</span><span class="p">,</span> <span class="n">regex</span><span class="o">=</span><span class="bp">True</span><span class="p">),</span>
    <span class="p">)</span>
    <span class="p">.</span><span class="n">assign</span><span class="p">(</span><span class="o">**</span><span class="p">{</span><span class="n">col</span><span class="p">:</span> <span class="k">lambda</span> <span class="n">x</span><span class="p">,</span> <span class="n">col</span><span class="o">=</span><span class="n">col</span><span class="p">:</span> <span class="n">x</span><span class="p">[</span><span class="n">col</span><span class="p">].</span><span class="n">astype</span><span class="p">(</span><span class="s">'category'</span><span class="p">)</span> <span class="k">for</span> <span class="n">col</span> <span class="ow">in</span> <span class="n">user_cols</span><span class="p">})</span>
    <span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="n">engagement_cols</span> <span class="o">+</span> <span class="n">user_cols</span> <span class="o">+</span> <span class="p">[</span><span class="n">target_col</span><span class="p">,</span> <span class="s">'dataset'</span><span class="p">])</span>
<span class="p">)</span>
</code></pre></div></div>

<p>To ensure that each class is treated equally and to accommodate the requirements of models that cannot handle categorical variables directly, certain columns were transformed using one-hot encoding.
This technique converts categorical values into numerical representations, enabling the model to process them effectively.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">one_hot_columns</span> <span class="o">=</span> <span class="p">[</span><span class="s">'is_optin'</span><span class="p">,</span> <span class="s">'platform'</span><span class="p">,</span> <span class="s">'app_id'</span><span class="p">,</span> <span class="s">'country'</span><span class="p">,</span> <span class="s">'ad_network_id'</span><span class="p">,</span> <span class="s">'campaign_type'</span><span class="p">,</span> <span class="s">'mobile_classification'</span><span class="p">]</span>
<span class="n">X</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">get_dummies</span><span class="p">(</span><span class="n">data</span> <span class="o">=</span> <span class="n">X</span><span class="p">,</span> <span class="n">prefix</span> <span class="o">=</span> <span class="s">'OHE'</span><span class="p">,</span> <span class="n">prefix_sep</span><span class="o">=</span><span class="s">'_'</span><span class="p">,</span>
               <span class="n">columns</span> <span class="o">=</span> <span class="n">one_hot_columns</span><span class="p">,</span>
               <span class="n">drop_first</span> <span class="o">=</span><span class="bp">False</span><span class="p">,</span>
              <span class="n">dtype</span><span class="o">=</span><span class="s">'int8'</span><span class="p">)</span>
</code></pre></div></div>

<p>While the numerical values were left intact, normalization might be considered to improve model performance.
Correlated features were removed based on exploratory data analysis.
To ensure realistic predictions on day zero, information from features like (d3, d7, d14, …) was excluded from the training dataset.
This prevents the model from over-relying on these features and improves its ability to make accurate predictions in cases where such information is unavailable.</p>
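<p>This exclusion can be sketched as a pattern match on the column names (hypothetical column list; the target <code class="language-plaintext highlighter-rouge">d120_rev</code> is kept):</p>

```python
import re

# Hypothetical column names following the dX_ pattern from the post.
cols = ['d0_rev', 'd0_ad_rev', 'd3_rev', 'd7_rev', 'd14_rev',
        'city', 'model', 'd120_rev']

# Match engagement features measured after day zero (d3_, d7_, ...),
# but keep the target column d120_rev for training.
future_pattern = re.compile(r'^d(?!0_)\d+_')
train_cols = [c for c in cols
              if not future_pattern.match(c) or c == 'd120_rev']
```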

<h3 id="train-test-split">Train test split</h3>

<p>The dataset was randomly divided into training, testing, and validation sets, with proportions of 60%, 20%, and 20%, respectively. The training and validation sets were used during the model training phase to optimize hyperparameters and evaluate performance. The testing set was reserved for final model evaluation and comparison with other models.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=1)
</code></pre></div></div>

<h3 id="model-selection">Model selection</h3>

<p>Gradient boosting models have demonstrated exceptional performance in similar Kaggle competitions, making them well-suited for this task.
I selected three popular gradient boosting models: XGBoost, CatBoost, and LightGBM.
To predict the revenue value, I employed the regression version of these models.
The RMSE metric was chosen to monitor the models’ performance throughout the training process.</p>

<p>The following figure illustrates the evolution of the loss function for both the training and validation datasets. Training was halted when the validation loss stopped decreasing to prevent overfitting.</p>

<amp-img src="/assets/img/application-revenue-prediction/train_val.png" alt="Train and validation loss evolution" height="489" width="862" layout="responsive"></amp-img>

<p>In order to find the best parameters of the model I used the open-source python library <a href="https://hyperopt.github.io/hyperopt/">hyperopt</a>. 
Hyperopt is used for hyperparameter optimization of machine learning models. It allows you to define the search space for your parameters and the optimization strategy. Hyperopt then tries a certain number of configurations to find the best possible set of parameters for your model.</p>

<p>In the context of gradient boosting models, some of the parameters you might want to tune using Hyperopt include:</p>
<ul>
  <li>The number of leaves and the maximum depth of the decision trees.</li>
  <li>The <code class="language-plaintext highlighter-rouge">min_child_weight</code> parameter, which stops the algorithm from splitting a node further if the number of samples in that node falls below a certain threshold.</li>
  <li>The learning rate and the number of estimators used by the model.</li>
</ul>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">space</span><span class="o">=</span><span class="p">{</span>
    <span class="s">'enable_categorical'</span><span class="p">:</span> <span class="bp">True</span><span class="p">,</span>
    <span class="s">'learning_rate'</span><span class="p">:</span> <span class="n">hp</span><span class="p">.</span><span class="n">quniform</span><span class="p">(</span><span class="s">"learning_rate"</span><span class="p">,</span> <span class="mf">0.05</span><span class="p">,</span> <span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.05</span><span class="p">),</span>
    <span class="s">'max_depth'</span><span class="p">:</span> <span class="n">hp</span><span class="p">.</span><span class="n">quniform</span><span class="p">(</span><span class="s">"max_depth"</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">18</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span>
    <span class="s">'max_leaves'</span><span class="p">:</span> <span class="n">hp</span><span class="p">.</span><span class="n">quniform</span> <span class="p">(</span><span class="s">'max_leaves'</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span><span class="mi">9</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span>
    <span class="s">'min_child_weight'</span> <span class="p">:</span> <span class="n">hp</span><span class="p">.</span><span class="n">quniform</span><span class="p">(</span><span class="s">'min_child_weight'</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span>
    <span class="s">'n_estimators'</span><span class="p">:</span> <span class="mi">2000</span><span class="p">,</span>
    <span class="s">'early_stopping_rounds'</span><span class="p">:</span> <span class="mi">5</span><span class="p">,</span>
    <span class="s">'device'</span><span class="p">:</span> <span class="s">"cuda:0"</span><span class="p">,</span>
    <span class="s">'seed'</span><span class="p">:</span> <span class="mi">0</span>
    <span class="p">}</span>


<span class="k">def</span> <span class="nf">objective</span><span class="p">(</span><span class="n">space</span><span class="p">):</span>
    <span class="n">clf</span><span class="o">=</span><span class="n">XGBRegressor</span><span class="p">(</span>
         <span class="n">enable_categorical</span><span class="o">=</span> <span class="bp">True</span><span class="p">,</span>
        <span class="n">learning_rate</span><span class="o">=</span><span class="n">space</span><span class="p">[</span><span class="s">"learning_rate"</span><span class="p">],</span>
        <span class="n">max_depth</span><span class="o">=</span><span class="nb">int</span><span class="p">(</span><span class="n">space</span><span class="p">[</span><span class="s">"max_depth"</span><span class="p">]),</span>
        <span class="n">max_leaves</span><span class="o">=</span><span class="nb">int</span><span class="p">(</span><span class="n">space</span><span class="p">[</span><span class="s">'max_leaves'</span><span class="p">]),</span>
        <span class="n">min_child_weight</span><span class="o">=</span><span class="nb">int</span><span class="p">(</span><span class="n">space</span><span class="p">[</span><span class="s">'min_child_weight'</span><span class="p">]),</span>
        <span class="n">n_estimators</span><span class="o">=</span><span class="nb">int</span><span class="p">(</span><span class="n">space</span><span class="p">[</span><span class="s">'n_estimators'</span><span class="p">]),</span>
        <span class="n">early_stopping_rounds</span><span class="o">=</span><span class="nb">int</span><span class="p">(</span><span class="n">space</span><span class="p">[</span><span class="s">'early_stopping_rounds'</span><span class="p">]),</span>
        <span class="n">device</span><span class="o">=</span> <span class="s">"cuda:0"</span><span class="p">,</span>
        <span class="n">seed</span><span class="o">=</span><span class="mi">0</span>
    <span class="p">)</span>
        

    <span class="n">clf</span><span class="p">.</span><span class="n">fit</span><span class="p">(</span><span class="n">X_train</span><span class="p">,</span> <span class="n">y_train</span><span class="p">,</span> 
            <span class="n">eval_set</span><span class="o">=</span><span class="p">[(</span><span class="n">X_train</span><span class="p">,</span> <span class="n">y_train</span><span class="p">),</span> <span class="p">(</span><span class="n">X_val</span><span class="p">,</span> <span class="n">y_val</span><span class="p">)],</span>
            <span class="n">verbose</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
    
    <span class="n">pred</span> <span class="o">=</span> <span class="n">clf</span><span class="p">.</span><span class="n">predict</span><span class="p">(</span><span class="n">X_test</span><span class="p">)</span>
    <span class="n">mae</span> <span class="o">=</span> <span class="n">mean_absolute_error</span><span class="p">(</span><span class="n">pred</span><span class="p">,</span> <span class="n">y_test</span><span class="p">)</span>
    <span class="n">rmse</span> <span class="o">=</span> <span class="n">root_mean_squared_error</span><span class="p">(</span><span class="n">pred</span><span class="p">,</span> <span class="n">y_test</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"mae: </span><span class="si">{</span><span class="n">mae</span><span class="si">}</span><span class="s">, rmse: </span><span class="si">{</span><span class="n">rmse</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
    <span class="k">return</span> <span class="p">{</span><span class="s">'loss'</span><span class="p">:</span> <span class="n">rmse</span><span class="p">,</span> <span class="s">'status'</span><span class="p">:</span> <span class="n">STATUS_OK</span> <span class="p">}</span>
    
<span class="n">trials</span> <span class="o">=</span> <span class="n">Trials</span><span class="p">()</span>
<span class="n">best_hyperparams</span> <span class="o">=</span> <span class="n">fmin</span><span class="p">(</span><span class="n">fn</span> <span class="o">=</span> <span class="n">objective</span><span class="p">,</span>
                        <span class="n">space</span> <span class="o">=</span> <span class="n">space</span><span class="p">,</span>
                        <span class="n">algo</span> <span class="o">=</span> <span class="n">tpe</span><span class="p">.</span><span class="n">suggest</span><span class="p">,</span>
                        <span class="n">max_evals</span> <span class="o">=</span> <span class="mi">100</span><span class="p">,</span>
                        <span class="n">trials</span> <span class="o">=</span> <span class="n">trials</span><span class="p">)</span>
</code></pre></div></div>

<p>The predictions from the three models have been combined using a weighted average.</p>
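<p>A minimal sketch of this ensemble step, with placeholder predictions and weights (the actual weights are not given in the post):</p>

```python
import numpy as np

# Hypothetical per-model predictions on the same three samples.
pred_xgb = np.array([0.30, 0.10, 0.80])
pred_cat = np.array([0.25, 0.15, 0.70])
pred_lgb = np.array([0.35, 0.05, 0.90])

# Placeholder weights summing to 1, e.g. favoring the best model.
weights = np.array([0.4, 0.3, 0.3])
ensemble = np.average([pred_xgb, pred_cat, pred_lgb],
                      axis=0, weights=weights)
```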

<h3 id="results">Results</h3>

<p>One of the advantages of gradient boosting models is their interpretability.
These models can identify the most influential variables in making predictions.
The following figure presents the feature importance for the XGBoost model.
The <code class="language-plaintext highlighter-rouge">installations per day</code> feature is the most significant, providing valuable context about the day the user installs the application.
Another important variable is the city where the user downloaded the app, as it implies information about the user’s demographics.
Engagement variables, such as total revenue from ads, are also crucial for the final decision.</p>

<amp-img src="/assets/img/application-revenue-prediction/feature_importance.png" alt="Feature importance for the model predictions" height="489" width="862" layout="responsive"></amp-img>

<p>The RMSE of the model on the test dataset is 0.748. Since the target variable was log-transformed, this RMSE corresponds to a typical error of approximately $1.11 per user prediction.
The following figure compares the distribution of the actual target values in the test dataset with the values predicted by the model.
The model tends to predict values between 0.1 and 1, and struggles to capture the behavior of the target variable in the range between 0.001 and 0.01.</p>
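<p>The $1.11 figure can be recovered from the log-space RMSE, assuming a <code class="language-plaintext highlighter-rouge">log1p</code> transform of the revenue (an assumption, but consistent with the number quoted above):</p>

```python
import numpy as np

rmse_log = 0.748
# Back-transform an error of 0.748 in log1p space to dollars.
typical_error_usd = np.expm1(rmse_log)  # ≈ 1.11
```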

<amp-img src="/assets/img/application-revenue-prediction/predictions.png" alt="Distribution of the predictions of the model" height="489" width="862" layout="responsive"></amp-img>

<h2 id="conclusion">Conclusion</h2>

<p>This study aimed to develop a model to predict user-generated revenue within a digital application. By leveraging user installation features and engagement metrics, the model can provide valuable insights for optimizing monetization strategies.</p>

<p>The analysis revealed several key factors influencing revenue generation. Features like installation day, user city, and total ad revenue were identified as the most impactful on the model’s predictions.</p>

<p>The XGBoost model achieved an RMSE of 0.748 on the test dataset, which translates to a predicted error of approximately $1.11 for most user cases. However, the model exhibits limitations in accurately predicting low-revenue users (target values between 0.001 and 0.01).</p>

<p>Overall, this study demonstrates the potential of gradient boosting models for user revenue prediction within digital applications. By incorporating additional features and refining the model further, one can potentially improve prediction accuracy and gain deeper insights into user behavior that can be harnessed for effective revenue generation strategies.</p>]]></content:encoded><description>This article presents a study on predicting user-generated revenue within a digital application. Using gradient boosting models and a dataset containing user installation features and engagement metrics, the study aims to forecast revenue at day 120. The model's performance and key influencing factors are analyzed, providing insights for optimizing monetization strategies in the digital applications industry.</description><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://cristianpb.github.io/assets/img/application-revenue-prediction/main-16x9.jpg"/><media:content xmlns:media="http://search.yahoo.com/mrss/" medium="image" url="https://cristianpb.github.io/assets/img/application-revenue-prediction/main-16x9.jpg"/></item><item><title>Personalized Plan Care Information</title><link>https://cristianpb.github.io/blog/plant-care-information</link><category>programming</category><category>ecology</category><category>python</category><category>llm</category><author>noemail@noemail.org (Cristian Brokate)</author><pubDate>Sun, 28 Jul 2024 00:00:00 GMT</pubDate><guid isPermaLink="false">https://cristianpb.github.io/blog/plant-care-information</guid><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"/><description>Using RAG and LLM to provide accurate information about plant care.</description><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://cristianpb.github.io/assets/img/external-articles-responsive/plant-care-information-16x9.jpg"/><media:content xmlns:media="http://search.yahoo.com/mrss/" medium="image" 
url="https://cristianpb.github.io/assets/img/external-articles-responsive/plant-care-information-16x9.jpg"/></item><item><title>Music playlists dashboard</title><link>https://cristianpb.github.io/blog/playlists-dashboard</link><category>visualization</category><category>programming</category><category>d3.js</category><category>python</category><category>opendata</category><author>noemail@noemail.org (Cristian Brokate)</author><pubDate>Fri, 23 Feb 2024 00:00:00 GMT</pubDate><guid isPermaLink="false">https://cristianpb.github.io/blog/playlists-dashboard</guid><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>Music has a significant impact on the world in various ways.
Gaining insight into the patterns of popular music can be a fascinating endeavor.
In this post, we will demonstrate how to utilize Spotify’s trending music to stay up-to-date with current trends in a self-hosted manner.</p>

<center>
<amp-img src="/assets/img/playlists-dashboard/main.jpg" width="450" height="450" layout="intrinsic" alt="realistic photo of yoda as a DJ from behind listening music in front of a playlist dashboard in a big screen"></amp-img>
<br /><i>Stable diffusion: realistic photo of yoda as a DJ from behind listening music in front of a playlist dashboard in a big screen</i>
</center>

<h2 id="data">Data</h2>

<p>Spotify is a widely used music service, and its playlist data is publicly available on the internet.
There are several popular trending playlists that reflect current music preferences.
By utilizing GitHub Actions, you can automatically fetch this data at regular intervals and store it without needing to manage a database.
This data can then be easily accessed using straightforward HTTP requests directly to GitHub.
I utilize the project called <a href="https://github.com/spotDL/spotify-downloader">spotify-downloader</a> to download the playlist data and save it as a file.
The following code snippet does it:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">download</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">url</span><span class="p">):</span>
    <span class="n">cmd</span><span class="o">=</span><span class="sa">f</span><span class="s">"docker run --rm -v </span><span class="si">{</span><span class="n">CWD</span><span class="si">}</span><span class="s">/tmpplaylists:/music spotdl/spotify-downloader save </span><span class="si">{</span><span class="n">url</span><span class="p">.</span><span class="n">strip</span><span class="p">()</span><span class="si">}</span><span class="s"> --save-file </span><span class="si">{</span><span class="n">key</span><span class="si">}</span><span class="s">.spotdl"</span>
    <span class="n">p</span><span class="o">=</span><span class="n">subprocess</span><span class="p">.</span><span class="n">Popen</span><span class="p">(</span><span class="n">cmd</span><span class="p">.</span><span class="n">split</span><span class="p">(</span><span class="s">" "</span><span class="p">),</span>
                             <span class="n">stderr</span><span class="o">=</span><span class="n">subprocess</span><span class="p">.</span><span class="n">STDOUT</span><span class="p">,</span>
                             <span class="n">stdout</span><span class="o">=</span><span class="n">subprocess</span><span class="p">.</span><span class="n">PIPE</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="nb">iter</span><span class="p">(</span><span class="n">p</span><span class="p">.</span><span class="n">stdout</span><span class="p">.</span><span class="n">readline</span><span class="p">,</span> <span class="sa">b</span><span class="s">''</span><span class="p">):</span>
        <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"&gt;&gt;&gt; </span><span class="si">{</span><span class="n">line</span><span class="p">.</span><span class="n">rstrip</span><span class="p">().</span><span class="n">decode</span><span class="p">(</span><span class="s">'utf-8'</span><span class="p">)</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
</code></pre></div></div>

<p>Then I parse every playlist file and do simple preprocessing on the artists column in order to obtain a list of every artist that participates in each song.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">read_data</span><span class="p">():</span>
    <span class="n">appended_data</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="n">cols</span> <span class="o">=</span> <span class="p">[</span><span class="s">'name'</span><span class="p">,</span> <span class="s">'artists'</span><span class="p">,</span> <span class="s">'album_name'</span><span class="p">,</span> <span class="s">'date'</span><span class="p">,</span> <span class="s">'song_id'</span><span class="p">,</span> <span class="s">'cover_url'</span><span class="p">,</span> <span class="s">'playlist'</span><span class="p">,</span> <span class="s">'position'</span><span class="p">]</span>
    <span class="k">for</span> <span class="n">f</span> <span class="ow">in</span> <span class="n">glob</span><span class="p">.</span><span class="n">glob</span><span class="p">(</span><span class="s">'tmpplaylists/*.spotdl'</span><span class="p">):</span>
        <span class="n">data</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">read_json</span><span class="p">(</span><span class="n">f</span><span class="p">).</span><span class="n">assign</span><span class="p">(</span>
                <span class="n">artists</span><span class="o">=</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="p">[</span><span class="s">'artists'</span><span class="p">].</span><span class="n">explode</span><span class="p">().</span><span class="nb">str</span><span class="p">.</span><span class="n">replace</span><span class="p">(</span><span class="s">"'"</span><span class="p">,</span><span class="s">""</span><span class="p">).</span><span class="nb">str</span><span class="p">.</span><span class="n">replace</span><span class="p">(</span><span class="s">"</span><span class="se">\"</span><span class="s">"</span><span class="p">,</span> <span class="s">""</span><span class="p">).</span><span class="n">reset_index</span><span class="p">().</span><span class="n">groupby</span><span class="p">(</span><span class="s">'index'</span><span class="p">).</span><span class="n">agg</span><span class="p">({</span><span class="s">'artists'</span><span class="p">:</span> <span class="k">lambda</span> <span class="n">y</span><span class="p">:</span> <span class="n">y</span><span class="p">.</span><span class="n">tolist</span><span class="p">()}),</span>
                <span class="n">playlist</span><span class="o">=</span><span class="n">f</span><span class="p">.</span><span class="n">split</span><span class="p">(</span><span class="s">"/"</span><span class="p">)[</span><span class="mi">1</span><span class="p">].</span><span class="n">split</span><span class="p">(</span><span class="s">"."</span><span class="p">)[</span><span class="mi">0</span><span class="p">],</span>
                <span class="n">position</span><span class="o">=</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="p">.</span><span class="n">index</span> <span class="o">+</span> <span class="mi">1</span>
                <span class="p">)</span>
        <span class="k">assert</span> <span class="nb">len</span><span class="p">(</span><span class="nb">set</span><span class="p">(</span><span class="n">cols</span><span class="p">).</span><span class="n">difference</span><span class="p">(</span><span class="n">data</span><span class="p">.</span><span class="n">columns</span><span class="p">))</span> <span class="o">==</span> <span class="mi">0</span><span class="p">,</span> <span class="sa">f</span><span class="s">'Columns: </span><span class="si">{</span><span class="s">", "</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">data</span><span class="p">.</span><span class="n">columns</span><span class="p">)</span><span class="si">}</span><span class="s">'</span>
        <span class="k">assert</span> <span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">)</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">,</span> <span class="sa">f</span><span class="s">"Shape </span><span class="si">{</span><span class="n">data</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="si">}</span><span class="s"> and </span><span class="si">{</span><span class="n">data</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="si">}</span><span class="s"> columns"</span>
        <span class="n">appended_data</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
    <span class="p">(</span>
            <span class="n">pd</span><span class="p">.</span><span class="n">concat</span><span class="p">(</span><span class="n">appended_data</span><span class="p">,</span> <span class="n">ignore_index</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
            <span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="n">cols</span><span class="p">)</span>
            <span class="p">.</span><span class="n">to_csv</span><span class="p">(</span><span class="s">'static/data/data.csv'</span><span class="p">,</span> <span class="n">index</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span> <span class="n">header</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s">";"</span><span class="p">)</span>
    <span class="p">)</span>
</code></pre></div></div>

<p>By employing scheduled GitHub Actions, it is possible to save playlist positions every week, enabling further processing of this data with other tools.</p>
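<p>Since the snapshot is committed to the repository, any client can read it with a plain HTTP request to GitHub’s raw content endpoint; no API key or database is required. A minimal sketch (the <code>main</code> branch name is an assumption):</p>

```python
# Read the committed playlist snapshot straight from GitHub over HTTP.
# The branch name ("main") is an assumption; adjust to the repository default.
import io
import urllib.request

import pandas as pd

RAW_URL = (
    "https://raw.githubusercontent.com/"
    "cristianpb/playlists/main/static/data/data.csv"
)

def parse_snapshot(csv_bytes: bytes) -> pd.DataFrame:
    """Parse the semicolon-separated CSV written by read_data()."""
    return pd.read_csv(io.BytesIO(csv_bytes), sep=";")

def fetch_snapshot(url: str = RAW_URL) -> pd.DataFrame:
    """Download and parse the latest committed snapshot."""
    with urllib.request.urlopen(url) as resp:
        return parse_snapshot(resp.read())
```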

<h2 id="observable-dashboard">Observable dashboard</h2>

<p>I utilize the <a href="https://observablehq.com/">Observable framework</a>, which incorporates the D3 JavaScript library for generating swift and adaptable visualizations.</p>

<p>Observable Notebook combines the features of conventional text editors, code editors, and document processors into a unified interface, simplifying the creation of rich and dynamic documents that integrate text, code, data visualization, and other multimedia elements.</p>

<p>Observable employs the concept of “cells” to arrange content within a notebook, where each cell can either contain plain text or executable code written in JavaScript or any other supported language. Cells can be rearranged, grouped, and nested, enabling the creation of hierarchical structures that reflect the logical organization of the document.</p>

<p>One can write a markdown notebook and import data produced in multiple languages: for example, I use a Python preprocessing pipeline, then import the data into the notebook and plot it using the available visualization functions.</p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gh"># Playlist details</span>

const commit_date_old = Array.from(new Set(diffData.map(i =&gt; i.commit_date)))[1];
const commit_date_recent = Array.from(new Set(diffData.map(i =&gt; i.commit_date)))[0];

From ${commit_date_old} to ${commit_date_recent} new songs have been added to the playlist.

const playlistsNames = bestArtists.map(i =&gt; i.playlist)
const playlistChoosen = view(Inputs.select(new Set(playlistsNames), {value: playlistsNames[0], label: "Playlists"}));
const artistsNames = bestArtists.map(i =&gt; i.artists)

const tableRows = RecentSongAdds(diffData, playlistChoosen, commit_date_old, commit_date_recent)

<span class="nt">&lt;div</span> <span class="na">class=</span><span class="s">"card"</span> <span class="na">style=</span><span class="s">"margin: 1rem 0 2rem 0; padding: 0;"</span><span class="nt">&gt;</span>
  ${Inputs.table(tableRows, {
  columns: ["position", "artists", "name", "album_name", "attribute"],
  align: {"position": "left"},
  format: {
    attribute: (x) =&gt; x == "+" ? "New!" : x == "-" ? "🗑" : x &gt; 0 ? <span class="sb">`⬆${x}`</span> : x == 0 ? '--' : <span class="sb">`⬇${Math.abs(x)}`</span>
  }
})}
<span class="nt">&lt;/div&gt;</span>

<span class="nt">&lt;div</span> <span class="na">class=</span><span class="s">"grid grid-cols-1"</span> <span class="na">style=</span><span class="s">"grid-auto-rows: 560px;"</span><span class="nt">&gt;</span>
  <span class="nt">&lt;div</span> <span class="na">class=</span><span class="s">"card"</span><span class="nt">&gt;</span>
    ${BestArtistsPlot(bestArtists, playlistChoosen)}
  <span class="nt">&lt;/div&gt;</span>
<span class="nt">&lt;/div&gt;</span>

const mostPopularArtists = view(Inputs.select(mostFrequent(bestArtists.filter(i =&gt; i.playlist == playlistChoosen).map(i =&gt; i.artists)).slice(0,10), {value: artistsNames[0], label: "Popular artists"}));

<span class="nt">&lt;div</span> <span class="na">class=</span><span class="s">"grid grid-cols-1"</span> <span class="na">style=</span><span class="s">"grid-auto-rows: 560px;"</span><span class="nt">&gt;</span>
  <span class="nt">&lt;div</span> <span class="na">class=</span><span class="s">"card"</span><span class="nt">&gt;</span>
    ${BestSongsPlot(bestArtists, playlistChoosen, mostPopularArtists)}
  <span class="nt">&lt;/div&gt;</span>
<span class="nt">&lt;/div&gt;</span>
</code></pre></div></div>
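<p>In the Observable framework, a data loader is simply a script whose standard output becomes a file served to the notebook. A hypothetical Python loader named <code>data.csv.py</code> could therefore re-emit the snapshot produced by the preprocessing pipeline:</p>

```python
# data.csv.py -- a hypothetical Observable framework data loader.
# The framework runs this script at build time and serves whatever is
# written to stdout as "data.csv".
import os
import sys

import pandas as pd

def load_snapshot(path: str) -> pd.DataFrame:
    """Read the semicolon-separated snapshot written by the pipeline."""
    return pd.read_csv(path, sep=";")

# Guarded so the sketch can also be imported outside the repository.
if __name__ == "__main__" and os.path.exists("static/data/data.csv"):
    load_snapshot("static/data/data.csv").to_csv(sys.stdout, index=False)
```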

<p>The dashboard is hosted on GitHub Pages and is available at <a href="https://cristianpb.github.io/playlists">cristianpb.github.io/playlists</a>.</p>

<h2 id="analysis">Analysis</h2>

<p>The dashboard allows for the identification of patterns in the development of Spotify playlists over time.
The <a href="https://open.spotify.com/playlist/37i9dQZF1DXcBWIGoYBM5M">Today Top Hits</a> playlist reflects global music trends, having garnered more than 34 million likes at the time of writing this article.</p>

<center>
<amp-img src="/assets/img/playlists-dashboard/songs-popular-artists.png" width="901" height="450" layout="intrinsic" alt="realistic photo of yoda as a DJ from behind listening music in front of a playlist dashboard in a big screen"></amp-img>
<br />
</center>

<p>We can observe artists such as Olivia Rodrigo, who has multiple tracks featured in the “Today’s Top Hits” playlist. Some songs exhibit a consistent pattern, indicating that they have maintained popularity and catchiness over time, for example, “The Vampire Song,” which remained among the top 35 songs for more than four months. Conversely, other tracks like “Catch Me Now” may initially appear in the playlist due to the artist’s popularity but subsequently decline in ranking during subsequent weeks.</p>

<center>
<amp-img src="/assets/img/playlists-dashboard/artist-constant-radio.png" width="901" height="450" layout="intrinsic" alt="realistic photo of yoda as a DJ from behind listening music in front of a playlist dashboard in a big screen"></amp-img>
<br />
</center>

<p>One might also observe that artist-specific radio playlists exhibit minimal fluctuations; for instance, the “Muse Radio,” “Coldplay Radio,” and “The Strokes” radio playlists undergo infrequent changes.</p>

<h2 id="discusion">Discussion</h2>

<p>Observable is a practical platform for crafting data analyses, offering
versatile connectors and support for multiple programming languages. The
variety of available visualizations is crucial, and comprehensive documentation
plays a significant role in guiding users to create effective visualizations.</p>

<p>However, incorporating reactive filters or reusing variables within an
Observable notebook necessitates writing JavaScript code, which may be a
drawback for some users. Although the reactivity of Observable notebooks is
functional, it might not be the most advanced option available.</p>

<p>The code to process the data and build the dashboard is available at <a href="https://github.com/cristianpb/playlists">github.com/cristianpb/playlists</a>.</p>]]></content:encoded><description>This article shows how to collect Spotify playlist data with GitHub Actions and build a self-hosted dashboard with the Observable framework to track music trends over time.</description><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://cristianpb.github.io/assets/img/playlists-dashboard/main-16x9.jpg"/><media:content xmlns:media="http://search.yahoo.com/mrss/" medium="image" url="https://cristianpb.github.io/assets/img/playlists-dashboard/main-16x9.jpg"/></item><item><title>Log analysis using Fluentbit Elasticsearch Kibana</title><link>https://cristianpb.github.io/blog/fluentbit-elasticsearch-kibana</link><category>visualization</category><category>system management</category><category>elasticsearch</category><category>docker</category><category>traefik</category><author>noemail@noemail.org (Cristian Brokate)</author><pubDate>Mon, 11 Sep 2023 00:00:00 GMT</pubDate><guid isPermaLink="false">https://cristianpb.github.io/blog/fluentbit-elasticsearch-kibana</guid><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>Logs are a valuable source of information about the health and performance of
an application or system. By analyzing logs, you can identify problems early on
and take corrective action before they cause outages or other disruptions.</p>

<p>One way to analyze logs is to use a tool like Fluent Bit to collect them from
different sources and send them to a central repository like Elasticsearch.
Elasticsearch is a distributed search and analytics engine that can store and
search large amounts of data quickly and efficiently.</p>

<p>Once the logs are stored in Elasticsearch, you can use Kibana to visualize and
analyze them. Kibana provides a variety of tools for exploring and
understanding log data, including charts, tables, and dashboards.</p>

<p>By analyzing logs using Fluent Bit, Elasticsearch, and Kibana, you can gain
valuable insights into the health and performance of your applications and
systems. This information can help you to identify and troubleshoot problems,
improve performance, and ensure the availability of your applications.</p>

<center>
<amp-img src="/assets/img/fluentbit-elasticsearch-kibana/drawing.png" width="901" height="450" layout="intrinsic" alt="log injection architecture"></amp-img>
<br /><i>Log injection architecture</i>
</center>

<h2 id="fluent-bit-log-injection">Fluent-bit: log injection</h2>

<p>Traefik, a modern reverse proxy and load balancer, generates access logs for
every HTTP request. These logs can be stored as plain text files and compressed
using the logrotate Unix utility. Fluent Bit, a lightweight log collector,
provides a simple way to insert logs into Elasticsearch. In fact, it provides
several input connectors for other sources, such as syslog logs, and output
connectors, such as Datadog or New Relic.</p>

<p>To send Traefik access logs to Elasticsearch using Fluent Bit, you will need to:</p>
<ul>
  <li>Install Fluent Bit on the machine where Traefik is running.</li>
  <li>Configure Fluent Bit to collect the Traefik access logs.</li>
  <li>Configure Elasticsearch to receive the logs from Fluent Bit</li>
</ul>

<p>The Fluent Bit configuration file has the following sections:</p>
<ul>
  <li>Input: I use the <code class="language-plaintext highlighter-rouge">tail</code> connector to fetch data from access.log file</li>
  <li>Filter: I use the MaxMind GeoIP2 plugin to geocode IP addresses.</li>
  <li>Output: points directly to the Elasticsearch database.</li>
</ul>

<p>The following configuration file shows how to collect Traefik logs and send
them to Elasticsearch:</p>

<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># fluentbit.conf</span>
<span class="nn">[SERVICE]</span>
    <span class="err">flush</span>             <span class="mi">5</span>
    <span class="err">daemon</span>            <span class="err">off</span>
    <span class="err">http_server</span>       <span class="err">off</span>
    <span class="err">log_level</span>         <span class="mf">inf</span><span class="err">o</span>
    <span class="err">parsers_file</span>      <span class="err">parsers.conf</span>

<span class="nn">[INPUT]</span>
    <span class="err">name</span>              <span class="err">tail</span>
    <span class="err">path</span>              <span class="err">/var/log/traefik/access.log,/var/log/traefik/access.log.</span><span class="mi">1</span>
    <span class="err">Parser</span>            <span class="err">traefik</span>
    <span class="err">Skip_Long_Lines</span>   <span class="err">On</span>

<span class="nn">[FILTER]</span>
    <span class="err">Name</span>                  <span class="err">geoip</span><span class="mi">2</span>
    <span class="err">Match</span>                 <span class="err">*</span>
    <span class="err">Database</span>              <span class="err">/fluent-bit/etc/GeoLite</span><span class="mi">2</span><span class="err">-City.mmdb</span>
    <span class="err">Lookup_key</span>            <span class="err">host</span>
    <span class="err">Record</span> <span class="err">country</span> <span class="err">host</span>   <span class="err">%{country.names.en}</span>
    <span class="err">Record</span> <span class="err">isocode</span> <span class="err">host</span>   <span class="err">%{country.iso_code}</span>
    <span class="err">Record</span> <span class="err">latitude</span> <span class="err">host</span>  <span class="err">%{location.latitude}</span>
    <span class="err">Record</span> <span class="err">longitude</span> <span class="err">host</span> <span class="err">%{location.longitude}</span>
    
<span class="nn">[FILTER]</span>
    <span class="err">Name</span>                <span class="err">lua</span>
    <span class="err">Match</span>               <span class="err">*</span>
    <span class="err">Script</span>              <span class="err">/fluent-bit/etc/geopoint.lua</span>
    <span class="err">call</span>                <span class="err">geohash_gen</span>

<span class="nn">[OUTPUT]</span>
    <span class="err">Name</span>                <span class="err">es</span>
    <span class="err">Match</span>               <span class="err">*</span>
    <span class="err">Host</span>                <span class="err">esurl.com</span>
    <span class="err">Port</span>                <span class="mi">443</span>
    <span class="err">HTTP_User</span>           <span class="err">username</span>
    <span class="err">HTTP_Passwd</span>         <span class="err">password</span>
    <span class="err">tls</span>                 <span class="err">On</span>
    <span class="err">tls.verify</span>          <span class="err">On</span>
    <span class="err">Logstash_Format</span>     <span class="err">On</span>
    <span class="err">Replace_Dots</span>        <span class="err">On</span>
    <span class="err">Retry_Limit</span>         <span class="err">False</span>
    <span class="err">Suppress_Type_Name</span>  <span class="err">On</span>
    <span class="err">Logstash_DateFormat</span> <span class="err">all</span>
    <span class="err">Generate_ID</span>         <span class="err">On</span>
</code></pre></div></div>

<p>I use an additional filter function to produce a geohash record, which will
then be used in Kibana for geo map plots.</p>

<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- geopoint.lua
function geohash_gen(tag, timestamp, record)
    local new_record = record
    local lat = record["latitude"]
    local lon = record["longitude"]
    new_record["geohash"] = lat .. "," .. lon
    return 1, timestamp, new_record
end
</code></pre></div></div>

<p>The parser uses a regular expression to extract the different fields of each record.
By default all fields are parsed as strings, but you can assign other types, such as integer for fields like <em>request size</em>, <em>request duration</em> and <em>number of requests</em>.</p>

<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># parsers.conf</span>
<span class="nn">[PARSER]</span>
    <span class="err">Name</span>   <span class="err">traefik</span>
    <span class="err">Format</span> <span class="err">regex</span>
    <span class="err">Regex</span>  <span class="err">^(?&lt;host&gt;</span><span class="nn">[\S]</span><span class="err">*)</span> <span class="err">[^</span> <span class="err">]*</span> <span class="err">(?&lt;user&gt;[^</span> <span class="err">]*)</span> <span class="err">\</span><span class="nn">[(?&lt;time&gt;[^\]]*)\]</span> <span class="err">"(?&lt;method&gt;\S+)(?:</span> <span class="err">+(?&lt;path&gt;</span><span class="nn">[^\"]</span><span class="err">*?)(?&lt;protocol&gt;\S*)?)?"</span> <span class="err">(?&lt;code&gt;[^</span> <span class="err">]*)</span> <span class="err">(?&lt;size&gt;[^</span> <span class="err">]*)(?:</span> <span class="err">"(?&lt;referer&gt;</span><span class="nn">[^\"]</span><span class="err">*)"</span> <span class="err">"(?&lt;agent&gt;</span><span class="nn">[^\"]</span><span class="err">*)")?</span> <span class="err">(?&lt;number_requests&gt;[^</span> <span class="err">]*)</span> <span class="err">"(?&lt;router_name&gt;</span><span class="nn">[^\"]</span><span class="err">*)"</span> <span class="err">"(?&lt;router_url&gt;</span><span class="nn">[^\"]</span><span class="err">*)"</span> <span class="err">(?&lt;request_duration&gt;</span><span class="nn">[\d]</span><span class="err">*)ms$</span>
    <span class="err">Time_Key</span> <span class="err">time</span>
    <span class="err">Time_Format</span> <span class="err">%d/%b/%Y:%H:%M:%S</span> <span class="err">%z</span>
    <span class="err">Types</span> <span class="err">request_duration:integer</span> <span class="err">size:integer</span> <span class="err">number_requests:integer</span>
</code></pre></div></div>
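<p>A quick way to sanity-check the parser is to run the same regular expression, translated to Python’s named-group syntax, against a made-up sample line in Traefik’s default access-log format:</p>

```python
# Sanity-check the Traefik parser regex from parsers.conf. Python uses
# (?P<name>...) where Fluent Bit writes (?<name>...).
import re

TRAEFIK_RE = re.compile(
    r'^(?P<host>\S*) [^ ]* (?P<user>[^ ]*) \[(?P<time>[^\]]*)\] '
    r'"(?P<method>\S+)(?: +(?P<path>[^"]*?)(?P<protocol>\S*)?)?" '
    r'(?P<code>[^ ]*) (?P<size>[^ ]*)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")? '
    r'(?P<number_requests>[^ ]*) "(?P<router_name>[^"]*)" '
    r'"(?P<router_url>[^"]*)" (?P<request_duration>\d*)ms$'
)

# A made-up sample line (IP, router and timings are placeholders).
SAMPLE = (
    '203.0.113.7 - - [11/Sep/2023:10:15:32 +0000] '
    '"GET /blog HTTP/1.1" 200 5134 "-" "Mozilla/5.0" '
    '42 "blog@docker" "http://172.18.0.2:80" 12ms'
)

match = TRAEFIK_RE.match(SAMPLE)
assert match is not None, "the parser regex should accept the sample line"
```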

<p>Once you have configured Fluent Bit, you can start it by running <code class="language-plaintext highlighter-rouge">fluent-bit -c fluent-bit.conf</code> or by using Docker Compose:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># docker-compose.yml</span>
<span class="na">version</span><span class="pi">:</span> <span class="s2">"</span><span class="s">3.7"</span>

<span class="na">services</span><span class="pi">:</span>
  <span class="na">fluent-bit</span><span class="pi">:</span>
    <span class="na">container_name</span><span class="pi">:</span> <span class="s">fluent-bit</span>
    <span class="na">restart</span><span class="pi">:</span> <span class="s">unless-stopped</span>
    <span class="na">image</span><span class="pi">:</span> <span class="s">fluent/fluent-bit</span>
    <span class="na">volumes</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="s">./parsers.conf:/fluent-bit/etc/parsers.conf</span>
      <span class="pi">-</span> <span class="s">./fluentbit.conf:/fluent-bit/etc/fluent-bit.conf</span>
      <span class="pi">-</span> <span class="s">./geopoint.lua:/fluent-bit/etc/geopoint.lua</span>
      <span class="pi">-</span> <span class="s">./GeoLite2-City.mmdb:/fluent-bit/etc/GeoLite2-City.mmdb</span>
      <span class="pi">-</span> <span class="s">/var/log/traefik:/var/log/traefik</span>
</code></pre></div></div>

<h2 id="elasticsearch-log-indexing">Elasticsearch: Log indexing</h2>

<p>Elasticsearch is a popular open-source search and analytics engine that can be
used for a variety of tasks, including log analysis. It is a good choice for
log analysis because it supports complex queries and provides a REST API that
accepts them directly in readable JSON format.</p>

<p>Elasticsearch uses a distributed architecture, which means that it can be
scaled to handle large amounts of data. It also supports a variety of data
types, including text, numbers, and dates, which makes it a versatile tool for
log analysis.</p>

<p>To use Elasticsearch for log analysis, you would first need to index the logs
into Elasticsearch. This can be done using a variety of tools, such as Logstash
or Fluent Bit. Once the logs are indexed, you can then query them using
Elasticsearch’s powerful query language.</p>

<p>Elasticsearch’s query language is based on JSON, which makes it easy to read
and write. It also supports a variety of features, such as full-text search,
regular expressions, and aggregations.</p>
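<p>For example, counting requests per HTTP status code over the last 24 hours is a single terms aggregation. The sketch below only builds the JSON body; the host name and index in the comment are placeholders taken from the examples in this post:</p>

```python
# Build an aggregation query: requests per HTTP status code, last N hours.
# Send it with: curl -XPOST "https://hostname/logstash-all/_search" -d "$BODY"
import json

def status_code_histogram(hours: int = 24) -> dict:
    """Terms aggregation on code.keyword, restricted to a time window."""
    return {
        "size": 0,  # only the aggregation buckets, no individual hits
        "query": {"range": {"@timestamp": {"gte": f"now-{hours}h"}}},
        "aggs": {
            "by_code": {"terms": {"field": "code.keyword", "size": 10}}
        },
    }

body = json.dumps(status_code_histogram())
```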

<h3 id="mappings">Mappings</h3>

<p>Elasticsearch creates a mapping for new indices by default, guessing the type
of each field. However, it is better to provide an explicit mapping to the
index. This will allow you to control the type of each field and the operations
that can be performed on it. For example, you can specify that a field is of
type ip so that it can be used to filter for IP address groups, or you can
specify that a field is of type geo_point so that it can be used to filter by
a specific location.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl <span class="nt">-XPUT</span> <span class="s2">"https://hostname/logstash-all"</span> <span class="nt">-H</span> <span class="s1">'Content-Type: application/json'</span> <span class="nt">-d</span> <span class="s1">'{ "mappings": { "properties": { "@timestamp": { "type": "date" }, "agent": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "code": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "country": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "geohash": { "type": "geo_point" }, "host": { "type": "ip" }, "isocode": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "latitude": { "type": "float" }, "longitude": { "type": "float" }, "method": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "number_requests": { "type": "long" }, "path": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "protocol": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "referer": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "request_duration": { "type": "long" }, "router_name": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "router_url": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "size": { "type": "long" }, "user": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } } } } }'</span>
</code></pre></div></div>

<h3 id="managed-database">Managed database</h3>

<p><a href="https://bonsai.io/">Bonsai.io</a> is a managed Elasticsearch service that
provides high availability and scalability without the need to manage or deploy
the underlying infrastructure. Bonsai offers a variety of plans to suit
different project requirements.</p>

<center>
<amp-img src="/assets/img/fluentbit-elasticsearch-kibana/bonsai-io.jpg" width="901" height="450" layout="intrinsic" alt="bonsai.io overview dashboard"></amp-img>
<br /><i>Bonsai.io dashboard</i>
</center>

<p>The hobbyist tier is more than enough for this kind of use case; it comes
with a maximum of 35k documents, 125 MB of data and 10 shards. At the time of
writing this article it is free, and you don’t have to enter a credit card to
use it.</p>

<p>To stay within the limits of the hobbyist tier, I use the following cronjob
to remove old documents:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl <span class="nt">-X</span> POST <span class="s2">"https://hostname/logstash-all/_delete_by_query"</span> <span class="nt">-H</span> <span class="s1">'Content-Type: application/json'</span> <span class="nt">-d</span> <span class="s1">'{ "query": { "bool": { "filter": [ { "range": { "@timestamp": { "lt": "now-10d" } } } ] } } }'</span>
</code></pre></div></div>

<p>For my use case, a 10-day retention window is enough to stay within the
plan limits.</p>
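<p>The same delete-by-query body can be built in Python if you prefer to parameterize the retention window instead of hard-coding it in the cronjob:</p>

```python
# Build the retention delete-by-query body; POST it to
# https://hostname/logstash-all/_delete_by_query (hostname is a placeholder).
import json

def retention_body(days: int = 10) -> dict:
    """Match every document older than the given number of days."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"range": {"@timestamp": {"lt": f"now-{days}d"}}}
                ]
            }
        }
    }

payload = json.dumps(retention_body())
```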

<h2 id="kibana-log-analysis">Kibana: Log analysis</h2>

<p><a href="https://bonsai.io">Bonsai.io</a> also provides a managed Kibana service connected to the Elasticsearch cluster.</p>

<p>There are certain limitations in stack management: for example, there is no way to manage the index life cycle or configure alerting.</p>

<p>Nevertheless, it provides basic functionality to create useful dashboards and
discover patterns inside the logs.</p>

<center>
<amp-img src="/assets/img/fluentbit-elasticsearch-kibana/kibana.jpg" width="901" height="450" layout="intrinsic" alt="kibana dashboard provided by bonsai.io"></amp-img>
<br /><i>Kibana dashboard</i>
</center>

<p>It’s interesting to see bot requests trying to exploit vulnerabilities in
services like WordPress, as well as bots scraping content.</p>

<h2 id="discusion">Discussion</h2>

<p>The following stack provides a simple and cost-effective way to analyze logs.
The computational footprint on your server is very low because most of the
infrastructure is in the cloud. There are many freemium services, such as
Bonsai.io and New Relic, that can be used to ingest and analyze logs.</p>

<p>Observability is important for infrastructure management, but it is also
important to have alerting capabilities to detect and respond to threats.
Unfortunately, these plugins are not typically included in the free plan, so
you will need to upgrade to a paid plan to get them.</p>]]></content:encoded><description>This article shows how to analyze logs using Kibana dashboards. Fluentbit is used for injecting logs to elasticsearch, then it is connected to kibana to get some insights.</description><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://cristianpb.github.io/assets/img/fluentbit-elasticsearch-kibana/main-16x9.jpg"/><media:content xmlns:media="http://search.yahoo.com/mrss/" medium="image" url="https://cristianpb.github.io/assets/img/fluentbit-elasticsearch-kibana/main-16x9.jpg"/></item><item><title>Magic wand gesture recognition using Tensorflow and SensiML</title><link>https://cristianpb.github.io/blog/magic-wand</link><category>data science</category><category>programming</category><category>fpga</category><category>python</category><category>sensiml</category><category>quickfeather</category><author>noemail@noemail.org (Cristian Brokate)</author><pubDate>Fri, 4 Jun 2021 00:00:00 GMT</pubDate><guid isPermaLink="false">https://cristianpb.github.io/blog/magic-wand</guid><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>During the last decade, IoT devices have become very popular. 
Their small form factor makes them optimal for all kinds of applications.
Their technology has also improved over the last decade, and nowadays they
are able to do machine learning at the edge.</p>

<p>I recently received a QuickFeather microcontroller from a
<a href="https://www.hackster.io/contests/quickfeather">Hackster.IO</a> contest. One of
the main features of this device is its built-in eFPGA, which can optimize
parallel computations on the edge.</p>

<p>This post will explore the capabilities of this little beast and show how to
run a machine learning model that was trained using Tensorflow.
The use case will focus on gesture recognition, so the device will be
able to detect whether a movement corresponds to a letter of the alphabet.</p>

<h2 id="quickfeather">QuickFeather</h2>

<p>The QuickFeather is a very powerful device with a small form factor (58mm
x 22mm). It’s the first FPGA-enabled microcontroller to be fully supported with
Zephyr RTOS. Additionally, it includes an MC3635 accelerometer, a pressure
sensor, a microphone and an integrated Li-Po battery charger.</p>

<p>Unlike other development kits which are based on proprietary hardware and
software tools, QuickFeather is based on open source hardware and is built
around 100% open source software. QuickLogic provides <a href="https://github.com/QuickLogic-Corp/qorc-sdk">a nice
SDK</a> to flash some FreeRTOS
software and get started. There is plenty of documentation, along with
examples, in their GitHub repository.</p>

<p>Since the QuickFeather is optimized for battery-saving use cases, it includes
neither Wi-Fi nor Bluetooth connectivity. Therefore, data can only be
transferred over a UART serial connection.</p>

<h2 id="capture-data">Capture data</h2>

<p>The on-board accelerometer is the main sensor for this use case. I use a
USB-serial converter in order to read data directly from the accelerometer and
transfer it to another host that is connected to the other end of the USB
cable.</p>
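<p>As an illustration of what reading that serial stream could look like on the host side, here is a Python sketch using pyserial; the port name, baud rate, and CSV sample format are assumptions for this sketch, not the exact protocol used by the firmware:</p>

```python
def parse_sample(line):
    """Parse one CSV accelerometer sample such as '12,-3,45' into ints."""
    x, y, z = (int(v) for v in line.strip().split(","))
    return x, y, z

def read_samples(port="/dev/ttyUSB0", baudrate=115200, n=100):
    """Yield up to n (x, y, z) samples read from the serial port."""
    import serial  # pyserial; imported lazily so parse_sample has no dependency
    with serial.Serial(port, baudrate, timeout=1) as ser:
        for _ in range(n):
            raw = ser.readline().decode("ascii", errors="ignore")
            if raw.strip():
                yield parse_sample(raw)
```

<p>In practice the SensiML open-gateway application handles this capture step, so the sketch above is only there to show what travels over the wire.</p>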

<p>Data is captured and analysed using another machine. I personally connected a
Raspberry Pi, which also has a small form factor, in order to have flexibility
when performing the different gestures.</p>

<p>SensiML provides a <a href="https://github.com/sensiml/open-gateway">web application</a> to visualize and save data.
It is a Python application that runs a Flask webserver and
provides nice features such as capturing video at the same time, in order
to correlate it with the saved data.
The code is available on GitHub, so one can see how it works and even
<a href="https://github.com/sensiml/open-gateway/pull/29">propose some modifications</a>, like I did.</p>

<p>I captured data from <em>O</em>, <em>W</em> and <em>Z</em> gestures as you can see in the following picture:</p>

<center>
<amp-img src="/assets/img/magic-wand/data-capture.gif" width="640" height="360" layout="intrinsic" alt="data capture using open-gateway application"></amp-img>
<br /><i>Data capture using open-gateway application</i>
</center>

<h2 id="label-data-with-label-studio">Label data with Label Studio</h2>

<p>Once data is collected, one needs to label it so that a machine
learning model can be taught to associate a certain movement with a gesture.
I used <a href="https://labelstud.io/">Label Studio</a>, which is an open source data
labelling tool. It can be used to label different kinds of data such as images,
audio, text, time series, or a combination of the above.</p>

<p>It can be deployed on-premise using a Docker image, which is very handy if you
want to get started quickly.</p>
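<p>For reference, a typical invocation looks like the following; the image name and paths are taken from the Label Studio documentation at the time of writing, so check the current docs before relying on them:</p>

```shell
# Start Label Studio on port 8080, persisting projects and uploads in ./mydata
docker run -it -p 8080:8080 -v $(pwd)/mydata:/label-studio/data heartexlabs/label-studio:latest
```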

<h3 id="setup">Setup</h3>

<p>Once Label Studio starts, it has to be configured for a labelling task. In this case, the
task deals with time series data. One can choose a graphical
configuration using preconfigured templates, or customize it oneself
with XML-like configuration code. Here is the code I use to configure the data
coming from the <em>X</em>, <em>Y</em> and <em>Z</em> accelerometer axes.</p>

<div class="language-html highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt">&lt;View&gt;</span>
  <span class="c">&lt;!-- Control tag for labels --&gt;</span>
  <span class="nt">&lt;TimeSeriesLabels</span> <span class="na">name=</span><span class="s">"label"</span> <span class="na">toName=</span><span class="s">"ts"</span><span class="nt">&gt;</span>
    <span class="nt">&lt;Label</span> <span class="na">value=</span><span class="s">"O"</span> <span class="na">background=</span><span class="s">"red"</span><span class="nt">/&gt;</span>
    <span class="nt">&lt;Label</span> <span class="na">value=</span><span class="s">"Z"</span> <span class="na">background=</span><span class="s">"green"</span><span class="nt">/&gt;</span>
    <span class="nt">&lt;Label</span> <span class="na">value=</span><span class="s">"W"</span> <span class="na">background=</span><span class="s">"blue"</span><span class="nt">/&gt;</span>
  <span class="nt">&lt;/TimeSeriesLabels&gt;</span>
  <span class="c">&lt;!-- Object tag for time series data source --&gt;</span>
  <span class="nt">&lt;TimeSeries</span> <span class="na">name=</span><span class="s">"ts"</span> <span class="na">valueType=</span><span class="s">"url"</span> <span class="na">value=</span><span class="s">"$timeseriesUrl"</span> <span class="na">sep=</span><span class="s">","</span> <span class="nt">&gt;</span>
    <span class="nt">&lt;Channel</span> <span class="na">column=</span><span class="s">"AccelerometerX"</span> <span class="na">strokeColor=</span><span class="s">"#1f77b4"</span> <span class="na">legend=</span><span class="s">"AccelerometerX"</span><span class="nt">/&gt;</span>
    <span class="nt">&lt;Channel</span> <span class="na">column=</span><span class="s">"AccelerometerY"</span> <span class="na">strokeColor=</span><span class="s">"#ff7f0e"</span> <span class="na">legend=</span><span class="s">"AccelerometerY"</span><span class="nt">/&gt;</span>
    <span class="nt">&lt;Channel</span> <span class="na">column=</span><span class="s">"AccelerometerZ"</span>  <span class="na">strokeColor=</span><span class="s">"#111111"</span> <span class="na">legend=</span><span class="s">"AccelerometerZ"</span><span class="nt">/&gt;</span>
  <span class="nt">&lt;/TimeSeries&gt;</span>
<span class="nt">&lt;/View&gt;</span>
</code></pre></div></div>

<p>Label Studio has a nice preview feature, which shows how the labelling task will look with
the supplied configuration. The following screenshot shows how the interface
looks during the setup process.</p>

<center>
<amp-img src="/assets/img/magic-wand/label-studio1.png" width="901" height="450" layout="intrinsic" alt="label studio setup configuration"></amp-img>
<br /><i>Label Studio setup configuration</i>
</center>

<h3 id="labelling">Labelling</h3>

<p>One of the nicest things about Label Studio is that one can go really
fast using the keyboard shortcuts. It also provides some machine learning
plugins which make predictions from the partially labelled data.
The following screenshot shows the interface with some labelled data.</p>

<center>
<amp-img src="/assets/img/magic-wand/label-studio2.png" width="901" height="450" layout="intrinsic" alt="label studio data labelled"></amp-img>
<br /><i>Data labelled using Label Studio</i>
</center>

<p>From a machine learning perspective, the exported data should be a CSV file with four different columns. Even if Label Studio is able to export CSV, it didn’t have the right format for me; instead it looks like the following:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>timeseriesUrl,id,label,annotator,annotation_id
/data/upload/W.csv,3,"[{""start"": 156, ""end"": 422, ""instant"": false, ""timeserieslabels"": [""W""]}, ... ]",admin@admin.com,3
/data/upload/Z.csv,2,"[{""start"": 141, ""end"": 419, ""instant"": false, ""timeserieslabels"": [""Z""]}, ...]",admin@admin.com,2
/data/upload/O.csv,1,"[{""start"": 77, ""end"": 389, ""instant"": false, ""timeserieslabels"": [""O""]}, ...]",admin@admin.com,1
</code></pre></div></div>

<p>So I decided to export the labels in JSON format and then build a Python script to
transform and combine them all. The following script transforms three JSON files
from <em>Label Studio</em> into a single file with 4 columns: <em>AccelerometerX</em>,
<em>AccelerometerY</em>, <em>AccelerometerZ</em> and <em>Label</em>.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>

<span class="n">df_all</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">()</span>
<span class="n">LABELS</span> <span class="o">=</span> <span class="p">[</span><span class="s">'W'</span><span class="p">,</span> <span class="s">'Z'</span><span class="p">,</span> <span class="s">'O'</span><span class="p">]</span>
<span class="n">sensor_columns</span> <span class="o">=</span> <span class="p">[</span><span class="s">'AccelerometerX'</span><span class="p">,</span><span class="s">'AccelerometerY'</span><span class="p">,</span> <span class="s">'AccelerometerZ'</span><span class="p">,</span> <span class="s">'Label'</span><span class="p">]</span>

<span class="k">for</span> <span class="n">ind</span><span class="p">,</span> <span class="n">label</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">LABELS</span><span class="p">):</span>
    <span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">read_csv</span><span class="p">(</span><span class="sa">f</span><span class="s">'</span><span class="si">{</span><span class="n">label</span><span class="si">}</span><span class="s">/</span><span class="si">{</span><span class="n">label</span><span class="si">}</span><span class="s">.csv'</span><span class="p">)</span>
    <span class="n">events</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">pd</span><span class="p">.</span><span class="n">read_json</span><span class="p">(</span><span class="s">'WOZ.json'</span><span class="p">)[</span><span class="s">'label'</span><span class="p">][</span><span class="n">ind</span><span class="p">])</span>

    <span class="n">df</span><span class="p">[</span><span class="s">'Label'</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">for</span> <span class="n">k</span><span class="p">,</span><span class="n">v</span> <span class="ow">in</span> <span class="n">events</span><span class="p">.</span><span class="n">iterrows</span><span class="p">():</span>
        <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">v</span><span class="p">[</span><span class="s">'start'</span><span class="p">],</span> <span class="n">v</span><span class="p">[</span><span class="s">'end'</span><span class="p">]):</span>
            <span class="n">df</span><span class="p">.</span><span class="n">loc</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="s">'Label'</span><span class="p">]</span> <span class="o">=</span> <span class="n">v</span><span class="p">[</span><span class="s">'timeserieslabels'</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span>
    <span class="n">df</span><span class="p">[</span><span class="s">'LabelNumerical'</span><span class="p">]</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">Categorical</span><span class="p">(</span><span class="n">df</span><span class="p">.</span><span class="n">Label</span><span class="p">)</span>

    <span class="n">df</span><span class="p">[</span><span class="n">sensor_columns</span><span class="p">].</span><span class="n">to_csv</span><span class="p">(</span><span class="sa">f</span><span class="s">'</span><span class="si">{</span><span class="n">label</span><span class="si">}</span><span class="s">/</span><span class="si">{</span><span class="n">label</span><span class="si">}</span><span class="s">_label.csv'</span><span class="p">,</span> <span class="n">index</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
    <span class="n">df_all</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">concat</span><span class="p">([</span><span class="n">df_all</span><span class="p">,</span> <span class="n">df</span><span class="p">],</span> <span class="n">sort</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>

<span class="n">df_all</span><span class="p">[</span><span class="n">sensor_columns</span><span class="p">].</span><span class="n">to_csv</span><span class="p">(</span><span class="sa">f</span><span class="s">'WOZ_label.csv'</span><span class="p">,</span> <span class="n">index</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
</code></pre></div></div>

<p>The resulting data can be directly used as a time series data and a machine
learning model can be trained in order to recognise the patterns automatically.
The following picture shows data for W, O and Z patterns.</p>

<center>
<amp-img src="/assets/img/magic-wand/matplotlib-data.png" width="1200" height="350" layout="intrinsic" alt="labelled data plotted using matplotlib"></amp-img>
<br /><i>Labelled data for W, O and Z gestures</i>
</center>

<h2 id="training-a-model-using-sensiml">Training a model using SensiML</h2>

<p>SensiML provides a python package to build a data pipeline which can be used to
train a machine learning model. One needs to create a free account in order to
use it. There is a lot of <a href="https://sensiml.com/tensorflow-lite/">documentation and examples</a>
available online.</p>

<h3 id="pipeline">Pipeline</h3>

<p>Pipelines are a key component of the SensiML workflow. Pipelines store the
preprocessing, feature extraction, and model building steps.</p>

<p>Model training can be done either in the SensiML cloud, or by using Tensorflow to
train the model locally and then uploading it to SensiML in order to obtain the
firmware code to run on the embedded device.</p>

<p>In order to train the model locally, one needs to build a data pipeline to process data and calculate the feature vector.
This is done using the following pipeline:</p>
<ul>
  <li>The <strong>Input Query</strong> function, which specifies what data is being fed into the model</li>
  <li>The <strong>Segmentation</strong>, which specifies how the data should be fed to the classifier.</li>
  <li>The <strong>Windowing</strong> segmenter, which captures data depending on the expected gesture length.</li>
  <li>The <strong>Feature Generator</strong>, which specifies which features should be extracted from the raw time-series data</li>
  <li>The <strong>Feature Selector</strong> which selects the best features. In this case, we are using the custom feature selector to downsample the data.</li>
  <li>The <strong>Feature Transform</strong> which specifies how to transform the features after extraction. In this case, it is to scale them to 1 byte each</li>
</ul>
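<p>To make the segmentation step concrete, here is a minimal numpy sketch of how a sliding window with <code>window_size</code> 350 and <code>delta</code> 25 cuts the signal into overlapping segments; this is an illustration of the idea, not SensiML’s implementation:</p>

```python
import numpy as np

def sliding_windows(data, window_size=350, delta=25):
    """Return overlapping segments of data (shape: samples x channels)."""
    starts = range(0, len(data) - window_size + 1, delta)
    return np.stack([data[s:s + window_size] for s in starts])

signal = np.random.rand(1000, 3)  # stand-in for X, Y, Z accelerometer data
windows = sliding_windows(signal)
# windows.shape == (27, 350, 3): 27 segments, each 350 samples, starting 25 apart
```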

<p>Here is the Python code for the pipeline:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">dsk</span><span class="p">.</span><span class="n">pipeline</span><span class="p">.</span><span class="n">reset</span><span class="p">()</span>
<span class="n">dsk</span><span class="p">.</span><span class="n">pipeline</span><span class="p">.</span><span class="n">set_input_data</span><span class="p">(</span><span class="s">'wand_10_movements.csv'</span><span class="p">,</span> <span class="n">group_columns</span><span class="o">=</span><span class="p">[</span><span class="s">'Label'</span><span class="p">],</span> <span class="n">label_column</span><span class="o">=</span><span class="s">'Label'</span><span class="p">,</span> <span class="n">data_columns</span><span class="o">=</span><span class="n">sensor_columns</span><span class="p">)</span>

<span class="n">dsk</span><span class="p">.</span><span class="n">pipeline</span><span class="p">.</span><span class="n">add_segmenter</span><span class="p">(</span><span class="s">"Windowing"</span><span class="p">,</span> <span class="n">params</span><span class="o">=</span><span class="p">{</span><span class="s">"window_size"</span><span class="p">:</span> <span class="mi">350</span><span class="p">,</span> <span class="s">"delta"</span><span class="p">:</span> <span class="mi">25</span><span class="p">,</span> <span class="s">"train_delta"</span><span class="p">:</span> <span class="mi">25</span><span class="p">,</span> <span class="s">"return_segment_index"</span><span class="p">:</span> <span class="bp">False</span><span class="p">})</span>

<span class="n">dsk</span><span class="p">.</span><span class="n">pipeline</span><span class="p">.</span><span class="n">add_feature_generator</span><span class="p">(</span>
    <span class="p">[</span>
        <span class="p">{</span><span class="s">'subtype_call'</span><span class="p">:</span> <span class="s">'Statistical'</span><span class="p">},</span>
        <span class="p">{</span><span class="s">'subtype_call'</span><span class="p">:</span> <span class="s">'Shape'</span><span class="p">},</span>
        <span class="p">{</span><span class="s">'subtype_call'</span><span class="p">:</span> <span class="s">'Column Fusion'</span><span class="p">},</span>
        <span class="p">{</span><span class="s">'subtype_call'</span><span class="p">:</span> <span class="s">'Area'</span><span class="p">},</span>
        <span class="p">{</span><span class="s">'subtype_call'</span><span class="p">:</span> <span class="s">'Rate of Change'</span><span class="p">},</span>
    <span class="p">],</span>
    <span class="n">function_defaults</span><span class="o">=</span><span class="p">{</span><span class="s">'columns'</span><span class="p">:</span> <span class="n">sensor_columns</span><span class="p">},</span>
<span class="p">)</span>

<span class="n">dsk</span><span class="p">.</span><span class="n">pipeline</span><span class="p">.</span><span class="n">add_feature_selector</span><span class="p">([{</span><span class="s">'name'</span><span class="p">:</span><span class="s">'Tree-based Selection'</span><span class="p">,</span> <span class="s">'params'</span><span class="p">:{</span><span class="s">"number_of_features"</span><span class="p">:</span><span class="mi">12</span><span class="p">}},])</span>

<span class="n">dsk</span><span class="p">.</span><span class="n">pipeline</span><span class="p">.</span><span class="n">add_transform</span><span class="p">(</span><span class="s">"Min Max Scale"</span><span class="p">)</span> <span class="c1"># Scale the features to 1-byte
</span></code></pre></div></div>

<h3 id="tensorflow-model">TensorFlow model</h3>

<p>I use the TensorFlow Keras API to create a neural network. The model is kept very simple because not all Tensorflow functions and layers are available in the microcontroller version. I use a fully connected network to efficiently classify the gestures. It takes as input the 12-element feature vector created previously by the pipeline.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">tensorflow.keras</span> <span class="kn">import</span> <span class="n">layers</span>
<span class="kn">import</span> <span class="nn">tensorflow</span> <span class="k">as</span> <span class="n">tf</span>

<span class="n">tf_model</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">Sequential</span><span class="p">()</span>

<span class="n">tf_model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">layers</span><span class="p">.</span><span class="n">Dense</span><span class="p">(</span><span class="mi">12</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">'relu'</span><span class="p">,</span><span class="n">kernel_regularizer</span><span class="o">=</span><span class="s">'l1'</span><span class="p">,</span> <span class="n">input_shape</span><span class="o">=</span><span class="p">(</span><span class="n">x_train</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">],)))</span>
<span class="n">tf_model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">layers</span><span class="p">.</span><span class="n">Dropout</span><span class="p">(</span><span class="mf">0.1</span><span class="p">))</span>
<span class="n">tf_model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">layers</span><span class="p">.</span><span class="n">Dense</span><span class="p">(</span><span class="mi">8</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">'relu'</span><span class="p">,</span> <span class="n">input_shape</span><span class="o">=</span><span class="p">(</span><span class="n">x_train</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">],)))</span>
<span class="n">tf_model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">layers</span><span class="p">.</span><span class="n">Dropout</span><span class="p">(</span><span class="mf">0.1</span><span class="p">))</span>
<span class="n">tf_model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">layers</span><span class="p">.</span><span class="n">Dense</span><span class="p">(</span><span class="n">y_train</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">activation</span><span class="o">=</span><span class="s">'softmax'</span><span class="p">))</span>

<span class="c1"># Compile the model using a standard optimizer and loss function for classification
</span><span class="n">tf_model</span><span class="p">.</span><span class="nb">compile</span><span class="p">(</span><span class="n">optimizer</span><span class="o">=</span><span class="s">'adam'</span><span class="p">,</span> <span class="n">loss</span><span class="o">=</span><span class="s">'categorical_crossentropy'</span><span class="p">,</span> <span class="n">metrics</span><span class="o">=</span><span class="p">[</span><span class="s">'accuracy'</span><span class="p">])</span>
</code></pre></div></div>

<h3 id="training">Training</h3>

<p>Training is performed by feeding the dataset to the neural network in
batches. For each batch, a loss function is computed and the
weights of the network are adjusted. Each full pass through the entire
training set is called an epoch. In the following picture:</p>
<ul>
  <li>at the top left we can see the evolution of the loss function: it decreases, meaning that the model converges towards an optimal solution.</li>
  <li>at the bottom left we can see the evolution of the accuracy of the model, it increases!</li>
  <li>at the right we have the confusion matrix for the validation and train set.</li>
</ul>

<center>
<amp-img src="/assets/img/magic-wand/train-loss.png" width="1200" height="450" layout="intrinsic" alt="Model training performance"></amp-img>
<br /><i>Model training performance</i>
</center>
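<p>For reference, the training loop itself is a standard Keras <code>fit</code> call. The sketch below is self-contained: synthetic random data stands in for the real 12-element feature vectors and the three gesture classes, so only the mechanics are meaningful, not the metric values:</p>

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Synthetic stand-ins: 120 feature vectors of length 12, 3 gesture classes
x_train = np.random.rand(120, 12).astype("float32")
y_train = tf.keras.utils.to_categorical(np.random.randint(0, 3, 120), 3)

tf_model = tf.keras.Sequential([
    layers.Dense(12, activation="relu", input_shape=(12,)),
    layers.Dense(3, activation="softmax"),
])
tf_model.compile(optimizer="adam", loss="categorical_crossentropy",
                 metrics=["accuracy"])

# One epoch is one full pass over the training set; the loss is computed
# and the weights adjusted once per batch of 32 samples
history = tf_model.fit(x_train, y_train, epochs=5, batch_size=32,
                       validation_split=0.2, verbose=0)
```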

<p>The confusion matrix provides information not only about the accuracy but also
about the kind of errors of the model. It’s often the best way to understand
which classes are difficult to distinguish.</p>

<p>Once you are satisfied with the model results, it can be optimized using the
Tensorflow quantization tools. Quantization reduces the model size by
converting the network weights from 4-byte floating point values to 1-byte
integers. Tensorflow provides the following built-in tool:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Quantized Model
</span><span class="n">converter</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">lite</span><span class="p">.</span><span class="n">TFLiteConverter</span><span class="p">.</span><span class="n">from_keras_model</span><span class="p">(</span><span class="n">tf_model</span><span class="p">)</span>
<span class="n">converter</span><span class="p">.</span><span class="n">optimizations</span> <span class="o">=</span> <span class="p">[</span><span class="n">tf</span><span class="p">.</span><span class="n">lite</span><span class="p">.</span><span class="n">Optimize</span><span class="p">.</span><span class="n">OPTIMIZE_FOR_SIZE</span><span class="p">]</span>
<span class="n">converter</span><span class="p">.</span><span class="n">representative_dataset</span> <span class="o">=</span> <span class="n">representative_dataset_generator</span>
<span class="n">tflite_model_quant</span> <span class="o">=</span> <span class="n">converter</span><span class="p">.</span><span class="n">convert</span><span class="p">()</span>
</code></pre></div></div>
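<p>To build intuition for what the converter does, here is a minimal numpy sketch of affine integer quantization, mapping float weights to 1-byte values via a scale and zero point; this illustrates the idea only and is not the actual TFLite implementation:</p>

```python
import numpy as np

def quantize_affine(w, num_bits=8):
    """Map float weights to signed 8-bit integers with a scale and zero point."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    scale = (w.max() - w.min()) / (qmax - qmin)
    zero_point = int(round(qmin - w.min() / scale))
    q = np.clip(np.round(w / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float weights from the quantized values."""
    return (q.astype(np.float32) - zero_point) * scale

w = np.linspace(-1.0, 1.0, 16).astype(np.float32)
q, scale, zp = quantize_affine(w)
w_hat = dequantize(q, scale, zp)
# w_hat approximates w to within about one quantization step (scale)
```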

<p>Quantizing the model brings additional benefits on Cortex-M processors like
the one in the QuickFeather, which provide instructions that give integer arithmetic a performance boost.</p>

<p>The quantized model can be uploaded to SensiML in order to obtain a firmware to
flash to the QuickFeather. One can download the model using the Jupyter
notebook widget or from the <a href="https://app.sensiml.cloud/">SensiML cloud application</a>.
There are two available formats:</p>
<ul>
  <li><strong>binary</strong>: this can be flashed directly to the QuickFeather. The results
are transferred using serial output.</li>
  <li><strong>library</strong>: this is a <em>knowledgepack</em>, which can be compiled
with the <em>Qorc SDK</em>. This option offers more flexibility, because one can
modify the source code before compiling.</li>
</ul>

<h2 id="export-model-to-quickfeather">Export model to Quickfeather</h2>

<p>The knowledgepack can be customized in order to light the QuickFeather LED with
a different colour depending on the prediction made.
This can be done by adding the following function to the <em>src/sml_output.c</em> file.</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// src/sml_output.c</span>
<span class="k">static</span> <span class="kt">intptr_t</span> <span class="n">last_output</span><span class="p">;</span>

<span class="kt">uint32_t</span> <span class="nf">sml_output_results</span><span class="p">(</span><span class="kt">uint16_t</span> <span class="n">model</span><span class="p">,</span> <span class="kt">uint16_t</span> <span class="n">classification</span><span class="p">)</span>
<span class="p">{</span>

    <span class="c1">//kb_get_feature_vector(model, recent_fv_result.feature_vector, &amp;recent_fv_result.fv_len);</span>

    <span class="cm">/* LIMIT output to 100hz */</span>

    <span class="k">if</span><span class="p">(</span> <span class="n">last_output</span> <span class="o">==</span> <span class="mi">0</span> <span class="p">){</span>
        <span class="n">last_output</span> <span class="o">=</span> <span class="n">ql_lw_timer_start</span><span class="p">();</span>
    <span class="p">}</span>

    <span class="k">if</span><span class="p">(</span> <span class="n">ql_lw_timer_is_expired</span><span class="p">(</span> <span class="n">last_output</span><span class="p">,</span> <span class="mi">10</span> <span class="p">)</span> <span class="p">){</span>
        <span class="n">last_output</span> <span class="o">=</span> <span class="n">ql_lw_timer_start</span><span class="p">();</span>

        <span class="k">if</span> <span class="p">((</span><span class="kt">int</span><span class="p">)</span><span class="n">classification</span> <span class="o">==</span> <span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
            <span class="n">HAL_GPIO_Write</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
        <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
            <span class="n">HAL_GPIO_Write</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
        <span class="p">}</span>

        <span class="k">if</span> <span class="p">((</span><span class="kt">int</span><span class="p">)</span><span class="n">classification</span> <span class="o">==</span> <span class="mi">2</span><span class="p">)</span> <span class="p">{</span>
            <span class="n">HAL_GPIO_Write</span><span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
        <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
            <span class="n">HAL_GPIO_Write</span><span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
        <span class="p">}</span>

        <span class="k">if</span> <span class="p">((</span><span class="kt">int</span><span class="p">)</span><span class="n">classification</span> <span class="o">==</span> <span class="mi">3</span><span class="p">)</span> <span class="p">{</span>
            <span class="n">HAL_GPIO_Write</span><span class="p">(</span><span class="mi">6</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
        <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
            <span class="n">HAL_GPIO_Write</span><span class="p">(</span><span class="mi">6</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
        <span class="p">}</span>
    	<span class="n">sml_output_serial</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">classification</span><span class="p">);</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Finally, the model can be compiled using the <em>Qorc SDK</em> and flashed to the QuickFeather again.</p>

<h2 id="test-model-using-quickfeather">Test the model using the QuickFeather</h2>

<p>One can plug a Li-Po battery into the battery connector of the QuickFeather to make the board fully autonomous.
Then, with a nice spoon like the following, one can improvise a magic wand 🪄:</p>

<center>
<amp-img src="/assets/img/magic-wand/magic-wand.jpg" width="337" height="600" layout="intrinsic" alt="quickfeather as a magic wand"></amp-img>
<br /><i>QuickFeather as a magic wand</i>
</center>

<p>The following video shows the recognition system in action; the colours mean the following:</p>
<ul>
  <li><strong>red</strong> for O gesture</li>
  <li><strong>green</strong> for W gesture</li>
  <li><strong>blue</strong> for Z gesture</li>
</ul>

<center>
<amp-video width="360" height="640" src="/assets/img/magic-wand/magic-wand.mp4" poster="/assets/img/magic-wand/magic-wand.jpg" layout="intrinsic" controls="" loop="" autoplay="">
  <div fallback="">
    <p>Your browser doesn't support HTML5 video.</p>
  </div>
</amp-video>
</center>

<h2 id="conclusions">Conclusions</h2>

<p>The QuickFeather is a device well suited for tiny machine learning models.
This use case provides a simple example that demystifies the whole workflow for
implementing machine learning algorithms on microcontrollers, but it can be
extended to more complex use cases, like the one presented in the <a href="https://www.hackster.io/climate-change-challengers/hydroponic-agriculture-learning-with-sensiml-ai-framework-5289ea">Hackster.io
Climate Change
Challenge</a>.</p>

<p>SensiML provides nice tools to simplify machine learning
implementation on microcontrollers. Its software, such as the <a href="https://sensiml.com/services/toolkit/open-gateway/">Data Capture
Lab</a>, captures data and
also provides a labelling module. However, for this case I preferred Label
Studio, which is a more generic tool that works for most use cases.</p>

<p>The notebook with the complete details about the model training can be found in <a href="https://gist.github.com/cristianpb/4b86c5176d9d305aaa2974ce9c3c83f8">this gist</a>.</p>]]></content:encoded><description>This article aims to demystify the implementation of machine learning algorithms into microcontrollers. It uses runs a TensorflowLite model for gesture recognition in a QuickFeather microcontroller.</description><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://cristianpb.github.io/assets/img/magic-wand/main-16x9.jpg"/><media:content xmlns:media="http://search.yahoo.com/mrss/" medium="image" url="https://cristianpb.github.io/assets/img/magic-wand/main-16x9.jpg"/></item><item><title>TheFifthDriver: Machine learning driving assistance on FPGA</title><link>https://cristianpb.github.io/blog/fifth-drive-machine-learning-fpga</link><category>programming</category><category>fpga</category><category>python</category><category>opencv</category><author>noemail@noemail.org (Cristian Brokate)</author><pubDate>Mon, 7 Dec 2020 00:00:00 GMT</pubDate><guid isPermaLink="false">https://cristianpb.github.io/blog/fifth-drive-machine-learning-fpga</guid><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"/><description>FPGA implementation of a highly efficient real-time machine learning driving assistance application using a camera circuit.</description><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://cristianpb.github.io/assets/img/external-articles-responsive/the-fifth-driver-16x9.jpg"/><media:content xmlns:media="http://search.yahoo.com/mrss/" medium="image" url="https://cristianpb.github.io/assets/img/external-articles-responsive/the-fifth-driver-16x9.jpg"/></item><item><title>Hydroponic Agriculture Learning with SensiML AI 
Framework</title><link>https://cristianpb.github.io/blog/hydroponic-agriculture-learning</link><category>programming</category><category>fpga</category><category>sensiml</category><category>quickfeather</category><category>ecology</category><category>python</category><author>noemail@noemail.org (Cristian Brokate)</author><pubDate>Mon, 7 Dec 2020 00:00:00 GMT</pubDate><guid isPermaLink="false">https://cristianpb.github.io/blog/hydroponic-agriculture-learning</guid><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"/><description>New methodologies of horticulture based-on high-end technology are urgently required to transform the way in which the world is fed. In this project, we present the results of a hydroponic agriculture PoC, which was developed using Quicklogic's QuickFeather in conjuntion with SensiML to highlight the enormous benefits that the growth of crops without soil brings to the climate change.</description><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://cristianpb.github.io/assets/img/external-articles-responsive/hydroponic-agriculture-learning-16x9.jpg"/><media:content xmlns:media="http://search.yahoo.com/mrss/" medium="image" url="https://cristianpb.github.io/assets/img/external-articles-responsive/hydroponic-agriculture-learning-16x9.jpg"/></item><item><title>Writing notes with Vimwiki and Hugo static generator</title><link>https://cristianpb.github.io/blog/vimwiki-hugo</link><category>programming</category><category>vim</category><category>hugo</category><category>jekyll</category><category>github-actions</category><author>noemail@noemail.org (Cristian Brokate)</author><pubDate>Wed, 1 Jul 2020 00:00:00 GMT</pubDate><guid isPermaLink="false">https://cristianpb.github.io/blog/vimwiki-hugo</guid><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>Taking notes is important when you want to remember things. I used to have
notebooks, which worked fine, but searching through them becomes complicated
once you have a lot of notes. As a die-hard Vim user, I decided to give
<a href="https://github.com/vimwiki/vimwiki">Vimwiki</a> a try to help me organize my notes and ideas.</p>

<p>Vimwiki makes it easy to create a personal wiki using the Vim text editor.
A wiki is a collection of text documents linked together and formatted with
plain text syntax that can be highlighted for readability using Vim’s
syntax highlighting feature.</p>

<p>The plain text notes can be exported to HTML, which improves readability. In
addition, it’s possible to connect external HTML converters like Jekyll or
Hugo.</p>

<p>In this post I will show the main functionalities of Vimwiki and how to
connect the Hugo fast markdown static generator.</p>

<div class="columns is-mobile is-multiline is-horizontal-center">
  <div class="column is-6-desktop is-12-mobile">
    <amp-image-lightbox id="lightbox1" layout="nodisplay"></amp-image-lightbox>
    <amp-img on="tap:lightbox1" role="button" tabindex="0" aria-describedby="vim1" alt="Markdown writing in vim" title="Markdown writing in vim" src="/assets/img/vimwiki-hugo/main.png" layout="responsive" width="737" height="697"></amp-img>
    <div id="vim1">
      <p>Vimwiki notes writing in Vim</p>
    </div>
  </div>
  <div class="column is-6-desktop is-12-mobile">
    <amp-img on="tap:lightbox1" role="button" tabindex="0" aria-describedby="markdown1" alt="markdown html output" title="markdown html output" src="/assets/img/vimwiki-hugo/markdown.png" layout="responsive" width="650" height="633"></amp-img>
    <div id="markdown1">
      <p>Markdown notes converted into HTML</p>
    </div>
  </div>
</div>

<h2 id="vimwiki">Vimwiki</h2>

<p>With Vimwiki you can:</p>
<ul>
  <li>organize your notes and ideas in files that are linked together;</li>
  <li>manage todo lists;</li>
  <li>maintain a diary, writing notes for every day;</li>
  <li>write documentation in simple markdown files;</li>
  <li>export your documents to HTML.</li>
</ul>

<h3 id="vim-shortcuts">Vim shortcuts</h3>

<p>One of the main advantages of Vim is the fact that it’s a modal editor,
which means that it has different editing modes.
Each editing mode gives each key a different function.
This multiplies the number of available shortcuts without requiring multi-key combinations. In Vimwiki this allows you to write notes with ease.</p>

<p>When I want to write some notes, I just open Vim and use
<code class="language-plaintext highlighter-rouge">&lt;Leader&gt;w&lt;Leader&gt;w</code> to create a new note for today, named after the
current date. <code class="language-plaintext highlighter-rouge">&lt;Leader&gt;</code> is a key that can be configured in Vim; in my case it’s the <em>comma</em> character (,).</p>

<p>If I want to look at my notes I can use <code class="language-plaintext highlighter-rouge">&lt;Leader&gt;ww</code> to open the wiki index
file. I can use the Enter key to follow links in the index, and Backspace acts as a return
to the previous page.</p>

<p>I use <a href="https://github.com/neoclide/coc-snippets">CoC snippets</a> to improve autocompletion. In markdown, I find this plugin very useful to create tables, code blocks and links. You can use snippets for almost every programming language, just take a look at the documentation.</p>

<p>When I want to preview the markdown file, I use <code class="language-plaintext highlighter-rouge">&lt;Leader&gt;wh</code> to convert the current
wiki page to HTML, and I also added a shortcut to open the HTML in the browser.</p>
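<p>As an illustration, a mapping along these lines could serve as that browser shortcut (the <code class="language-plaintext highlighter-rouge">&lt;Leader&gt;wb</code> key, the <code class="language-plaintext highlighter-rouge">_site</code> output directory and the use of <code class="language-plaintext highlighter-rouge">xdg-open</code> are my assumptions here, not Vimwiki defaults):</p>

```vim
" Hypothetical mapping: open the HTML export of the current note
" in the default browser (assumes Linux's xdg-open and the _site
" directory configured as path_html in g:vimwiki_list)
nmap <Leader>wb :silent !xdg-open ~/Documents/vimwiki/_site/%:t:r.html &<CR>
```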

<p>In the following video you can see an example of this workflow in action.</p>

<amp-video width="1024" height="610" src="/assets/img/vimwiki-hugo/video.mp4" poster="/assets/img/vimwiki-hugo/main.png" layout="responsive" controls="" loop="" autoplay="">
  <div fallback="">
    <p>Your browser doesn't support HTML5 video.</p>
  </div>
</amp-video>

<h3 id="searching-in-your-notes">Searching in your notes</h3>

<p>One of the advantages of digital notes is that you can quickly search across multiple files using queries.</p>

<p>Vimwiki comes with a <em>VimwikiSearch</em> command (<code class="language-plaintext highlighter-rouge">VWS</code>), which is simply a wrapper
around the Unix <em>grep</em> command. It can search for patterns in case-insensitive mode across all your notes.</p>
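<p>The same search can be reproduced from a plain shell, which helps to understand what <code class="language-plaintext highlighter-rouge">VWS</code> does under the hood. A minimal sketch, using a throwaway wiki directory:</p>

```shell
# Build a tiny throwaway wiki and search it the way :VWS does:
# a recursive, case-insensitive grep over every note.
wiki=$(mktemp -d)
printf '# Meeting\nDiscussed the Hugo migration\n' > "$wiki/2020-07-01.md"
printf '# Ideas\nTry vimwiki tags\n' > "$wiki/ideas.md"

# -r recurse, -i ignore case, -n show line numbers
grep -rin "hugo" "$wiki"
```

Inside Vim, the equivalent would be <code class="language-plaintext highlighter-rouge">:VWS /hugo/</code>, with the matches landing in the location list instead of standard output.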

<p>An excellent way to implement labels and contexts for cross-correlating
information is to assign tags to headlines. If you add tags to your Vimwiki
notes, you can also use a <em>VimwikiSearchTags</em> command.</p>

<p>In both cases, the search results populate
the location list, which you can navigate using the Vim commands <code class="language-plaintext highlighter-rouge">lopen</code> to open the list, <code class="language-plaintext highlighter-rouge">lnext</code> to go to the next occurrence and <code class="language-plaintext highlighter-rouge">lprevious</code> for the previous occurrence.</p>

<h2 id="hugo">Hugo</h2>

<p>Vimwiki has a custom filetype called <em>wiki</em>, which is a little different
from markdown. The native vimwiki2html command only works for the <em>wiki</em>
filetype. If you want to transform files of other filetypes, like markdown, into HTML, you have to use a custom parser. Even though I can’t use Vimwiki’s native parser, I prefer the markdown format because it’s very popular and simple.</p>

<p>These are some options to use as an external markdown parser:</p>
<ul>
  <li><em>Pandoc</em>, which I think works pretty well, but requires a lot of Haskell dependencies.</li>
  <li>The <a href="https://github.com/WnP/vimwiki_markdown/">Python vimwiki markdown</a> library, which I think has a lot of potential.</li>
  <li>Static website generators like Jekyll, Hugo or Hexo.</li>
</ul>

<p>I started using static website generators because they also make it easy to publish the notes as static webpages, which I wanted to host on Github Pages.</p>

<p>My first option was Jekyll, which is the static website generator natively supported by Github. It’s easy to use and the syntax is very straightforward, but I started to regret it once I accumulated a lot of notes. Then I decided to use Hugo, which is claimed to be faster and, since it’s written in Go, has no dependencies. The following table shows my compile-time results for both:</p>

<table>
  <thead>
    <tr>
      <th>Compiling time</th>
      <th>Jekyll</th>
      <th>Hugo</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>10 pages</td>
      <td>0.1 seconds</td>
      <td>0.1 seconds</td>
    </tr>
    <tr>
      <td>150 pages</td>
      <td>48 seconds</td>
      <td>0.5 seconds</td>
    </tr>
  </tbody>
</table>

<p>I should say that I used the Jekyll Github gems, which include some unnecessary
Ruby gems, so I think Jekyll’s performance could be improved. It’s nice software that I use to publish this post, but Hugo is still faster.</p>

<h3 id="building-vimiki-with-hugo">Building Vimwiki with Hugo</h3>

<p>The <code class="language-plaintext highlighter-rouge">.vimrc</code> file contains the Vim configuration and is the place where one can
define the Vimwiki syntax and the writing directory. As
you can see in my configuration, I use markdown syntax and save my files under
<code class="language-plaintext highlighter-rouge">~/Documents/vimwiki/</code>.</p>

<div class="language-vim highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">" ~/.vimrc</span>

<span class="k">let</span> <span class="nv">g:vimwiki_list</span> <span class="p">=</span> <span class="p">[{</span>
<span class="se">  \</span> <span class="s1">'auto_export'</span><span class="p">:</span> <span class="m">1</span><span class="p">,</span>
<span class="se">  \</span> <span class="s1">'automatic_nested_syntaxes'</span><span class="p">:</span><span class="m">1</span><span class="p">,</span>
<span class="se">  \</span> <span class="s1">'path_html'</span><span class="p">:</span> <span class="s1">'$HOME/Documents/vimwiki/_site'</span><span class="p">,</span>
<span class="se">  \</span> <span class="s1">'path'</span><span class="p">:</span> <span class="s1">'$HOME/Documents/vimwiki/content'</span><span class="p">,</span>
<span class="se">  \</span> <span class="s1">'template_path'</span><span class="p">:</span> <span class="s1">'$HOME/Documents/vimwiki/templates/'</span><span class="p">,</span>
<span class="se">  \</span> <span class="s1">'syntax'</span><span class="p">:</span> <span class="s1">'markdown'</span><span class="p">,</span>
<span class="se">  \</span> <span class="s1">'ext'</span><span class="p">:</span><span class="s1">'.md'</span><span class="p">,</span>
<span class="se">  \</span> <span class="s1">'template_default'</span><span class="p">:</span><span class="s1">'markdown'</span><span class="p">,</span>
<span class="se">  \</span> <span class="s1">'custom_wiki2html'</span><span class="p">:</span> <span class="s1">'$HOME/.dotfiles/wiki2html.sh'</span><span class="p">,</span>
<span class="se">  \</span> <span class="s1">'template_ext'</span><span class="p">:</span><span class="s1">'.html'</span>
<span class="se">\</span><span class="p">}]</span>
</code></pre></div></div>

<p>The custom wiki2html entry corresponds to a script that is executed to transform
markdown into HTML. This script calls the Hugo executable and tells Hugo to
use the Vimwiki file path as the base URL in order to keep links working.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># ~/.dotfiles/wiki2html.sh</span>

<span class="nb">env </span><span class="nv">HUGO_baseURL</span><span class="o">=</span><span class="s2">"file:///home/</span><span class="k">${</span><span class="nv">USER</span><span class="k">}</span><span class="s2">/Documents/vimwiki/_site/"</span> <span class="se">\</span>
    hugo <span class="nt">--themesDir</span> ~/Documents/ <span class="nt">-t</span> vimwiki <span class="se">\</span>
    <span class="nt">--config</span> ~/Documents/vimwiki/config.toml <span class="se">\</span>
    <span class="nt">--contentDir</span> ~/Documents/vimwiki/content <span class="se">\</span>
    <span class="nt">-d</span> ~/Documents/vimwiki/_site <span class="nt">--quiet</span> <span class="o">&gt;</span> /dev/null
</code></pre></div></div>

<p>The complete version of my <code class="language-plaintext highlighter-rouge">~/.vimrc</code> can be found in <a href="https://github.com/cristianpb/dotfiles">my dotfiles repository</a>.</p>

<h3 id="deploy-vimwiki-to-github-pages">Deploy Vimwiki to Github Pages</h3>

<p>Hugo projects can be easily published to Github using Github Actions. The
following script tells the GitHub worker to build the HTML with Hugo on each push
and publish the HTML files to Github Pages.</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">name</span><span class="pi">:</span> <span class="s">🚀 Publish Github Pages</span>

<span class="na">on</span><span class="pi">:</span> <span class="s">push</span>
<span class="na">jobs</span><span class="pi">:</span>
  <span class="na">deploy</span><span class="pi">:</span>
    <span class="na">runs-on</span><span class="pi">:</span> <span class="s">ubuntu-latest</span>

    <span class="na">steps</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Git checkout</span>
        <span class="na">uses</span><span class="pi">:</span> <span class="s">actions/checkout@v2</span>

      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Setup hugo</span>
        <span class="na">uses</span><span class="pi">:</span> <span class="s">peaceiris/actions-hugo@v2</span>
        <span class="na">with</span><span class="pi">:</span>
          <span class="na">hugo-version</span><span class="pi">:</span> <span class="s1">'</span><span class="s">latest'</span>

      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Build</span>
        <span class="na">run</span><span class="pi">:</span> <span class="s">hugo --config config-gh.toml</span>

      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">🚀 Deploy to GitHub pages</span>
        <span class="na">uses</span><span class="pi">:</span> <span class="s">peaceiris/actions-gh-pages@v3</span>
        <span class="na">with</span><span class="pi">:</span>
          <span class="na">deploy_key</span><span class="pi">:</span> <span class="s">${{ secrets.ACTIONS_DEPLOY_KEY }}</span>
          <span class="na">publish_branch</span><span class="pi">:</span> <span class="s">gh-pages</span>
          <span class="na">publish_dir</span><span class="pi">:</span> <span class="s">./public</span>
          <span class="na">force_orphan</span><span class="pi">:</span> <span class="no">true</span>
</code></pre></div></div>

<p>I like having part of my notes published on Github Pages, at least the
configuration notes, which can be found on <a href="https://cristianpb.github.io/vimwiki">my Github
page</a>. But there is also a part of my notes
that I keep private, for example my diary notes, where I may have some sensitive
information, so I keep them away from publication simply by adding them to the
<code class="language-plaintext highlighter-rouge">.gitignore</code> file. Here you can find <a href="https://github.com/cristianpb/vimwiki">my Github notes
repository</a>.</p>
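<p>As a sketch, assuming the diary lives in the default <code class="language-plaintext highlighter-rouge">diary</code> subdirectory of the content folder (my actual layout may differ), the <code class="language-plaintext highlighter-rouge">.gitignore</code> could be as simple as:</p>

```
# keep private diary notes and the generated site out of the repository
content/diary/
_site/
```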

<h2 id="conclusion">Conclusion</h2>

<p>I like the fact that Hugo has no dependencies since it’s written in Go, so
it’s very easy to install: just download it from the github project releases
page. In addition, it’s a blazing-fast static website converter; you can find
benchmarks on the internet.</p>

<p>I have been using Vimwiki very often. It lets me take notes very easily
and also find information about things that happened in the past. When people ask
about last month’s meeting, I can easily find what I wrote by
searching by dates, tags or just words.</p>

<p>Publishing my notes to github gives me a place where I can keep
track of my vimwiki configuration and also publish simple notes that are not
meant to be a blog post, like my Arch Linux install notes or my text editor
configuration.</p>]]></content:encoded><description>Vim is a simple and ubiquitous text editor that I use daily. In this article I show how to use Vim to take and publish diary notes using Vimwiki and Hugo.</description><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://cristianpb.github.io/assets/img/vimwiki-hugo/main-16x9.jpg"/><media:content xmlns:media="http://search.yahoo.com/mrss/" medium="image" url="https://cristianpb.github.io/assets/img/vimwiki-hugo/main-16x9.jpg"/></item><item><title>Reverse proxy management with Traefik and GoAccess</title><link>https://cristianpb.github.io/blog/traefik-goaccess</link><category>system management</category><category>traefik</category><category>goaccess</category><category>docker</category><author>noemail@noemail.org (Cristian Brokate)</author><pubDate>Sun, 7 Jun 2020 00:00:00 GMT</pubDate><guid isPermaLink="false">https://cristianpb.github.io/blog/traefik-goaccess</guid><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>The microservices philosophy consists in dividing applications into simple components.
This approach increases maintainability: since each application is not
coupled to the others, it can be tested, replaced and deployed independently.
It has become popular since the adoption of Docker container technology.</p>

<p>A micro service project typically includes multiple docker containers, where
each container includes a separated functionality.</p>

<p>These containers can communicate in a private network and map ports with
the external network in order to expose services.</p>

<p>However, not every service includes a security layer, so it’s better to expose
a single application that serves as a router, which controls every incoming request
and sends it to the right service.
This avoids exposing a sensitive service, such as a database, directly.
Some applications that can act as such a router are Nginx, Apache server,
Caddy and Traefik.</p>

<ul>
  <li>Apache server is the oldest one, but it has been losing followers since the
arrival of Nginx.</li>
  <li>Nginx is a very popular and powerful web server, which can be adapted to many
kinds of situations.</li>
  <li>Traefik is the new kid on the block. It has native docker
support, which means that you don’t have to define custom routing
configurations by hand: it connects directly to the docker socket to
automatically detect changes in containers.</li>
</ul>

<p>In this article I will show how to set up Traefik using file system
configuration, and also how to implement offline metric analysis using the GoAccess
tool.</p>

<h2 id="traefik">Traefik</h2>

<p>Traefik is a reverse proxy, which routes incoming requests to microservices.
It has been conceived for environments with multiple microservices, where a main
configuration sets up Traefik, which then dynamically detects
new services coming from docker, kubernetes, rancher or a plain file system.
More information about <a href="https://docs.traefik.io/providers/overview/">traefik automatic discovery is available here</a>.</p>
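<p>To illustrate the plain-file provider, a dynamic configuration file along these lines declares one router and its backing service (the host name, entrypoint name and backend URL below are placeholders, not values from my setup):</p>

```yaml
# dynamic.yml - watched by Traefik's file provider
http:
  routers:
    myapp:
      rule: "Host(`myapp.example.com`)"
      entryPoints:
        - websecure
      service: myapp
  services:
    myapp:
      loadBalancer:
        servers:
          - url: "http://127.0.0.1:3000"
```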

<p>This automatic discovery behaviour was the main thing that attracted me to
Traefik, unlike Nginx, which refuses to start if a declared service is not available.
Traefik, on the other hand, can run even if a declared service is down, and
when the container starts it is automatically detected by Traefik.</p>

<h3 id="traefik-api">Traefik API</h3>

<p>Traefik has a modern web interface to graphically inspect the configuration.
It shows information about:</p>
<ul>
  <li>the entrypoints, which are the ports that Traefik is listening on;</li>
  <li>the running services; and</li>
  <li>the routing rules, which define how to direct incoming requests.</li>
</ul>

<p>The Traefik interface can be easily enabled in the configuration file. The following
lines tell Traefik to serve the interface on the Traefik entrypoint (8080 by
default). The debug option is useful for profiling performance and debugging Traefik.</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">api</span><span class="pi">:</span>
  <span class="na">insecure</span><span class="pi">:</span> <span class="no">true</span>
  <span class="na">dashboard</span><span class="pi">:</span> <span class="no">true</span>
  <span class="na">debug</span><span class="pi">:</span> <span class="no">true</span> 
</code></pre></div></div>

<p>Here is a screenshot of the web interface, where one can see how a service is
configured.</p>

<center>
<amp-img src="/assets/img/traefik-goaccess/traefik.png" width="640" height="400" layout="intrinsic" alt="goaccess"></amp-img>
<br /><i>Traefik API web interface. Mopidy service is encrypted using TLS.</i>
</center>

<p>In the following gist you can find the complete configuration file for Traefik.
The basic parameters to define are the entrypoints, on which Traefik should be
listening, and the encryption method.
The providers configuration can be done in another plain file, or by adding
labels to docker, kubernetes, rancher, etc. In either case Traefik dynamically detects
changes in the providers.</p>

<amp-gist data-gistid="1d77f178884569da6a3b904ef867a30a" data-file="traefik.yml" layout="fixed-height" height="1123">
</amp-gist>

<p>This configuration can be written in a plain file when Traefik runs outside a docker
container, but it can also be done by setting labels on the Traefik docker
container.</p>

<h3 id="traefik-security">Traefik security</h3>

<p>By default Traefik will watch for all containers running on the Docker daemon,
and attempt to automatically configure routes and services for each container.
If you’d like to have more refined control, you can pass the
<code class="language-plaintext highlighter-rouge">--providers.docker.exposedByDefault=false</code> option and selectively enable
routing for your containers by adding a <code class="language-plaintext highlighter-rouge">traefik.enable=true</code> label.</p>
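<p>For example, with <code class="language-plaintext highlighter-rouge">exposedByDefault</code> disabled, a container opts in through its labels; a docker-compose sketch (the service name and host rule are placeholders) might look like:</p>

```yaml
services:
  whoami:
    image: traefik/whoami
    labels:
      # opt this container in to Traefik routing
      - "traefik.enable=true"
      - "traefik.http.routers.whoami.rule=Host(`whoami.example.com`)"
```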

<p>Regarding HTTPS security, SSL connections can be easily configured in Traefik:
one can use a self-signed certificate or connect automatically to <a href="https://letsencrypt.org/">Let’s
Encrypt</a> in order to get an SSL certificate, whose
renewal is also handled by Traefik. HTTPS redirection is likewise
available through Traefik’s parameters.</p>
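<p>As a sketch of the static configuration involved (the resolver name, e-mail address and entrypoint below are placeholders; check the Traefik ACME documentation for your version):</p>

```yaml
certificatesResolvers:
  letsencrypt:
    acme:
      email: admin@example.com   # placeholder contact address
      storage: acme.json         # where certificates are persisted
      httpChallenge:
        entryPoint: web
```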

<h3 id="traefik-as-a-service">Traefik as a service</h3>

<p>Traefik has been conceived to run as a docker container, but since it’s written
in Go, it’s also possible to run the compiled version as a standalone binary on
several operating systems.</p>

<p>In the docker version, Traefik runs automatically when the container is powered
on and the logs go to the standard output.
However, if you run the standalone binary, you have to configure Traefik as a
system service. I used the excellent information from <a href="https://gist.github.com/ubergesundheit/7c9d875befc2d7bfd0bf43d8b3862d85">this Gerald Pape
gist</a>
to configure the Traefik service.</p>

<p>I prefer the standalone version in development environments like the Raspberry Pi
or Jetson Nano, where building docker images can take a while.</p>

<h2 id="goaccess-to-monitor-logs">GoAccess to monitor logs</h2>

<p>GoAccess is a simple tool for analysing logs. It provides fast and valuable HTTP
statistics for system administrators who need a visual server report on the
fly. It can generate reports in terminal format, which is nice if you are
connected over SSH, but it can also generate CSV, JSON or HTML reports.</p>

<p>One alternative to this kind of service is Matomo, which has the advantage of being
self-hostable and open source. With it you can be sure about how your collected
data is used and that it is not being sold to third parties and advertisers.
However, Matomo requires an extra client-side javascript library
in order to collect data, which is another dependency that I don’t want for
internal offline environments.</p>

<p>Another popular alternative is Google Analytics, which has very powerful
reports and a multitude of options that go beyond the scope of this article. The
only problem is that it’s not privacy compliant.</p>

<p>What makes GoAccess interesting is that it generates detailed analytics based
purely on the access logs of a web server, such as Apache, Nginx or, in my case,
Traefik. It’s written in C, and features both a terminal interface and
a web interface. The way it’s designed to be used is by piping the <em>access.log</em>
contents into the GoAccess binary and providing any number of switches to
customize the output, such as which log format you’re sending it or
how to parse geolocation from IP addresses.</p>

<p>In the following image you can see an example for GoAccess HTML dashboard. On
the top there is global information about the number of total requests, the
number of unique visitors, the log size, the bandwidth, etc.</p>

<center>
<amp-img src="/assets/img/traefik-goaccess/goaccess.png" width="640" height="400" layout="intrinsic" alt="goaccess"></amp-img>
<br /><i>GoAccess HTML dashboard</i>
</center>

<h3 id="real-time-dashboard">Real-time dashboard</h3>

<p>GoAccess can be called from the command line; you can configure the log format
using a command line parameter or a configuration file. The default
configuration file can be found at <code class="language-plaintext highlighter-rouge">/etc/goaccess.conf</code>, but you can also pass
another configuration file using the <code class="language-plaintext highlighter-rouge">--config-file</code> option.</p>

<p>The default output goes to the command line, but one can produce an <em>html</em>
report by specifying an output file. This option creates a static html report,
which can be continuously updated using the <code class="language-plaintext highlighter-rouge">--real-time-html</code> option.</p>

<p>The following code shows the systemctl file that I use to configure GoAccess as
a service for real time use.</p>

<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">[Unit]</span>
<span class="py">Description</span><span class="p">=</span><span class="err">Goaccess</span> <span class="err">Web</span> <span class="err">log</span> <span class="err">report.</span>
<span class="py">After</span><span class="p">=</span><span class="err">network.target</span>

<span class="nn">[Service]</span>
<span class="py">Type</span><span class="p">=</span><span class="err">simple</span>
<span class="py">User</span><span class="p">=</span><span class="err">root</span>
<span class="py">Group</span><span class="p">=</span><span class="err">root</span>
<span class="py">Restart</span><span class="p">=</span><span class="err">always</span>
<span class="py">ExecStart</span><span class="p">=</span><span class="err">/usr/bin/goaccess</span> <span class="err">-a</span> <span class="err">-g</span> <span class="err">-f</span> <span class="err">/var/log/traefik/access.log</span> <span class="err">-o</span> <span class="err">/var/www/html/report.html</span> <span class="err">--real-time-html</span>
<span class="py">StandardOutput</span><span class="p">=</span><span class="err">null</span>
<span class="py">StandardError</span><span class="p">=</span><span class="err">null</span>

<span class="nn">[Install]</span>
<span class="py">WantedBy</span><span class="p">=</span><span class="err">multi-user.target</span>
</code></pre></div></div>
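<p>Assuming the unit above is saved as <code class="language-plaintext highlighter-rouge">/etc/systemd/system/goaccess.service</code> (the
service name is your choice), it can be registered and started with the usual
systemd commands:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Make systemd pick up the new unit file
sudo systemctl daemon-reload

# Start the service now and enable it at boot
sudo systemctl enable --now goaccess

# Check that GoAccess is running and tailing the log
systemctl status goaccess
</code></pre></div></div>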

<p>GoAccess doesn’t include a static web server, so it cannot expose the produced
<em>html</em> by itself. However, one can easily configure an Nginx static server to
expose the static files, as shown in the following Nginx virtual server:</p>

<div class="language-nginx highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">server</span> <span class="p">{</span>
    <span class="kn">listen</span> <span class="mi">8082</span><span class="p">;</span>
    <span class="kn">listen</span> <span class="s">[::]:8082</span><span class="p">;</span>
    <span class="kn">server_name</span>  <span class="s">localhost</span><span class="p">;</span>

    <span class="kn">gzip</span> <span class="no">on</span><span class="p">;</span>
    <span class="kn">gzip_types</span>      <span class="nc">text/plain</span> <span class="nc">application/xml</span> <span class="nc">image/jpeg</span><span class="p">;</span>
    <span class="kn">gzip_proxied</span>    <span class="s">no-cache</span> <span class="s">no-store</span> <span class="s">private</span> <span class="s">expired</span> <span class="s">auth</span><span class="p">;</span>
    <span class="kn">gzip_min_length</span> <span class="mi">1000</span><span class="p">;</span>

    <span class="kn">root</span> <span class="n">/var/www/html</span><span class="p">;</span>

    <span class="c1"># Add index.php to the list if you are using PHP</span>
    <span class="kn">index</span> <span class="s">index.html</span> <span class="s">index.htm</span> <span class="s">index.nginx-debian.html</span><span class="p">;</span>

    <span class="kn">location</span> <span class="n">/</span> <span class="p">{</span>
            <span class="kn">try_files</span> <span class="nv">$uri</span> <span class="nv">$uri</span><span class="n">/</span> <span class="p">=</span><span class="mi">404</span><span class="p">;</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
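<p>After saving this virtual server (for example as
<code class="language-plaintext highlighter-rouge">/etc/nginx/sites-available/goaccess</code>, a name chosen here for illustration), you
can enable it, validate the configuration, and reload Nginx:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Enable the site on Debian-style layouts
sudo ln -s /etc/nginx/sites-available/goaccess /etc/nginx/sites-enabled/

# Check the configuration syntax before reloading
sudo nginx -t

# Apply the change without dropping connections
sudo systemctl reload nginx
</code></pre></div></div>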

<h2 id="conclusion">Conclusion</h2>

<p>Traefik is a modern reverse proxy which is well adapted for dynamic
configurations. Even if it is still a young project and not as performant as
Nginx, it has an interesting approach and some nice features.
For example, with Docker applications it automatically discovers the internal
IP address of a service and routes incoming requests to it.</p>

<p>GoAccess is a very good tool for gaining insights from logs in a closed
environment where you cannot share your stats with external services. Since it
is written in C, its parsing performance is very good: it can process
400 million hits in 1 hour and 20 minutes, according to the <a href="https://goaccess.io/faq">GoAccess
FAQ</a>.</p>]]></content:encoded><description>Traefik is a modern and dynamic reverse proxy with native support for docker containers. This article compares Traefik with existing solutions and shows how to set up a privacy compliant monitoring tool with GoAccess.</description><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://cristianpb.github.io/assets/img/traefik-goaccess/main-16x9.jpg"/><media:content xmlns:media="http://search.yahoo.com/mrss/" medium="image" url="https://cristianpb.github.io/assets/img/traefik-goaccess/main-16x9.jpg"/></item></channel></rss>