<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Code Sport I/O</title>
	<atom:link href="https://codesport.io/feed/" rel="self" type="application/rss+xml" />
	<link>https://codesport.io</link>
	<description>Weekend Code School in North Carolina for Busy Adults</description>
	<lastBuildDate>Thu, 02 Apr 2026 17:40:43 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.8.1</generator>
	<item>
		<title>Comparing Embedding Models</title>
		<link>https://codesport.io/artificial-intelligence/comparing-embedding-models/</link>
					<comments>https://codesport.io/artificial-intelligence/comparing-embedding-models/#respond</comments>
		
		<dc:creator><![CDATA[Code Sport]]></dc:creator>
		<pubDate>Thu, 05 Feb 2026 19:26:43 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<guid isPermaLink="false">https://codesport.io/?p=7808</guid>

					<description><![CDATA[Embedding Model Comparison Feature bge-m3 qwen3-embedding-0.6b qwen3-embedding-4b all-MiniLM-L6-v2 nomic-embed-text-v1.5 Context Window 8,192 tokens 32,768 tokens 32,768 tokens 256 word pieces (~190 tokens) 8,192 tokens Dimensions 1,024 1,024 2,560 384 768 (Flexible) Model Size 567M Parameters 600M Parameters 4B Parameters 22.7M Parameters 137M Parameters MTEB Score / Best For 64.6 / Multilingual &#038; Hybrid 61.8 / [&#8230;]]]></description>
										<content:encoded><![CDATA[<script src="https://cdn.tailwindcss.com"></script>
<script>
  tailwind.config = {
    corePlugins: {
      preflight: false, // Disables Tailwind base styles (resets)
    },
    // Optional: Only apply styles within this selector
    important: '#tailwind-blog-content', 
  }
</script>
    <style>
        @import url('https://fonts.googleapis.com/css2?family=Inter:wght@400;600;700&display=swap');
        body {
            font-family: 'Inter', sans-serif;
            background-color: #f7f9fb;
        }
        .responsive-table th, .responsive-table td {
            border-bottom: 1px solid #e2e8f0;
        }
        @media (max-width: 768px) {
            .responsive-table th, .responsive-table td {
                padding: 0.75rem 0.5rem;
                font-size: 0.8125rem;
            }
        }
    </style>

    <div  id="tailwind-blog-content" class="max-w-full mx-auto space-y-10">

        <!-- Main Comparison Table -->
        <div class="space-y-4">
            <h1 class="text-2xl font-extrabold text-slate-900 tracking-tight border-b-4 border-indigo-600 pb-2">Embedding Model Comparison</h1>
            <div class="overflow-x-auto rounded-xl shadow-lg bg-white border border-slate-200">
                <table class="w-full text-left border-collapse responsive-table">
                    <thead>
                        <tr class="bg-slate-800 text-white uppercase text-lg tracking-wider">
                            <th class="p-4">Feature</th>
                            <th class="p-4">bge-m3</th>
                            <th class="p-4">qwen3-embedding-0.6b</th>
                            <th class="p-4">qwen3-embedding-4b</th>
                            <th class="p-4">all-MiniLM-L6-v2</th>
                            <th class="p-4">nomic-embed-text-v1.5</th>
                        </tr>
                    </thead>
                    <tbody class="text-slate-700">
                        <tr>
                            <td class="p-4 font-bold bg-slate-50 text-slate-900">Context Window</td>
                            <td class="p-4">8,192 tokens</td>
                            <td class="p-4">32,768 tokens</td>
                            <td class="p-4">32,768 tokens</td>
                            <td class="p-4">256 word pieces (~190 tokens)</td>
                            <td class="p-4">8,192 tokens</td>
                        </tr>
                        <tr>
                            <td class="p-4 font-bold bg-slate-50 text-slate-900">Dimensions</td>
                            <td class="p-4">1,024</td>
                            <td class="p-4">1,024</td>
                            <td class="p-4">2,560</td>
                            <td class="p-4">384</td>
                            <td class="p-4">768 (Flexible)</td>
                        </tr>
                        <tr>
                            <td class="p-4 font-bold bg-slate-50 text-slate-900">Model Size</td>
                            <td class="p-4">567M Parameters</td>
                            <td class="p-4">600M Parameters</td>
                            <td class="p-4">4B Parameters</td>
                            <td class="p-4">22.7M Parameters</td>
                            <td class="p-4">137M Parameters</td>
                        </tr>
                        <tr>
                            <td class="p-4 font-bold bg-slate-50 text-slate-900">MTEB Score / Best For</td>
                            <td class="p-4">64.6 / Multilingual &#038; Hybrid</td>
                            <td class="p-4">61.8 / Fast Multilingual</td>
                            <td class="p-4">~67.5 / High-Precision RAG</td>
                            <td class="p-4">56.0 / Speed &#038; Similarity</td>
                            <td class="p-4">62.3 / Long-Context English</td>
                        </tr>
                        <tr>
                            <td class="p-4 font-bold bg-slate-50 text-slate-900">Primary Use</td>
                            <td class="p-4">Multilingual Search</td>
                            <td class="p-4">Lightweight local search</td>
                            <td class="p-4">Enterprise-grade retrieval</td>
                            <td class="p-4">Mobile/Edge tasks</td>
                            <td class="p-4">Processing long docs</td>
                        </tr>
                        <tr>
                            <td class="p-4 font-bold bg-slate-50 text-slate-900">Size (VRAM)</td>
                            <td class="p-4">~1.2 GB</td>
                            <td class="p-4">~1.2 GB</td>
                            <td class="p-4">~2.5 GB (Quantized)</td>
                            <td class="p-4">~45 MB</td>
                            <td class="p-4">~274 MB</td>
                        </tr>
                        <tr>
                            <td class="p-4 font-bold bg-slate-50 text-slate-900">Accuracy</td>
                            <td class="p-4">High (Top Multilingual)</td>
                            <td class="p-4">High</td>
                            <td class="p-4">Very High (SOTA)</td>
                            <td class="p-4">Moderate</td>
                            <td class="p-4">High (Beats OpenAI Small)</td>
                        </tr>
                        <tr>
                            <td class="p-4 font-bold bg-slate-50 text-slate-900 border-b-0">Popularity</td>
                            <td class="p-4 text-green-600 font-bold border-b-0"><a href="https://ollama.com/library/bge-m3" class="underline hover:no-underline">3.2M+</a></td>
                            <td class="p-4 text-green-600 font-bold border-b-0"><a href="https://ollama.com/library/qwen3-embedding:0.6b" class="underline hover:no-underline">548K+ (Series Total)</a></td>
                            <td class="p-4 text-green-600 font-bold border-b-0"><a href="https://ollama.com/library/qwen3-embedding:4b" class="underline hover:no-underline">548K+ (Series Total)</a></td>
                            <td class="p-4 text-green-600 font-bold border-b-0"><a href="https://ollama.com/library/all-minilm" class="underline hover:no-underline">2.3M+</a></td>
                            <td class="p-4 text-green-600 font-bold border-b-0"><a href="https://ollama.com/library/nomic-embed-text" class="underline hover:no-underline">52.4M+</a></td>
                        </tr>
                    </tbody>
                </table>
            </div>
        </div>

        <!-- Strategic Breakdown -->
        <div class="grid grid-cols-1 md:grid-cols-2 gap-6">
            <div class="p-6 bg-white rounded-xl shadow-md border-t-4 border-blue-500">
                <h2 class="text-lg font-bold text-slate-800 mb-2"><img src="https://s.w.org/images/core/emoji/15.1.0/72x72/1f4c4.png" alt="📄" class="wp-smiley" style="height: 1em; max-height: 1em;" /> For Long Documents</h2>
                <p class="text-slate-600 text-sm">The <strong>Qwen3 series</strong> and <strong>nomic-embed-text</strong> are the clear winners with context windows of 32k and 8k tokens respectively. Most standard models (like MiniLM) will truncate text after just a few paragraphs.</p>
            </div>
            <div class="p-6 bg-white rounded-xl shadow-md border-t-4 border-purple-500">
                <h2 class="text-lg font-bold text-slate-800 mb-2"><img src="https://s.w.org/images/core/emoji/15.1.0/72x72/1f310.png" alt="🌐" class="wp-smiley" style="height: 1em; max-height: 1em;" /> For Multilingual Tasks</h2>
                <p class="text-slate-600 text-sm"><strong>bge-m3</strong> is specifically designed for cross-lingual tasks and supports over 100 languages with high efficiency and hybrid retrieval capabilities.</p>
            </div>
            <div class="p-6 bg-white rounded-xl shadow-md border-t-4 border-orange-500">
                <h2 class="text-lg font-bold text-slate-800 mb-2"><img src="https://s.w.org/images/core/emoji/15.1.0/72x72/1f3c6.png" alt="🏆" class="wp-smiley" style="height: 1em; max-height: 1em;" /> For Maximum Performance</h2>
                <p class="text-slate-600 text-sm"><strong>qwen3-embedding-4b</strong> currently leads open-weight benchmarks, offering retrieval accuracy that rivals or exceeds proprietary models like OpenAI&#8217;s text-embedding-3-large.</p>
            </div>
            <div class="p-6 bg-white rounded-xl shadow-md border-t-4 border-green-500">
                <h2 class="text-lg font-bold text-slate-800 mb-2"><img src="https://s.w.org/images/core/emoji/15.1.0/72x72/26a1.png" alt="⚡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> For Speed / Low Resource</h2>
                <p class="text-slate-600 text-sm"><strong>all-MiniLM-L6-v2</strong> is incredibly small and fast (~45MB VRAM), making it ideal if you are running on a CPU or a very low-end device.</p>
            </div>
        </div>

    </div>]]></content:encoded>
					
					<wfw:commentRss>https://codesport.io/artificial-intelligence/comparing-embedding-models/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Enterprise-Ready GenAI: Determinism is the Priority</title>
		<link>https://codesport.io/artificial-intelligence/llm-determinism-in-enterprise-systems/</link>
					<comments>https://codesport.io/artificial-intelligence/llm-determinism-in-enterprise-systems/#respond</comments>
		
		<dc:creator><![CDATA[Code Sport]]></dc:creator>
		<pubDate>Thu, 05 Feb 2026 15:31:37 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<guid isPermaLink="false">https://codesport.io/?p=7774</guid>

					<description><![CDATA[Enterprise GenAI and the Role of Deterministic Outputs In many enterprise environments, determinism is not merely a technical preference, it is a fundamental requirement for operational stability. With the rise of modern Generative AI, deterministic systems are critical in industries where &#8220;hallucinations&#8221; (making things up) could be dangerous or illegal. Defining Deterministic Systems The ability [&#8230;]]]></description>
										<content:encoded><![CDATA[    <script src="https://cdn.tailwindcss.com"></script>
<script>
  tailwind.config = {
    corePlugins: {
      preflight: false, // Disables Tailwind base styles (resets)
    },
    // Optional: Only apply styles within this selector
    important: '#tailwind-blog-content', 
  }
</script>
    <style>
        @import url('https://fonts.googleapis.com/css2?family=Inter:wght@400;600;700&display=swap');
        body {
            font-family: 'Inter', sans-serif;
           /* background-color: #f7f9fb;*/
            color: #1e293b;
        }
        h2 { 
            padding-bottom: 0.5rem; 
            border-bottom: 2px solid #94a3b8; 
            margin-top: 2.5rem !important; 
            font-weight: 700;
            color: #0f172a;
        }
        .bench-table tbody tr:nth-child(even) { background-color: #f9fafb; }
        .bench-table tbody tr:nth-child(odd) { background-color: #ffffff; }
        .bench-table tbody tr:hover { background-color: #eff6ff !important; }
        
        .math-tex {
            font-family: "Times New Roman", Times, serif;
            font-style: italic;
            font-weight: 600;
             /*color: black #4338ca;*/}
 .blueBoxtext{font-size: 1.3em !important;}
.p-6{padding: .2rem 1.5rem 1.5rem 1.5rem}
     
    </style>

    <div id="tailwind-blog-content" class="max-w-full mx-auto space-y-8">
        
        <header class="border-b-4 border-indigo-600 pb-6">
            <h2 class="text-4xl text-slate-900 tracking-tight">Enterprise GenAI and the Role of Deterministic Outputs</h2>
        </header>

        <section class="leading-relaxed text-slate-700 space-y-4 text-lg">
            <p>In many enterprise environments, determinism is not merely a technical preference, it is a fundamental requirement
             for operational stability.</p>

              <p>With the rise of modern Generative AI, deterministic systems are critical in industries where <strong>&#8220;hallucinations&#8221; (making things up) could be dangerous or illegal</strong>.</p>   
   <!--    <blockquote>While GenAI thrives on "creative" randomness, it is possible to "locked it down" and  ensure it provides the nearly the same output for the same input.</blockquote> 
       <p>Generative AI is probabilistic by nature, but it can be made mostly deterministic through specific settings and architectural choices. </p>
-->

        <div class="p-6 bg-slate-100 rounded-xl border-l-4 border-slate-400">
            <h2 class="font-bold text-slate-700 uppercase mb-2 tracking-wide text-2xl">Defining Deterministic Systems</h2>
            <span class="text-slate-600 leading-relaxed text-2xl">
                  The ability to generate the exact same output, <span class="math-tex">y</span>, for the exact same prompt input <span class="math-tex">x</span> sent to an LLM, <span class="math-tex">f(x)</span>, 100% of the time. As such, the goal of <em>Deterministic AI</em> is to produce repeatable, consistent outputs for a given prompt fed into a Large Language Model (LLM)</span>
        </div>


       <h2 class="text-3xl">Use Cases: Generative AI for Expert Systems</h2>

        <div class="p-6 bg-slate-100 rounded-xl border-l-4 border-slate-400">
            <h2 class="font-bold text-slate-700 uppercase mb-2 tracking-wide text-2xl">Defining Expert Systems</h2>
            <span class="text-slate-600 leading-relaxed text-2xl">
      	Expert Systems are active reasoning tools which use Retrieval Augmented Generation (RAG) as a supplemental knowledge base. They then add an inference engine to process those facts and solve a specific problem (e.g., diagnosing a disease or calculating tax returns).</span>
        </div>


<p>A growing use case for GenAI in the Enterprise is exposing LLMs to their internal documents and knowledge bases. <!--These are digital libraries.--> These repositories (like a wiki, SharePoint, or a dedicated document server) store documents, facts, and FAQs. They are largely seen as a passive tool where humans go to search for information.</p>
  
<section class="bg-white p-6 rounded-xl shadow-md border border-slate-200  blueBoxtext">
    <h3 class="text-2xl font-bold text-indigo-700 mb-4">Use Cases for Deterministic AI</h3>
            <ul class="space-y-3 text-slate-600">
               <li class="flex items-start"><span class="text-indigo-500 mr-2">•</span><span><strong class="text-slate-800">Finance:</strong> For processing transactions or enforcing strict compliance rules.</span></li>

                <li class="flex items-start"><span class="text-indigo-500 mr-2">•</span><span><strong class="text-slate-800">Healthcare:</strong> For clinical decision support systems that must follow proven medical protocols.</span></li>

                <li class="flex items-start"><span class="text-indigo-500 mr-2">•</span><span><strong class="text-slate-800">Industrial Automation:</strong> For controlling robotic arms on assembly lines where every move must be exact and repeatable.</span></li>
            <!--
            Automation: IT workflows, such as CI/CD, Terraform, or Kubernetes management.
Process Guidance: Systems that must follow strict safety procedures.-->
            
            </ul>
    </section>

       
            <p>
When deploying Generative AI for Enterprise Knowledge Bases, stakeholders expect &#8220;The Canonical Answer&#8221; — a single, verified response
 to a specific query that does not fluctuate based on the model&#8217;s random state. Without determinism, two employees asking the same policy question could receive different nuances, leading to internal misalignment.
            </p>

     <h2 class="text-3xl">Auditability: Consistent, Deterministic Outputs Support Governance, Risk, and Compliance</h2>

        <p>
                For <strong>Corporate Training and Compliance</strong>, determinism serves as the backbone of auditability. If an AI tutor or compliance assistant is used to certify staff, the logic used to evaluate their performance must be repeatable. High-variance responses in training scenarios introduce &#8220;drift,&#8221; where the educational material may slowly deviate from the intended curriculum over thousands of user sessions.
            </p>

        
        </section>


        <section class="bg-white p-6 rounded-xl shadow-md border border-slate-200">
            <h3 class="text-2xl font-bold text-indigo-700 mb-4">Key Benefits for Organizations:</h3>
            <ul class="space-y-3 text-slate-600">
                <li class="flex items-start"><span class="text-indigo-500 mr-2">•</span> <span><strong class="text-slate-800">Regulatory Compliance:</strong> Ensures that automated advice meets legal standards consistently.</span></li>
                <li class="flex items-start"><span class="text-indigo-500 mr-2">•</span> <span><strong class="text-slate-800">Debugging &#038; QA:</strong> Allows engineers to replicate errors by using the same prompt and seed.</span></li>
                <li class="flex items-start"><span class="text-indigo-500 mr-2">•</span> <span><strong class="text-slate-800">User Trust:</strong> Increases reliability perception when the system provides stable, predictable outputs.</span></li>
                <li class="flex items-start"><span class="text-indigo-500 mr-2">•</span> <span><strong class="text-slate-800">Cost Control:</strong> Deterministic paths are easier to cache, reducing API token costs.</span></li>
            </ul>   
        </section>

        <section>
            <h2 class="text-2xl">Optimal Model Parameters to Enforce Deterministic AI</h2>
            <p>
                To ensure the highest degree of determinism, LLM parameters should be locked to a &#8220;ground state.&#8221; Any value that allows the model to choose from a distribution of tokens rather than the single highest-probability token will introduce variance.</p>

            <p>The below parameters move the LLM away from &#8220;creative sampling&#8221; and toward Greedy Decoding. In Greedy Decoding, the model picks the most likely next token at every step. Randomness is diminished.</p>
            
            <div class="overflow-hidden rounded-xl border border-slate-200 bg-white shadow-lg mt-6">
                <table class="w-full text-left border-collapse bench-table">
                    <thead class="text-white uppercase tracking-wider bg-emerald-800"> <!--text-xs--> 
                        <tr>
                            <th class="p-4">Parameter</th>
                            <th class="p-4">Optimum Value</th>
                            <th class="p-4">Determinism Impact</th>
                            <th class="p-4">Explanation</th>
                        </tr>
                    </thead>
                    <tbody> <!--class="text-sm"-->
                        <tr>
                            <td class="p-4 font-bold">Temperature</td>
                            <td class="p-4">0</td>
                            <td class="p-4">90%</td>
                            <td class="p-4">Disables random sampling; model always picks the most likely token.</td>
                        </tr>
                        <tr>
                            <td class="p-4 font-bold">Seed</td>
                            <td class="p-4">Fixed Integer</td>
                            <td class="p-4">5%</td>
                            <td class="p-4">Ensures the random number generator starts at the same point, mitigating non-deterministic noise in GPU floating-point calculations (cuBLAS/flash-attention).</td>
                        </tr>
                        <tr>
                            <td class="p-4 font-bold">Top_P</td>
                            <td class="p-4">1.0</td>
                            <td class="p-4">1%</td>
                            <td class="p-4">When Temp is 0, Top_P becomes irrelevant, but keeping it at 1.0 prevents it from interfering with the top choice.</td>
                        </tr>
                        <tr>
                            <td class="p-4 font-bold">Top_K</td>
                            <td class="p-4">0 (or 1)</td>
                            <td class="p-4">1%</td>
                            <td class="p-4"> Setting this to 1 forces the model to only look at the single most likely token.</td>
                        </tr>
                        <tr>
                            <td class="p-4 font-bold">Frequency_Penalty</td>
                            <td class="p-4">0</td>
                            <td class="p-4">1%</td>
                            <td class="p-4">Ensures token scores and therefore token selection are not modified by historical usage.</td>
                        </tr>
                        <tr>
                            <td class="p-4 font-bold">Presence_Penalty</td>
                            <td class="p-4">0</td>
                            <td class="p-4">1%</td>
                            <td class="p-4">Prevents token scores from being modified by existence in text.</td>
                        </tr>
                        <tr>
                            <td class="p-4 font-bold">Repeat_Penalty</td>
                            <td class="p-4">1.0</td>
                            <td class="p-4">1%</td>
                            <td class="p-4">In most systems, 1.0 means &#8220;no change.&#8221; Anything else modifies the logit scores.</td>
                        </tr>
                        <tr>
                            <td class="p-4 font-bold">Min_P</td>
                            <td class="p-4">0</td>
                            <td class="p-4">1%</td>
                            <td class="p-4">This is a newer sampling technique; keeping it at 0 ensures it doesn&#8217;t prune the top-ranked token. Thus, disabling threshold-based token pruning.</td>
                        </tr>
                    </tbody>
                </table>
            </div>
<br>
<p><strong>NB:</strong> These parameters are not stored in the model file itself. They are injected at runtime when the user sends a prompt.</p>
        </section>

        <section>
            <h2 class="text-2xl">Critical Nuance: The &#8220;Hidden&#8221; 100%</h2>
            <p class="mt-4 text-slate-700 leading-relaxed">
                Even with the settings above, achieving 100% determinism in LLMs is notoriously difficult due to hardware non-determinism.
            </p>
            <ol class="list-decimal list-inside mt-4 space-y-4 text-slate-700">
                <li class="pl-2">
                    <strong class="text-slate-900">Atomic Operations:</strong> Modern GPUs perform calculations in parallel. Because of the way floating-point numbers are added across thousands of cores (the order of operations can vary slightly), you might get a tiny difference in the 10th decimal place.
                </li>
                <li class="pl-2">
                    <strong class="text-slate-900">Logit Shifting:</strong> That tiny decimal difference can occasionally cause token A (score 10.000001) and token B (score 10.000002) to swap places.
                </li>
                <li class="pl-2">
                    <strong class="text-slate-900">The Fix:</strong> This is why the Seed and Temperature = 0 are used together. On some enterprise APIs (like OpenAI), you can also look for the <code class="bg-slate-100 px-1 rounded border border-slate-300">system_fingerprint</code> in the response to ensure the underlying hardware/software version hasn&#8217;t changed. Monitor this to ensure the provider didn&#8217;t update the model or hardware.
                </li>
            </ol>
        </section>

        <section>
            <h2 class="text-2xl">Why Penalties are &#8220;0&#8221; for Determinism</h2>
            <p class="mt-4 text-slate-700 leading-relaxed">
                While you can have a deterministic output with a <code class="text-indigo-600 font-semibold">frequency_penalty</code> of 0.5, it makes the output harder to predict and debug. If your goal is <span class="math-tex">f(x)=y</span>, you want the purest path from the model&#8217;s training to the output. Penalties add a layer of &#8220;stateful&#8221; math (where the next word depends on every word before it in a modified way) which increases the chance of a floating-point error causing a divergence.
            </p>
        </section>

        <!-- Technical Note -->
        <div class="p-6 bg-slate-100 rounded-xl border-l-4 border-slate-400">
            <h2 class="font-bold text-slate-700 uppercase mb-2 tracking-wide">Quality Assurance Note</h2>
            <p class="text-slate-600 leading-relaxed">
                Even with deterministic parameters set to zero, floating-point variance across parallel GPU cores can introduce non-determinism at high token counts. Use the <strong>System Fingerprint</strong> to verify environment stability during longitudinal testing.
            </p>
        </div>

<hr>
References
<ul>
	<li><a href="https://www.ebsco.com/research-starters/computer-science/expert-system-artificial-intelligence">Expert System (artificial intelligence)</a></li>

	<li><a href="https://www.geeksforgeeks.org/artificial-intelligence/expert-systems/">Expert Systems in AI</a></li>
</ul>
 <!-- 
        <div class="mt-8 p-6 bg-white rounded-xl shadow-lg border-l-4 border-indigo-500">
            <h2 class="text-xl font-semibold text-indigo-700 mb-4 font-bold">Footnotes & Technical References</h2>
            <ol class="list-decimal list-inside space-y-3 text-sm text-gray-600">
                <li>
                    <a href="https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html" target="_blank" class="text-indigo-600 hover:underline font-medium">NVIDIA Container Toolkit Documentation</a>: Official guide confirming that while the kernel-mode driver must be on the host, the library-mode CUDA components should be containerized to avoid version conflicts.
                </li>
                <li>
                    <a href="https://www.docker.com/blog/how-to-deploy-ai-applications-with-docker-compose/" target="_blank" class="text-indigo-600 hover:underline font-medium">Docker Engineering Blog</a>: Highlighting best practices for using Compose to orchestrate multi-service AI applications (Ollama + Vector DBs).
                </li>
                <li>
                    <a href="https://cheatsheetseries.owasp.org/cheatsheets/Docker_Security_Cheat_Sheet.html" target="_blank" class="text-indigo-600 hover:underline font-medium">OWASP Docker Security Guide</a>: Verification of the isolation benefits of cgroups and namespaces when running potentially untrusted agent code (e.g., Code Interpreter agents).
                </li>
            </ol>
        </div>
              <div class="bg-indigo-50 border-l-4 border-indigo-500 p-6 rounded-r-lg">
                    <h4 class="font-bold text-indigo-900 mb-2">Key Benefits for Organizations:</h4>
                    <ul class="list-disc list-inside space-y-2 text-indigo-800 text-sm">
                        <li><strong>Regulatory Compliance:</strong> Ensures that automated advice meets legal standards consistently.</li>
                        <li><strong>Debugging & QA:</strong> Allows engineers to replicate errors by using the same prompt and seed.</li>
                        <li><strong>User Trust:</strong> Increases reliability perception when the system provides stable, predictable outputs.</li>
                        <li><strong>Cost Control:</strong> Deterministic paths are easier to cache, reducing API token costs for repeated queries.</li>
                    </ul>
                </div>        
-->

    </div>
]]></content:encoded>
					
					<wfw:commentRss>https://codesport.io/artificial-intelligence/llm-determinism-in-enterprise-systems/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Inference Parameter Optimization for Qwen3-30B-A3B-2507-instruct</title>
		<link>https://codesport.io/artificial-intelligence/inference-parameter-optimization-for-qwen3-30b-a3b-2507-instruct/</link>
					<comments>https://codesport.io/artificial-intelligence/inference-parameter-optimization-for-qwen3-30b-a3b-2507-instruct/#respond</comments>
		
		<dc:creator><![CDATA[Code Sport]]></dc:creator>
		<pubDate>Thu, 05 Feb 2026 15:29:28 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<guid isPermaLink="false">https://codesport.io/?p=7757</guid>

					<description><![CDATA[LLM Parameter Management &#038; Determinism Technical Training Manual: Precedence, Overrides, and Enterprise Configuration. 1. Diagnostic Verification &#038; Inspection Before adjusting settings in the WebUI, engineers must verify the model&#8217;s native configuration. Use the following commands to inspect the Modelfile and the parameter defaults assigned to the local instance. # Inspect parameters for the specific Qwen3 [&#8230;]]]></description>
										<content:encoded><![CDATA[    <script src="https://cdn.tailwindcss.com"></script>

<script>
  tailwind.config = {
    corePlugins: {
      preflight: false, // Disables Tailwind base styles (resets)
    },
    // Optional: Only apply styles within this selector
    important: '#tailwind-blog-content', 
  }
</script>
    <style>
        @import url('https://fonts.googleapis.com/css2?family=Inter:wght@400;600;700&display=swap');

   /*     body {
            font-family: 'Inter', sans-serif;
            color: #1e293b;
            background-color: #f7f9fb;
        }
*/
        .bench-table tbody tr:nth-child(even) { background-color: #f9fafb; }
        .bench-table tbody tr:nth-child(odd) { background-color: #ffffff; }
        .bench-table tbody tr:hover { background-color: #eff6ff !important; }

        .command-box {
            background-color: #1e293b;
            color: #e2e8f0;
            padding: 1.25rem;
            border-radius: 0.5rem;
            font-family: 'Courier New', Courier, monospace;
/*            font-size: 0.85rem;*/
            line-height: 1.6;
            overflow-x: auto;
            border-left: 4px solid #6366f1;
        }

.table-caption {
    display: block;
    width: 100%;
    font-size: 1.3em;
    font-weight: 600;
    color: #64748b;
    margin-top: 1.5em;
    margin-bottom: 2em;
    text-align: center;
    clear: both;
        }



        /* Updated Heading Styles with Darker Borders */
        h2 { 
            padding-bottom: 0.5rem; 
            border-bottom: 2px solid #94a3b8; 
            margin-top: 2.5rem !important; 
        }
        h3 { color: #4338ca; margin-top: 1.5rem !important; font-weight: 700; }
    </style>


    <div id="tailwind-blog-content" class="mx-auto space-y-12">

        <!-- Header -->
        <header class="border-b-4 border-indigo-600 pb-6">
            <h1 class="text-4xl font-extrabold text-slate-900 tracking-tight">LLM Parameter Management &#038; Determinism</h1>
            <p class="text-lg text-slate-600 mt-3">Technical Training Manual: Precedence, Overrides, and Enterprise Configuration.</p>
        </header>

        <!-- Section 1 -->
        <section>
            <h2 class="text-3xl font-bold text-slate-800">1. Diagnostic Verification &#038; Inspection</h2>
            <p class="mt-4 text-slate-700 leading-relaxed">
                Before adjusting settings in the WebUI, engineers must verify the model&#8217;s native configuration. Use the following commands to inspect the Modelfile and the parameter defaults assigned to the local instance.</p>
            <div class="command-box mt-4">
                # Inspect parameters for the specific Qwen3 tag<br>
                docker exec -it ollama ollama show --parameters qwen3:30b-a3b-instruct-2507-q4_K_M<br><br>
                
                # View the full Modelfile including System Prompts and Stop Sequences<br>
                docker exec -it ollama ollama show --modelfile qwen3:30b-a3b-instruct-2507-q4_K_M<br><br>

                # Verification for Unsloth-specific GGUF exports<br>
                docker exec -it ollama ollama show --parameters hf.co/unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF:UD-Q4_K_XL
            </div>
        </section>

        <!-- Section 2 -->
        <section>
            <h2 class="text-3xl font-bold text-slate-800">2. The Three Sources of Parameter Data</h2>
            
            <div class="space-y-8 mt-6">
                <div>
                    <h3>2.1 The Ollama Modelfile (Baseline)</h3>
                    <p class="text-slate-700 leading-relaxed">
                        The Modelfile represents the foundational &#8220;hard-coded&#8221; configuration of the model. It defines the system prompt and default sampling parameters. These values serve as the fallback logic for the model; they are used only if the inference request does not specify its own parameters. This is considered the &#8220;floor&#8221; of the configuration stack.
                    </p>
                </div>

                <div>
                    <h3>2.2 Open-WebUI Settings (Master Interface)</h3>
                    <p class="text-slate-700 leading-relaxed">
                        Open-WebUI serves as the master orchestration layer. It maintains its own internal state for model parameters which frequently differs from the Ollama defaults. When a user submits a prompt, <strong>the WebUI explicitly injects its settings into the API request body</strong>. This payload override ensures that the UI settings take precedence over whatever is defined in the Modelfile.
                    </p>
                </div>

                <div>
                    <h3>2.3 Qwen3&#8217;s Recommendations (Target)</h3>
                    <p class="text-slate-700 leading-relaxed">
                        Optimization guidelines from Qwen3 (and reported by <a href="https://unsloth.ai/docs/models/qwen3-how-to-run-and-fine-tune/qwen3-2507#best-practices">Unsloth</a>) provide the mathematical targets for high-performance inference. These recommendations are based on rigorous testing of specific model architectures. Engineers should aim to mirror these targets within Open-WebUI to ensure the model operates within its intended reasoning parameters.</p>

<ul>
<li><strong>Output Length:</strong> Use an output length of 32,768 tokens, which is adequate for most queries</li>

<li><strong>presence_penalty:</strong> 0.0 to 2.0 (llama.cpp default turns it off, but to reduce repetitions, you may use 1.0)</li></ul>


                    </p>
                </div>
            </div>
        </section>

        <!-- Table 1 -->
        <section>
            <div class="overflow-hidden rounded-xl border border-slate-200 bg-white shadow-md">
                <table class="caption-bottom w-full text-left border-collapse bench-table">
     <caption class="table-caption">Table 1: Qwen3-30B-A3B-2507-Instruct Value Comparison</caption>
                     <thead class="uppercase bg-blue-600 text-white sticky top-0">
                        <tr>
                            <th class="p-4">Parameter</th>
                            <th class="p-4">CLI (ollama show)</th>
                            <th class="p-4">Open-WebUI Settings</th>
                            <th class="p-4">Qwen3 Recommended</th>
                        </tr>
                    </thead>
                     <tbody class="divide-y divide-gray-200 text-gray-700">
                        <tr><td class="p-4 font-bold">temperature</td><td class="p-4">0.8</td><td class="p-4">0.8</td><td class="p-4">0.7</td></tr>
                        <tr><td class="p-4 font-bold">top_p</td><td class="p-4">0.9</td><td class="p-4">0.9</td><td class="p-4">0.8</td></tr>
                        <tr><td class="p-4 font-bold">min_p</td><td class="p-4">0.0</td><td class="p-4">0.05</td><td class="p-4">0.00</td></tr>
                        <tr><td class="p-4 font-bold">top_k</td><td class="p-4">40</td><td class="p-4">40</td><td class="p-4">20</td></tr>
                        <tr><td class="p-4 font-bold">presence_penalty</td><td class="p-4">0.0</td><td class="p-4">0.0</td><td class="p-4">0.0 to 2.0</td></tr>
                    </tbody>

                </table>
            </div>
           
        </section>

   
        <!-- Section 3: Subtle Influences -->
        <section>
            <h2 class="text-3xl font-bold text-slate-800">3. Subtle Influences of Sampling Logic</h2>
            <div class="space-y-8 mt-6">
                <div>
                    <h3>3.1 Temperature Nuance</h3>
                    <p class="text-slate-700 leading-relaxed">Higher = more “creative” (risky) word choices. This is the primary &#8220;randomness&#8221; dial. At 0.0, the model stops &#8220;rolling the dice.&#8221; At 1.0, it always rolls!</p>

                       <p><strong>Technical Deep-dive:</strong>  Temperature modifies the logits before the final softmax layer. Decreasing it sharpens the probability distribution. At 0.0, the &#8220;soft&#8221; max becomes a &#8220;hard&#8221; max, preventing the model from sampling from the &#8220;tail&#8221; of the distribution.
                    </p>
                </div>
                <div>
                    <h3>3.2 Top_P (Nucleus) Nuance</h3>
                    <p class="text-slate-700 leading-relaxed">Limits choices to a cumulative probability &#8220;mass&#8221;. Anything less than 1 alters creativity and randomness: the LLM will exclude the lowest-probability (tail) tokens if less than 1. For enterprise determinism, set it to 1.</p>

<p><strong>Technical Deep-dive:</strong> Top_P sampling dynamically adjusts the vocabulary size based on confidence. In confident predictions, it considers very few tokens. Setting this to 1.0 ensures that no candidates are removed based solely on cumulative probability.
                    </p>
                </div>


                <div>
                    <h3>3.3 Top_K Nuance</h3>
                    <p class="text-slate-700 leading-relaxed">
A higher number increases token selection randomness by expanding choices to the top K most likely words.


    
                    </p>
                </div>
                <div>
                    <h3>3.4 Min_P Nuance</h3>
                    <p class="text-slate-700 leading-relaxed">
                        Min_P ensures a token is only considered if its probability is at least a specified percentage (e.g., 5%) of the most likely token&#8217;s probability. This preserves diverse choices when the model is genuinely uncertain while scaling more naturally than Top_P.
                    </p>
                </div>
                <div>





                    <h3>3.5 Frequency Penalty Nuance</h3>
                    <p class="text-slate-700 leading-relaxed">High values force the AI to use “new” words, encouraging it to avoid words it just used. This can cause semantic and meaning drift. For enterprise determinism, set to 0.0</p>



                        <p><strong>Technical Deep-dive:</strong> High values prevent repetition by subtracting a value from the logits of tokens that have already appeared. For enterprise determinism, it must be 0.0, as technical accuracy often requires repeating specific terminology that the penalty might otherwise suppress.
                    </p>
                </div>
                <div>
                    <h3>3.6 Presence Penalty Nuance</h3>
                    <p class="text-slate-700 leading-relaxed">
                        Presence Penalty applies a one-time penalty to tokens that have already appeared in the text at least once, regardless of their frequency. This encourages the model to introduce completely new topics or words into the conversation, which is generally avoided in deterministic retrieval.
                    </p>
                </div>
            </div>
        </section>

    </div>

]]></content:encoded>
					
					<wfw:commentRss>https://codesport.io/artificial-intelligence/inference-parameter-optimization-for-qwen3-30b-a3b-2507-instruct/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Agentic Ai Server Build Tutorial Part I: The MVP</title>
		<link>https://codesport.io/artificial-intelligence/agentic-ai-server-build-tutorial-part-i-the-mvp/</link>
					<comments>https://codesport.io/artificial-intelligence/agentic-ai-server-build-tutorial-part-i-the-mvp/#respond</comments>
		
		<dc:creator><![CDATA[Code Sport]]></dc:creator>
		<pubDate>Tue, 03 Feb 2026 17:36:35 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[advanced]]></category>
		<category><![CDATA[classes]]></category>
		<category><![CDATA[courses]]></category>
		<guid isPermaLink="false">https://codesport.io/?p=7700</guid>

					<description><![CDATA[Project Overview and Introduction This tutorial assumes you have Ubuntu 24.04 LTS server installed along with an NVIDIA GPU. It serves as Part I of an on-premises Agentic AI server buildout series. Part I is the minimal viable product. It creates a headless and GPU-accelerated AI server with a client-accessible UI, LLM Runner, LLM, and [&#8230;]]]></description>
										<content:encoded><![CDATA[<h2 class="page-header">Project Overview and Introduction</h2>

<p>This tutorial assumes you have Ubuntu 24.04 LTS server installed along with an NVIDIA GPU. It serves as Part I of an on-premises Agentic AI server buildout series.  Part I is the minimal viable product.  It creates a headless and GPU-accelerated AI server with a client-accessible UI, LLM Runner, LLM, and RAG capabilities via manual document uploads.</p>

 
        <ol class="indent">
	<li>Part I: Minimum Viable Product &#8211; Proof-of-concept (open-WebUI + Ollama)</li>
	<li>Part II: Pilot &#8211; Production buildout with 3-concurrent users (N8N + Dify + vLLM + image generation and recognition + Canvas generation  + 2-way voice communication)</li>
	<li>Part III: Enterprise &#8211; Production build-out for a medium-sized business with 100 to 500 concurrent users</li>
</ol>

    
<p>As we advance through the stages we&#8217;ll deploy enterprise tools. These include deployment and cybersecurity tools such as Ansible, VMs, Wireguard and a commercial AI-powered Next Generation Firewall (NGFW).</p>


<h2 class="page-header">Phase 1: Update Packages and Nvidia GPU Drivers</h2>
<pre>
# 1. update OS & Kernel
sudo apt update && sudo apt upgrade -y
sudo reboot

# 2. update nvidia drivers for headless, AI workloads
sudo ubuntu-drivers install --gpgpu
sudo reboot

# 3. manually install nvidia utility package
major_ver=$(modinfo -F version nvidia | cut -d'.' -f1)
sudo apt install nvidia-utils-${major_ver}-server 

sudo reboot
</pre>

<h2 class="page-header">Phase 2: Set-up Containerization</h2>

<p>In this phase we achieve software isolation, dependency management, and configuration portability.  Install Docker &#038; Nvidia Toolkit. We use Docker so that tools like Dify don&#8217;t affect system files. Nvidia&#8217;s toolkit allows docker  to &#8220;talk&#8221; to your GPU. </p>

<pre>
# Install Docker using the official convenience script
# curl -fsSL https://get.docker.com -o get-docker.sh
# sudo sh get-docker.sh
# Or install docker engine and docker compose from official ubuntu packages
sudo apt install docker.io -y # https://stackoverflow.com/a/57678382
sudo apt install docker-compose-v2 -y # 

#Download and install Nvidia's Docker container kit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
#  Fix the Architecture Variable
sudo sed -i "s/\$(ARCH)/$(dpkg --print-architecture)/g" /etc/apt/sources.list.d/nvidia-container-toolkit.list


# Enable GPU access for Docker
sudo apt update && sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker # sudo systemctl enable --now docker

# Add user to docker group (avoid sudo for docker commands)
sudo usermod -aG docker $USER
newgrp docker # "shortcut" that refreshes group memberships for current session so you don't have to log out
</pre>


<h2 class="page-header">Phase 3: Dockerizing the AI Stack</h2>

<p>Create a folder called <code>ai-server</code> and create a file named <code>docker-compose.yml</code>.</p>
<p>This is boss-mode. Use this for maximum flexibility:</p>

<pre>
# Directory Structure
mkdir -p ~/ai-stack/mvp/data/ollama ~/ai-stack/mvp/data/webui
cd ~/ai-stack/mvp && nano docker-compose.yml
</pre>

<p>Next create and save the following docker file:</p>
<pre>
# File path: ~/ai-stack/mvp/docker-compose.yml
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
#  Useful for external access (e.g., for API calls from outside the Docker network).
#  When used internally (e.g., by Open WebUI), not exposing the port is more secure.
#    ports:
#      - "11434:11434"
    pull_policy: always # disable for production
    restart: unless-stopped
    tty: true # for add'l debug info
    volumes:
      - ./ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1  # may also set to 'all' too
              capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      #- RAG_EMBEDDING_ENGINE=ollama
      #- RAG_EMBEDDING_MODEL=nomic-embed-text:latest # Consider changing this for Part 2 or 3
      #- ENABLE_RAG_HYBRID_SEARCH=True
# If you need super strong security.  
#     - WEBUI_SECRET_KEY=change_me_to_random_string
# bypass login page for a single-user setup, set the WEBUI_AUTH environment variable to False
      - WEBUI_AUTH=False
    volumes:
      - ./webui_data:/app/backend/data
    depends_on:
      - ollama
    restart: unless-stopped
</pre>


<p>Run: <code>docker compose up -d</code> then download Qwen3 with: <pre>docker exec -it ollama ollama pull qwen3:30b-a3b-instruct-2507-q4_K_M</pre> and the embedding model with <pre>docker exec ollama ollama pull nomic-embed-text</pre></p> 

<small><strong>NB</strong>: Docker compose commands like <code>docker compose up</code> or <code>docker compose down</code> must be run from the directory where your docker-compose.yml file is located.</small>

<h4 class="page-header">What This Command Does</h4>
 
        <ol class="indent">
	<li><code>docker exec -it ollama</code>: Runs a command inside the ollama container.<code>-it</code> spawns an interactive session that shows download progress.</li>

	<li><code>ollama pull qwen3:30b-a3b-instruct-2507-q4_K_M</code>: Downloads the Qwen3 30B model with q4_K_M quantization</li>
</ol>

    

<h3 class="page-header">Quantized Alternatives of the Same LLM: Lower VRAM Usage with More Accuracy</h3>

<p>Alternatively you may download Unsloth&#8217;s version: <code>sudo docker exec -it ollama ollama pull hf.co/unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF:UD-Q4_K_XL</code>.</p>

<p>The disadvantage is that Unsloth does not update its quants for new updates of the same model.   For example, Qwen3-30B-A3B-Instruct was updated in September 2025. But, Unsloth still uses the July 2025 version for its quant.</p>

<p>Unsloth quantization algo (UD-Q4_K_XL) results in smaller (17GB vs 18GB) but more accurate model than the standard (e.g., Q4_K_M). See Daniel Han&#8217;s reply <a href="https://www.reddit.com/r/LocalLLaMA/comments/1f92brm/comment/llkr7ia">here</a>.</p>

<small><strong>NB:</strong> The open source LLM community often uses the word &#8220;quant&#8221; as shorthand for <em>quantization</em> (the process) and <em>quantization algorithm</em> (the method or algo type: Q4_K_M, Q5_1, UD-Q4_K_XL). For example: <em>I’m using the Q4_K_M quant for this model.</em> or <em>Unsloth’s UD quants are optimized for accuracy.</em></small>

<p>GLM-4.7-flash will be used in part 2. However, you may download Unsloth&#8217;s version with: <code>docker exec -it ollama ollama pull hf.co/unsloth/GLM-4.7-Flash-GGUF:UD-Q4_K_XL</code>.</p>


<h3 class="page-header">Model Housekeeping</h3>

<h4 class="page-header">Docker Commands</h4>
 
        <ol class="indent">

	<li><strong>View downloaded models</strong>: <pre>docker exec -it ollama ollama list</pre></li>

	<li><strong>List downloaded models:</strong> <pre>docker exec -it ollama ollama list</pre></li>

	<li><strong>Delete a model:</strong> <pre>docker exec -it ollama ollama rm &lt;model-name&gt;</pre></li>

	<li><strong>Check logs:</strong> docker compose logs</li>

</ol>

    
<!--

docker exec -it ollama ollama ps
docker logs open-webui
docker stats
docker exec -it open-webui sh
docker logs open-webui | grep -i "upload\|rag\|embedding"

watch -n 1 nvidia-smi
WebUI Python packages: docker exec -it open-webui pip list
Files uploaded via WebUI: ls -al ai-stack/mvp/web*/up*
Loaded models: ls -al ai*/mvp/ollama_data/models/manifests/r*/l*/q*

find / -name "*MiniLM*" 2>/dev/null
ls -al /app/backend/data/cache/embedding/models/
-->



<h4 class="page-header">Security Notes</h4>
 
        <ol class="indent">
	<li><strong>Docker security:</strong> Running containers as root (default) is a security risk. Consider adding a non-root user in the Dockerfile or using user: &#8220;1000:1000&#8221; in docker-compose.yml</li>

	<li><strong>Open WebUI auth:</strong> For production, strongly recommend enabling WEBUI_AUTH and setting a WEBUI_SECRET_KEY</li>

</ol>

    


<h2 class="page-header">Phase 4: MVP of RAG On Uploaded Documents</h2>

<p>RAG functionality is built into webUI.  No additional action needed for now.</p>

<!--

(the process of mapping a large set of values to a smaller set, often used in the context of reducing the precision of model weights)
<p>Ollama uses uniform quantization (e.g., all layers at Q4_0, Q5_1, etc.). On the other hand, Unsloth uses a custom quantization algo call <em>Dynamic 2.0</em> (UD-Q4_K_XL). It uses selective, layer-by-layer quantization, optimizing each layer differently producing higher accuracy per layer for any given layer size.</p>

Unsloth's "Dynamic 2.0 GGUF" approach is different from standard quantization methods like those used by Ollama. 
**Key Differences:**
- **Selective, Layer-by-Layer Quantization:** Unlike Ollama, which typically applies a uniform quantization (e.g., Q4_0) across all layers of a model, Unsloth’s Dynamic 2.0 GGUFs use a "selective layer quantization" approach. This means each layer of the model is analyzed and quantized differently—some layers may use more aggressive quantization (e.g., 2-bit or even 1-bit), while others use less aggressive (e.g., 4-bit or 5-bit), depending on their importance and sensitivity to quantization errors. This is done to maximize efficiency and accuracy simultaneously.
- **Dynamic Calibration:** Unsloth uses a custom calibration dataset and framework to determine the optimal quantization for each layer. This process is model-specific, meaning the quantization profile for Llama 4 will differ from Gemma 3, for example.
- **Performance and Compatibility:** The resulting GGUF files are compatible with standard inference engines like llama.cpp and Ollama, but they are optimized for better accuracy and smaller size compared to uniform quantization. Unsloth’s method aims to minimize the performance gap between full-precision and quantized models, often outperforming other quantization methods on benchmarks like MMLU and KL Divergence.

**In Summary:**
- Ollama’s GGUFs are typically uniformly quantized (e.g., all layers at Q4_0).
- Unsloth’s Dynamic 2.0 GGUFs are selectively quantized per layer, using a mix of quantization levels (e.g., 1-bit, 2-bit, 4-bit, etc.) to balance size, speed, and accuracy.
- Both produce GGUF files, but Unsloth’s approach is more nuanced and often more performant for the same or smaller file size.


2. The "Two Locations" Mystery
the documents existing in two places. This is actually how Docker Volumes work:

Physical Server Folder: This is your "source of truth" on your 3090 rig (where your docker-compose.yml lives).

Container App Folder: This is the "window" through which the Docker container sees those same files.

The Reality: They aren't actually two different sets of files; they are the same data being shared through a "portal". If you delete a file in one, it disappears from the other.


-->




<!--
<h3 class="page-header">Phase 4: Orchestration (Master Control)</h3>

<p>Dify is the UI and "Agent" manager. </p>
<pre>
mkdir -p ~/ai-stack/production && cd ~/ai-stack/production
git clone https://github.com/langgenius/dify.git
cd dify/docker
cp .env.example .env
docker compose up -d
</pre>
-->
<hr>
<h3>Resources and References</h3>
https://huggingface.co/collections/Qwen/qwen3
https://huggingface.co/Qwen/Qwen3-Omni-30B-A3B-Instruct
https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507
https://ollama.com/library/qwen3/tags
https://ollama.com/library/qwen3:30b-a3b-instruct-2507-q4_K_M

https://documentation.ubuntu.com/server/how-to/graphics/install-nvidia-drivers/#installing-the-drivers-on-servers-and-or-for-computing-purposes
https://docs.ollama.com/linux#install

https://docs.openwebui.com/getting-started/quick-start/
https://docs.openwebui.com/getting-started/quick-start/starting-with-ollama/
https://community.home-assistant.io/t/my-journey-to-a-reliable-and-enjoyable-locally-hosted-voice-assistant/944860


<!--
<h3 class="page-header">Phase 3: The LLM Engine (Neural Network)</h3>


<p>This is suboptimal and we suggest you install using docker for modularity</p>
<pre>
curl -fsSL https://ollama.com/install.sh | sh
ollama run qwen3:30b-a3b-instruct-2507-q4_K_M
</pre>

 <code>docker compose -f ~/ai-stack/docker-compose.yml up -d</code></p>

<pre>
# 1. Create the project directory
mkdir ~/ai-stack

# 2. Move into that directory
cd ~/ai-stack

# 3. Create the file here
nano docker-compose.yml</pre>
-->]]></content:encoded>
					
					<wfw:commentRss>https://codesport.io/artificial-intelligence/agentic-ai-server-build-tutorial-part-i-the-mvp/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>LLM Quantization</title>
		<link>https://codesport.io/artificial-intelligence/llm-quantization/</link>
					<comments>https://codesport.io/artificial-intelligence/llm-quantization/#respond</comments>
		
		<dc:creator><![CDATA[Code Sport]]></dc:creator>
		<pubDate>Mon, 02 Feb 2026 04:00:18 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<guid isPermaLink="false">https://codesport.io/?p=7732</guid>

					<description><![CDATA[The purpose of quantization is reducing VRAM requirements while maintaining LLM accuracy and inference speed. For example, the compression algorithm NF4 (Q4-NF4) is best suited for models with 14B or more parameters. In LLM quantization, the &#8220;bits&#8221; (e.g., 4-bit, 8-bit) represent the precision used to store model weights, determining memory usage and computational [&#8230;]]]></description>
										<content:encoded><![CDATA[<p>The purpose of quantization is reducing VRAM requirements while maintaining LLM accuracy and inference speed. For example, the compression algorithm <em>NF4 (Q4-NF4)</em> is best suited for models with 14B or more parameters. </p>

<p>In LLM quantization, the &#8220;bits&#8221; (e.g., 4-bit, 8-bit) represent the precision used to store model weights, determining memory usage and computational efficiency. 4-bit quantization reduces memory by ~75% compared to 16-bit (roughly 0.5 bytes per parameter vs 2 bytes), enabling smaller hardware to run large models with minimal accuracy loss (&lt;1–2%).</p>

<p>Common options for GGUF are in the table below.</p>

<style>
  .modern-table-container {
    font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
    margin: 20px 0;
    overflow-x: auto;
    border-radius: 8px;
    box-shadow: 0 4px 15px rgba(0, 0, 0, 0.1);
  }
  .modern-table {
    width: 100%;
    border-collapse: collapse;
    background-color: #ffffff;
    text-align: left;
  }
  .modern-table th {
    background-color: #2c3e50;
    color: #ffffff;
    padding: 12px 15px;
    font-weight: 600;
    text-transform: uppercase;
    font-size: 1em;
    letter-spacing: 0.05em;
  }
  .modern-table td {
    padding: 12px 15px;
    border-bottom: 1px solid #edf2f7;
    color: #4a5568;
    font-size: 1em;
  }
  .modern-table tbody tr:nth-of-type(even) {
    background-color: #f8fafc;
  }
  .modern-table tbody tr:hover {
    background-color: #edf2f7;
    transition: background-color 0.2s ease;
  }
  .badge {
    display: inline-block;
    padding: 4px 8px;
    border-radius: 4px;
    font-weight: bold;
    font-size: .9em;
    background-color: #e2e8f0;
    color: #2d3748;
  }
</style>

<div class="modern-table-container">
  <table class="modern-table">
    <thead>
      <tr>
        <th>Algorithm</th>
        <th>Bits</th>
        <th>Pros</th>
        <th>Cons</th>
        <th>Notes</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td><strong>Q4_0</strong></td>
        <td><span class="badge">4-bit</span></td>
        <td>Very small VRAM footprint, fast</td>
        <td>Slightly lower accuracy on reasoning / math</td>
        <td>Default for most local GPU setups</td>
      </tr>
      <tr>
        <td><strong>Q4_1</strong></td>
        <td><span class="badge">4-bit</span></td>
        <td>Slightly better accuracy than Q4_0</td>
        <td>Slightly more VRAM</td>
        <td>Good for agentic reasoning</td>
      </tr>
      <tr>
        <td><strong>NF4 (Q4-NF4)</strong></td>
        <td><span class="badge">4-bit</span></td>
        <td>Optimized for FP16-like performance</td>
        <td>Requires GGUF model converted for NF4</td>
        <td>Recommended for larger models (14B)</td>
      </tr>
      <tr>
        <td><strong>Q5_0 / Q5_1</strong></td>
        <td><span class="badge">5-bit</span></td>
        <td>Better accuracy; still small</td>
        <td>Higher VRAM</td>
        <td>Only if you want slightly higher quality reasoning</td>
      </tr>
      <tr>
        <td><strong>FP16 / FP32</strong></td>
        <td><span class="badge">16/32-bit</span></td>
        <td>Max quality</td>
        <td>VRAM-heavy</td>
        <td>Not practical for 14B+ on RTX 3090</td>
      </tr>
    </tbody>
  </table>
</div>]]></content:encoded>
					
					<wfw:commentRss>https://codesport.io/artificial-intelligence/llm-quantization/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Apple Silicon Security Architecture &#038; Physical Layout</title>
		<link>https://codesport.io/security/apple-silicon-security-architecture-physical-layout/</link>
					<comments>https://codesport.io/security/apple-silicon-security-architecture-physical-layout/#respond</comments>
		
		<dc:creator><![CDATA[Code Sport]]></dc:creator>
		<pubDate>Sat, 10 Jan 2026 23:39:48 +0000</pubDate>
				<category><![CDATA[Security]]></category>
		<category><![CDATA[security]]></category>
		<guid isPermaLink="false">https://codesport.io/?p=7674</guid>

					<description><![CDATA[🏗️ The SoC vs. SSD Relationship The diagram below clarifies that the Operating System (Kernel) is stored on the physical NAND (SSD), but is managed by a controller inside the SoC. +-----------------------------------------------------------+ +----------------+ &#124; APPLE SILICON (SoC) &#124; &#124; INTERNAL SSD &#124; &#124; &#124; &#124; (NAND) &#124; &#124; +-----------------------+ +---------------------+ &#124; &#124; &#124; &#124; &#124; [&#8230;]]]></description>
										<content:encoded><![CDATA[<h2 class="page-header"><img src="https://s.w.org/images/core/emoji/15.1.0/72x72/1f3d7.png" alt="🏗" class="wp-smiley" style="height: 1em; max-height: 1em;" /> The SoC vs. SSD Relationship</h2>
<p>The diagram below clarifies that the Operating System (Kernel) is stored on the physical NAND (SSD), but is managed by a controller inside the SoC.</p>
<pre>
+-----------------------------------------------------------+          +----------------+
|                   APPLE SILICON (SoC)                     |          |  INTERNAL SSD  |
|                                                           |          |     (NAND)     |
|  +-----------------------+       +---------------------+  |          |                |
|  | Application Processor |       |   Secure Enclave    |  |          | +------------+ |
|  |    (The Main CPU)     |       |      (SEP)          |  |          | | iBoot (S2) | |
|  |                       |       |                     |  |  Secure  | +------------+ |
|  |  +-----------------+  |Mailbox|  +---------------+  |  |  Channel | | XNU Kernel | |
|  |  |  XNU (In RAM)   | &lt;=======&gt;|  |    sepOS      |  | &lt;---------&gt; | +------------+ |
|  |  +-----------------+  |       |  +---------------+  |  |          | | User Data  | |
|  |  |  App Sandbox    |  |       |  | AES Crypto Eng|  |  |          | +------------+ |
|  |  +-----------------+  |       |  +---------------+  |  |          +----------------+
|  +-----------------------+       +---------------------+  |
|              ^                              ^             |           Physical NAND
|              |                              |             |         (Encrypted at Rest)
|      +---------------+              +---------------+     |
|      |   Boot ROM    |              |   UID Key     |     |
|      | (Root/Trust)  |              | (Silicon Fuse)|     |
|      +---------------+              +---------------+     |
+-----------------------------------------------------------+
</pre>
<center><p class="center"><strong>Figure 1:</strong> Apple Silicon Security Architecture &amp; Physical Layout</p></center>

<h2 class="page-header"><img src="https://s.w.org/images/core/emoji/15.1.0/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> The &#8220;UID&#8221; Confusion Resolved</h2>
<p>The UID (Unique ID) is a physical AES-256 key fused into the SoC during manufacturing. It is the &#8220;Master Secret&#8221; of the chip.</p>

<p>Why is it &#8220;randomly&#8221; there? It provides Hardware-Bound Encryption. Because the SSD data is encrypted using a key derived from this specific UID, you cannot desolder the SSD chips and read them on another Mac. The data is cryptographically married to that specific processor.</p>

<h3 class="page-header">The Workflow</h3>

 
        <ol class="indent">

 	<li>The SEP takes the UID.</li>
 	<li>It combines it with your Passcode.</li>
 	<li>It generates a KEK (Key Encryption Key).</li>
 	<li>This KEK is sent to the AES Engine to decrypt the SSD.</li>
</ol>

    
<h2 class="page-header"><img src="https://s.w.org/images/core/emoji/15.1.0/72x72/1f6e1.png" alt="🛡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Physical Boot Sequence (The &#8220;Hand-off&#8221;)</h2>
 
        <ol class="indent">
 	<li><strong>Stage 1 (Hardware):</strong> The Boot ROM (inside the chip) starts. It knows nothing about the SSD yet except how to talk to the NVMe controller.</li>
 	<li><strong>Stage 2 (Loading)</strong>: Boot ROM reads iBoot from the SSD. It checks the signature. If valid, it loads iBoot into the SoC&#8217;s internal RAM.</li>
 	<li><strong>Stage 3 (Kernel): </strong>iBoot now looks for the XNU Kernel on the SSD.</li>
 	<li><strong>Stage 4 (Execution):</strong> iBoot verifies the Kernel&#8217;s signature, loads it into the system&#8217;s main RAM, and the OS begins to run.</li>
</ol>

    
<h2 class="page-header"><img src="https://s.w.org/images/core/emoji/15.1.0/72x72/1f4c2.png" alt="📂" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Summary of Storage Locations</h2>


<div style="overflow-x:auto; font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif;">
    <table style="width:100%; border-collapse: collapse; background-color: #1e1e1e; color: #ffffff; border-radius: 8px; overflow: hidden; box-shadow: 0 4px 15px rgba(0,0,0,0.3);">
        <thead>
            <tr style="background-color: #2d2d2d; text-align: left;">
                <th style="padding: 15px; border-bottom: 2px solid #3d3d3d; font-weight: 600; color: #ffcc00;">Component</th>
                <th style="padding: 15px; border-bottom: 2px solid #3d3d3d; font-weight: 600; color: #ffcc00;">Physical Location</th>
                <th style="padding: 15px; border-bottom: 2px solid #3d3d3d; font-weight: 600; color: #ffcc00;">Identity / Role</th>
            </tr>
        </thead>
        <tbody>
            <tr style="border-bottom: 1px solid #333;">
                <td style="padding: 15px; font-weight: 500;">Boot ROM</td>
                <td style="padding: 15px; color: #cccccc;">Inside SoC (Silicon)</td>
                <td style="padding: 15px; color: #cccccc;">Immutable Root of Trust.</td>
            </tr>
            <tr style="border-bottom: 1px solid #333; background-color: #252525;">
                <td style="padding: 15px; font-weight: 500;">UID Key</td>
                <td style="padding: 15px; color: #cccccc;">Inside SoC (Fuses)</td>
                <td style="padding: 15px; color: #cccccc;">The silicon&#8217;s unique cryptographic thumbprint.</td>
            </tr>
            <tr style="border-bottom: 1px solid #333;">
                <td style="padding: 15px; font-weight: 500;">Application Processor</td>
                <td style="padding: 15px; color: #cccccc;">Inside SoC (Logic)</td>
                <td style="padding: 15px; color: #cccccc;"><strong>The Main CPU</strong> (Runs macOS and apps).</td>
            </tr>
            <tr style="border-bottom: 1px solid #333; background-color: #252525;">
                <td style="padding: 15px; font-weight: 500;">iBoot / Kernel</td>
                <td style="padding: 15px; color: #cccccc;">On the SSD</td>
                <td style="padding: 15px; color: #cccccc;">The software that manages the hardware.</td>
            </tr>
            <tr>
                <td style="padding: 15px; font-weight: 500;">Biometrics</td>
                <td style="padding: 15px; color: #cccccc;">SEP Protected RAM</td>
                <td style="padding: 15px; color: #cccccc;">Isolated fingerprint/face data storage.</td>
            </tr>
        </tbody>
    </table>
</div>]]></content:encoded>
					
					<wfw:commentRss>https://codesport.io/security/apple-silicon-security-architecture-physical-layout/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Protected: The AI Agent Stack</title>
		<link>https://codesport.io/artificial-intelligence/the-ai-agent-stack/</link>
					<comments>https://codesport.io/artificial-intelligence/the-ai-agent-stack/#respond</comments>
		
		<dc:creator><![CDATA[Code Sport]]></dc:creator>
		<pubDate>Tue, 28 Oct 2025 02:26:56 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<guid isPermaLink="false">https://codesport.io/?p=7496</guid>

					<description><![CDATA[There is no excerpt because this is a protected post.]]></description>
										<content:encoded><![CDATA[<form action="https://codesport.io/wp-login.php?action=postpass" class="post-password-form" method="post"><input type="hidden" name="redirect_to" value="https://codesport.io/artificial-intelligence/the-ai-agent-stack/" />
	<p>This content is password protected. To view it please enter your password below:</p>
	<p><label for="pwbox-7496">Password: <input name="post_password" id="pwbox-7496" type="password" spellcheck="false" required size="20" /></label> <input type="submit" name="Submit" value="Enter" /></p></form>
	]]></content:encoded>
					
					<wfw:commentRss>https://codesport.io/artificial-intelligence/the-ai-agent-stack/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Quantitative Finance and Numerical Methods</title>
		<link>https://codesport.io/quantitative-finance/quantitative-finance-and-numerical-methods/</link>
					<comments>https://codesport.io/quantitative-finance/quantitative-finance-and-numerical-methods/#respond</comments>
		
		<dc:creator><![CDATA[Code Sport]]></dc:creator>
		<pubDate>Tue, 28 Oct 2025 01:08:06 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Quantitative Finance]]></category>
		<category><![CDATA[python]]></category>
		<guid isPermaLink="false">https://codesport.io/?p=7651</guid>

					<description><![CDATA[Time series analysis, regression, and PCA can be considered types of machine learning, while hypothesis testing and cohort analysis are statistical methods that support and inform machine learning processes. The distinction lies in whether the technique is primarily used to train a predictive model (ML) or to analyze and interpret data (statistical) Types of machine [&#8230;]]]></description>
										<content:encoded><![CDATA[<p>Time series analysis, regression, and PCA can be considered types of machine learning, while hypothesis testing and cohort analysis are statistical methods that support and inform machine learning processes. The distinction lies in whether the technique is primarily used to train a predictive model (ML) or to analyze and interpret data (statistical)</p>

<h2 class="page-header">Types of machine learning</h2>
 
        <ul class="indent">
<li><b>Time Series Analysis:</b> When used for forecasting future values based on historical, time-ordered data, it is a type of supervised machine learning. Some of the most effective methods for time series forecasting are machine learning models, including neural networks and gradient boosting frameworks.</li>
<li><b>Principal Component Analysis (PCA):</b> This is a key technique in unsupervised machine learning. PCA is a dimensionality reduction method that finds new, uncorrelated variables (principal components) that capture the most variance in a dataset. It is often used to preprocess data for other machine learning algorithms by reducing model complexity and preventing overfitting.</li>
<li><b>Regression:</b> This is a fundamental supervised machine learning technique. It predicts a continuous numerical output based on input features and works by estimating the relationship between variables. A regression algorithm is used to train a machine learning model, which learns to make predictions based on labeled data.</li>
</ul>

    

<h2 class="page-header">Statistical methods that support machine learning</h2>
 
        <ul class="indent">
<li><b>Hypothesis Testing:</b> This is a statistical method used to evaluate assumptions about a population based on sample data. In machine learning, hypothesis testing can be used to compare different models, evaluate the significance of features, and validate assumptions about data distribution, but it is not a machine learning technique itself.</li>
<li><b>Cohort Analysis:</b> This is a statistical and analytical technique for studying the behavior of different groups (cohorts) over time. While it provides insights that can inform machine learning models (e.g., identifying valuable customer segments), it is a form of descriptive or diagnostic analysis, not a machine learning technique.</li>
</ul>

    

<h2 class="page-header">Quantitative Finance and Numerical Methods</h2>
 
        <ul class="indent">
<li><b>Used diffusive stochastic processes (including geometric Brownian motion with local volatility) and numerical methods (including Monte Carlo and finite difference methods) to estimate the fair value of equity derivatives</b></li>
<ul>
<li><b>Diffusive stochastic processes:</b> These are mathematical models used in quantitative finance to simulate the random movement of asset prices over time.</li>
<li><b>Geometric Brownian Motion (GBM):</b> A specific type of diffusive stochastic process commonly used in the Black-Scholes model for option pricing.</li>
<li><b>Local Volatility:</b> A more advanced model that allows the volatility of an asset to change over time and with respect to the asset&#8217;s price level.</li>
<li><b>Monte Carlo Simulation:</b> A numerical technique that uses random sampling to model the behavior of financial assets and find approximate solutions to complex problems, like valuing options.</li>
<li><b>Finite Difference Methods:</b> Numerical methods for solving differential equations, which are fundamental to pricing derivatives.</li>
</ul>
<li><b>Analyzed risk on a portfolio of options using stochastic volatility models (Local Volatility, Local-Stochastic Volatility)</b></li>
<ul>
<li><b>Stochastic Volatility Models:</b> These are used to model the fact that the volatility of an asset, like an option, is not constant but changes over time in a random way.</li>
</ul>
</ul>

    

<h2 class="page-header">Machine Learning</h2>
 
        <ul class="indent">
<li><b>Machine learning methods (regularization, clustering algorithms) applied to financial data sets</b></li>
<ul>
<li><b>Regularization:</b> Techniques like L1 (Lasso) and L2 (Ridge) that are used to prevent overfitting in a machine learning model by penalizing large coefficients.</li>
<li><b>Clustering Algorithms:</b> Unsupervised machine learning methods, such as K-means or DBSCAN, used to group similar data points together based on their features.</li>
</ul>
<li><b>Predictive modeling:</b> The process of using statistical and machine learning techniques to forecast future outcomes.</li>
<li><b>Classification:</b> A supervised machine learning task for predicting a discrete class label (e.g., spam or not spam) for a given data point.</li>
<li><b>Feature Engineering:</b> The process of creating new input features from existing ones to improve the performance of machine learning models.</li>
</ul>

    

<h2 class="page-header">Statistical Methods</h2>
 
        <ul class="indent">
<li><b>Applied statistical methods (including Principal Component Analysis (PCA), linear and logistic regression, time series analysis) to financial data sets</b></li>
<ul>
<li><b>Principal Component Analysis (PCA):</b> A statistical and unsupervised machine learning technique used for dimensionality reduction.</li>
<li><b>Linear Regression:</b> A supervised learning method used to model the relationship between a dependent variable and one or more independent variables.</li>
<li><b>Logistic Regression:</b> A supervised learning method used for classification problems where the goal is to predict a categorical outcome.</li>
<li><b>Time Series Analysis:</b> The analysis of time-stamped data to identify patterns, trends, and seasonality, often for forecasting.</li>
</ul>
<li><b>Used std deviation (σ), variance (σ²), co-variance, correlation (r-value), linear regression, Q-Q Plots, hypothesis tests, p-value, cohort analysis to inform investment decisions from data</b></li>
<ul>
<li><b>Descriptive Statistics:</b>
<ul>
<li><b>Standard Deviation, Variance, Co-variance, and Correlation:</b> Measures of data dispersion and the relationship between variables.</li>
</ul>
</li>
<li><b>Inferential Statistics:</b>
<ul>
<li><b>Hypothesis Tests and p-value:</b> Used to make inferences about a population from sample data and determine the statistical significance of results.</li>
<li><b>Q-Q Plots (Quantile-Quantile Plots):</b> Used to check if a dataset follows a particular theoretical distribution, such as a normal distribution.</li>
</ul>
</li>
<li><b>Analytical Technique:</b>
<ul>
<li><b>Cohort Analysis:</b> A method for studying the behavior of a group of people (cohort) over a period of time.</li>
</ul>
</li>
</ul>
</ul>

    

<h2 class="page-header">Programming and Data Analysis Tools</h2>
 
        <ul class="indent">
<li><b>Python (Pandas, NumPy, SciPy, Matplotlib, Seaborn, PyTorch), SQL, R, Excel VBA, Bash</b></li>
<ul>
<li><b>Python Libraries (Pandas, NumPy, SciPy, etc.):</b> Packages used for data manipulation, scientific computing, visualization, and machine learning.</li>
<li><b>SQL:</b> A language used for managing data in relational databases.</li>
<li><b>R:</b> A programming language and environment specifically designed for statistical computing and graphics.</li>
<li><b>Excel VBA:</b> A programming language for creating macros in Microsoft Excel.</li>
<li><b>Bash:</b> A command-line shell used for interacting with an operating system.</li>
</ul>
</ul>

    
]]></content:encoded>
					
					<wfw:commentRss>https://codesport.io/quantitative-finance/quantitative-finance-and-numerical-methods/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Python Frameworks &#038; Libraries for RAG and Agentic AI</title>
		<link>https://codesport.io/artificial-intelligence/python-frameworks-libraries-for-rag-and-agentic-ai/</link>
					<comments>https://codesport.io/artificial-intelligence/python-frameworks-libraries-for-rag-and-agentic-ai/#respond</comments>
		
		<dc:creator><![CDATA[Code Sport]]></dc:creator>
		<pubDate>Mon, 27 Oct 2025 23:38:08 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[python]]></category>
		<guid isPermaLink="false">https://codesport.io/?p=7648</guid>

					<description><![CDATA[Here&#8217;s what&#8217;s actually being used in production right now: 🤖 Agentic AI Frameworks (Orchestration &#038; Multi-Agent Systems) Tier 1 &#8211; Production-Ready (Most Common) LangChain / LangGraph (Industry Standard) CrewAI (Industry Standard) AutoGen (Microsoft) (Production-Ready) Haystack (Production-Ready) Tier 2 &#8211; Emerging/Specialized smolagents (Hugging Face) (Emerging) Google ADK (Agent Development Kit) (Emerging) Semantic Kernel (Microsoft) (Emerging) LlamaIndex [&#8230;]]]></description>
										<content:encoded><![CDATA[<article>
 
  <p>Here&#8217;s what&#8217;s actually being used in production right now:</p>

  <h2 class="page-header"><img src="https://s.w.org/images/core/emoji/15.1.0/72x72/1f916.png" alt="🤖" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Agentic AI Frameworks (Orchestration &#038; Multi-Agent Systems)</h2>

  <h3 class="page-header">Tier 1 &#8211; Production-Ready (Most Common)</h3>

  <h4 class="page-header">LangChain / LangGraph (Industry Standard)</h4>
   
        <ul class="indent">
    <li>Most popular framework for building AI agents</li>
    <li>LangChain = basic chains and agents</li>
    <li>LangGraph = advanced multi-agent workflows with state management</li>
  </ul>

    

  <h4 class="page-header">CrewAI (Industry Standard)</h4>
   
        <ul class="indent">
    <li>Specialized for multi-agent collaboration</li>
    <li>Agents work together like a &#8220;crew&#8221; with roles</li>
  </ul>

    

  <h4 class="page-header">AutoGen (Microsoft) (Production-Ready)</h4>
   
        <ul class="indent">
    <li>Multi-agent conversations</li>
    <li>Good for code generation tasks</li>
    <li>Rising popularity</li>
  </ul>

    

  <h4 class="page-header">Haystack (Production-Ready)</h4>
   
        <ul class="indent">
    <li>Originally for search, now strong in RAG + agents</li>
    <li>Good for production deployments</li>
  </ul>

    

  <h3 class="page-header">Tier 2 &#8211; Emerging/Specialized</h3>

  <h4 class="page-header">smolagents (Hugging Face) (Emerging)</h4>
   
        <ul class="indent">
    <li>Lightweight agent framework</li>
  </ul>

    

  <h4 class="page-header">Google ADK (Agent Development Kit) (Emerging)</h4>
   
        <ul class="indent">
    <li>Google&#8217;s agent framework</li>
  </ul>

    

  <h4 class="page-header">Semantic Kernel (Microsoft) (Emerging)</h4>
   
        <ul class="indent">
    <li>Enterprise-focused agent framework</li>
    <li>C# and Python support</li>
  </ul>

    

  <h4 class="page-header">LlamaIndex Workflows</h4>
   
        <ul class="indent">
    <li>Evolved from RAG-only to include agentic capabilities</li>
  </ul>

    

  <h2 class="page-header"><img src="https://s.w.org/images/core/emoji/15.1.0/72x72/1f4da.png" alt="📚" class="wp-smiley" style="height: 1em; max-height: 1em;" /> RAG Frameworks &#038; Libraries</h2>

  <h3 class="page-header">Core RAG Orchestration</h3>

  <h4 class="page-header">LangChain (Industry Standard)</h4>
   
        <ul class="indent">
    <li>Most popular for RAG pipelines</li>
    <li>Document loaders, splitters, retrievers</li>
  </ul>

    

  <h4 class="page-header">LlamaIndex (Industry Standard)</h4>
   
        <ul class="indent">
    <li>Purpose-built for RAG</li>
    <li>Better for complex document structures</li>
  </ul>

    

  <h4 class="page-header">Haystack (Production-Ready)</h4>
   
        <ul class="indent">
    <li>Enterprise RAG pipelines</li>
    <li>Good for production deployments</li>
  </ul>

    

  <h3 class="page-header">Vector Databases (Storage &#038; Retrieval)</h3>

  <h4 class="page-header">ChromaDB (Industry Standard)</h4>
   
        <ul class="indent">
    <li>Most popular embedded vector DB</li>
  </ul>

    

  <h4 class="page-header">pgvector (PostgreSQL extension) (Industry Standard)</h4>
   
        <ul class="indent">
    <li>Vector search in PostgreSQL</li>
  </ul>

    

  <h4 class="page-header">Pinecone (Production-Ready)</h4>
   
        <ul class="indent">
    <li>Managed vector DB (cloud)</li>
    <li>Popular in production</li>
  </ul>

    

  <h4 class="page-header">Weaviate (Production-Ready)</h4>
   
        <ul class="indent">
    <li>Open-source vector DB</li>
    <li>Good for hybrid search</li>
  </ul>

    

  <h4 class="page-header">Milvus (Emerging)</h4>
   
        <ul class="indent">
    <li>Scalable vector DB</li>
    <li>Good for large deployments</li>
  </ul>

    

  <h4 class="page-header">FAISS (Facebook AI Similarity Search) (Production-Ready)</h4>
   
        <ul class="indent">
    <li>Vector similarity search library</li>
    <li>Not a full DB, but very fast</li>
  </ul>

    

  <h3 class="page-header">Document Processing &#038; Chunking</h3>

  <h4 class="page-header">LangChain Document Loaders (Industry Standard)</h4>
   
        <ul class="indent">
    <li>Handles PDFs, Word, HTML, etc.</li>
  </ul>

    

  <h4 class="page-header">Unstructured (Production-Ready)</h4>
   
        <ul class="indent">
    <li>Advanced document parsing</li>
    <li>Handles complex layouts</li>
  </ul>

    

  <h4 class="page-header">PyPDF2 / pypdf (Production-Ready)</h4>
   
        <ul class="indent">
    <li>PDF text extraction</li>
  </ul>

    

  <h4 class="page-header">python-docx (Emerging)</h4>
   
        <ul class="indent">
    <li>Word document processing</li>
  </ul>

    

  <h4 class="page-header">BeautifulSoup4 (Production-Ready)</h4>
   
        <ul class="indent">
    <li>HTML parsing (for web scraping)</li>
  </ul>

    

  <h3 class="page-header">Embedding Models</h3>


<p>Embedding models are a specialized type of machine learning algorithm designed to translate complex, unstructured data into numerical vectors. These &#8220;vectors&#8221; act as a list of numbers that represent the underlying meaning and relationships of the input data</p>

  <h4 class="page-header">sentence-transformers (Industry Standard)</h4>
   
        <ul class="indent">
    <li>Create embeddings locally</li>
    <li>Hugging Face models</li>
  </ul>

    

  <h4 class="page-header">OpenAI embeddings (Industry Standard)</h4>
   
        <ul class="indent">
    <li>Via API (text-embedding-ada-002, text-embedding-3-small/large)</li>
  </ul>

    

  <h4 class="page-header">Cohere embeddings (Production-Ready)</h4>
   
        <ul class="indent">
    <li>Via API</li>
  </ul>

    

  <h2 class="page-header"><img src="https://s.w.org/images/core/emoji/15.1.0/72x72/1f50d.png" alt="🔍" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Evaluation &#038; Monitoring</h2>

  <h4 class="page-header">LangSmith (Industry Standard)</h4>
   
        <ul class="indent">
    <li>LangChain&#8217;s evaluation platform</li>
  </ul>

    

  <h4 class="page-header">TruLens (Production-Ready)</h4>
   
        <ul class="indent">
    <li>RAG evaluation and observability</li>
  </ul>

    

  <h4 class="page-header">Ragas (Production-Ready)</h4>
   
        <ul class="indent">
    <li>RAG evaluation framework</li>
    <li>Measures faithfulness, relevance, etc.</li>
  </ul>

    

  <h4 class="page-header">Phoenix (Arize AI) (Emerging)</h4>
   
        <ul class="indent">
    <li>LLM observability</li>
  </ul>

    

  <h4 class="page-header">Weights &#038; Biases (Production-Ready)</h4>
   
        <ul class="indent">
    <li>ML experiment tracking</li>
  </ul>

    

  <h2 class="page-header"><img src="https://s.w.org/images/core/emoji/15.1.0/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Supporting Libraries</h2>

  <h3 class="page-header">Core Python for AI</h3>
   
        <ul class="indent">
    <li><strong>NumPy</strong> &#8211; Array operations</li>
    <li><strong>Pandas</strong> &#8211; Data manipulation</li>
    <li><strong>Pydantic</strong> &#8211; Data validation (important for agents!)</li>
    <li><strong>asyncio</strong> &#8211; Async operations (important for agents)</li>
  </ul>

    

  <h3 class="page-header">API &#038; Integration</h3>
   
        <ul class="indent">
    <li><strong>requests</strong> &#8211; HTTP requests</li>
    <li><strong>httpx</strong> &#8211; Async HTTP</li>
    <li><strong>FastAPI</strong> &#8211; Building APIs for agents</li>
    <li><strong>Gradio / Streamlit</strong> &#8211; Quick UIs for demos</li>
  </ul>

    

  <h3 class="page-header">Prompt Engineering</h3>
   
        <ul class="indent">
    <li><strong>promptify</strong> &#8211; Prompt templates</li>
    <li><strong>guidance</strong> (Microsoft) &#8211; Structured LLM outputs</li>
  </ul>

    
</article>]]></content:encoded>
					
					<wfw:commentRss>https://codesport.io/artificial-intelligence/python-frameworks-libraries-for-rag-and-agentic-ai/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Biotech Foundations: Modality, Delivery, Disease Target</title>
		<link>https://codesport.io/biotech/biotech-foundational-concepts-modality-delivery-disease-target/</link>
					<comments>https://codesport.io/biotech/biotech-foundational-concepts-modality-delivery-disease-target/#respond</comments>
		
		<dc:creator><![CDATA[Code Sport]]></dc:creator>
		<pubDate>Tue, 21 Oct 2025 07:51:33 +0000</pubDate>
				<category><![CDATA[Biotech]]></category>
		<guid isPermaLink="false">https://codesport.io/?p=7610</guid>

					<description><![CDATA[Glossary of Terms in Clinical Research These three terms are fundamental to understanding how a drug is conceived and developed: Term Definition Context/Role in Drug Development Disease Target The specific molecular entity (e.g., a protein, enzyme, receptor, or gene) in the body that the drug is designed to physically interact with to produce a therapeutic [&#8230;]]]></description>
										<content:encoded><![CDATA[   <script src="https://cdn.tailwindcss.com"></script>

<script>
  tailwind.config = {
    corePlugins: {
     preflight: false, // Disables Tailwind base styles (resets)
    },
    // Optional: Only apply styles within this selector
    important: '#tailwind-blog-content', 
  }
</script>
    <style>
        /* Custom font import for a clean look */
        @import url('https://fonts.googleapis.com/css2?family=Inter:wght@400;600;700&display=swap');
        body {
            font-family: 'Inter', sans-serif;
            background-color: #f7f9fb; /* Light background for contrast */
        }
        /* Custom utility class to make the table look great on small screens */
        @media (max-width: 768px) {
            .responsive-table th, .responsive-table td {
                padding: 0.75rem 0.5rem; /* Reduced padding on mobile */
                font-size: 0.875rem; /* Slightly smaller text on mobile */
            }
        }

*, ::before, ::after {
  border-width: 0;
  border-style: solid; /* This is what you lose with preflight: false */
  border-color: currentColor; /* theme() only resolves at Tailwind build time; in a raw style block use the fallback value literally */
}
    </style>


<!-- I deleted 6xl and changed to full -->
    <div id="tailwind-blog-content"><div class="max-w-full mx-auto space-y-12"><!-- delete or mod to tweak width see: 
www.google.com/search?client=opera&q=what+does+this+do+in+tailwind%3F+max-w-6xl+mx-auto+space-y-12

-->

       <!-- Table 1: The Distinction: Modality vs. Drug Delivery -->
<div class="space-y-4">
    <h1 class="text-3xl font-bold text-gray-800 border-b-2 border-blue-500 pb-2">
        Glossary of Terms in Clinical Research
    </h1>
    <p class="text-gray-600">
        These three terms are fundamental to understanding how a drug is conceived and developed:
    </p>

    <div class="bg-white shadow-2xl rounded-xl overflow-x-auto">
        <table class="w-full text-left responsive-table">
            <thead class="uppercase bg-blue-600 text-white sticky top-0">
                <tr>
                    <th scope="col" class="p-4">Term</th>
                    <th scope="col" class="p-4">Definition</th>
                    <th scope="col" class="p-4">Context/Role in Drug Development</th>
                </tr>
            </thead>
            <!-- The 'text-sm' class ensures consistent font size with the other tables -->
            <tbody class="divide-y divide-gray-200 text-gray-700">
                <!-- Row 1: Disease Target -->
                <tr class="hover:bg-blue-50 transition duration-150">
                    <td class="p-4 font-semibold text-blue-700"> <!--- text-lg removed -->
                        Disease Target
                    </td>
                    <td class="p-4">
                        The specific molecular entity (e.g., a protein, enzyme, receptor, or gene) in the body that the drug is designed to physically interact with to produce a therapeutic effect.
                    </td>
                    <td class="p-4">
                        This is the scientific hypothesis. For the drug we&#8217;ve been discussing, the target is IL-17A and IL-17F (two specific inflammatory cytokine proteins).
                    </td>
                </tr>
                <!-- Row 2: Modality -->
                <tr class="hover:bg-blue-50 transition duration-150">
                    <td class="p-4 font-semibold text-blue-700"> <!--- text-lg removed -->
                        Modality
                    </td>
                    <td class="p-4">
                        <!--The physical form or category of the therapeutic agent itself. It describes what the drug is.-->
<ol class="list-decimal">
	<li>In Biotech all modalities are biological molecules</li>
	<li>A biological molecule is used to create therapeutic drugs</li>
	<li>The drugs are created using living organisms or biological processes</li>
	<li>These are called Large Molecule drugs, and they differ from traditional, small-molecule pharmaceutical drugs that are made through chemical synthesis</li></ol>

                    </td>
                    <td class="p-4">Describes the drug&#8217;s structure. Examples include: large-molecule proteins or antibodies; smaller, more complex molecules like nucleic acids, cell therapies, and gene editing technologies; Small Molecule; Monoclonal Antibody (mAb); Cell Therapy; Gene Therapy; or a Nanobody® (a small, single-domain antibody).
                    </td>
                </tr>
                <!-- Row 3: Indication -->
                <tr class="hover:bg-blue-50 transition duration-150">
                    <td class="p-4 font-semibold text-blue-700"> <!--- text-lg removed -->
                        Indication
                    </td>
                    <td class="p-4">
                        The specific disease, condition, or manifestation for which a drug has been, or is seeking to be, officially approved by a regulatory agency (like the FDA).
                    </td>
                    <td class="p-4">
                        The disease or symptoms the drug treats. Often the endpoint in clinical trials. A single drug may have multiple approved indications
                    </td>
                </tr>
            </tbody>
        </table>
    </div>
</div>


       <!-- Table 2: The Distinction: Modality vs. Drug Delivery -->
        <div class="space-y-4">
            <h1 class="text-3xl font-bold text-gray-800 border-b-2 border-green-500 pb-2">
                The Distinction: Modality vs. Drug Delivery
            </h1>
            <p class="text-gray-600">
                Understanding the difference between <em>what</em> the drug is (modality) and <em>how</em> it gets where it needs to go (delivery) is crucial in modern biotech.
            </p>

            <div class="bg-white shadow-2xl rounded-xl overflow-x-auto">
                <table class="w-full text-left responsive-table">
                   <!-- removed "text-sm" from thead -->
                    <thead class="uppercase bg-green-600 text-white sticky top-0">
                        <tr>
                            <th scope="col" class="p-4">Feature</th>
                            <th scope="col" class="p-4">Drug Modality</th>
                            <th scope="col" class="p-4">Drug Delivery System (DDS)</th>
                        </tr>
                    </thead>
                    <tbody class="divide-y divide-gray-200 text-gray-700">
                        <!-- Row 1: Core Definition -->
                        <tr class="hover:bg-green-50 transition duration-150">
                            <td class="p-4 font-semibold text-green-700">Core Definition</td>
                            <td class="p-4">
                                <span class="inline-block bg-green-100 text-green-800 text-xs font-semibold px-2.5 py-0.5 rounded-full mr-2">What It Is</span>
                                The active therapeutic agent itself—the type of molecule (e.g., antibody, mRNA, cell).
                            </td>
                            <td class="p-4">
                                <span class="inline-block bg-gray-100 text-gray-800 text-xs font-semibold px-2.5 py-0.5 rounded-full mr-2">How It Gets There</span>
                                The mechanism or vehicle used to transport, protect, and release the drug at the target site.
                            </td>
                        </tr>
                        <!-- Row 2: Analogy -->
                        <tr class="hover:bg-green-50 transition duration-150">
                            <td class="p-4 font-semibold text-green-700">Analogy</td>
                            <td class="p-4">
                                The <strong>&#8220;cargo&#8221;</strong> or the therapeutic instruction within the package.
                            </td>
                            <td class="p-4">
                                The <strong>&#8220;truck&#8221;</strong> or <strong>&#8220;vessel&#8221;</strong> responsible for transporting the cargo safely to its destination.
                            </td>
                        </tr>
                        <!-- Row 3: Examples -->
                        <tr class="hover:bg-green-50 transition duration-150">
                            <td class="p-4 font-semibold text-green-700">Examples</td>
                            <td class="p-4">
                                Monoclonal Antibodies (mAbs), Small Molecules, Antisense Oligonucleotides (ASOs), CAR T-Cells.
                            </td>
                            <td class="p-4">
                                Lipid Nanoparticles (LNPs), Viral Vectors (AAV), Pegylation, Transdermal Patches, Orally Ingestible Devices.
                            </td>
                        </tr>
                        <!-- Row 4: Primary Goal -->
                        <tr class="hover:bg-green-50 transition duration-150">
                            <td class="p-4 font-semibold text-green-700">Primary Goal</td>
                            <td class="p-4">
                                To interact with a specific biological target to achieve the therapeutic effect (e.g., block a receptor, replace a protein).
                            </td>
                            <td class="p-4">
                                To improve stability, enhance bioavailability, increase targeting specificity, and control release kinetics.
                            </td>
                        </tr>
                    </tbody>
                </table>
            </div>
        </div>

       <!-- Table 2b: The Distinction: Modality, Target, Drug Delivery, Indication -->
   <div class="bg-white shadow-2xl rounded-xl overflow-x-auto">
        <table class="w-full text-left responsive-table">
            <!-- Unified header color (bg-green-600) to match Table 2 -->
            <thead class="uppercase bg-green-600 text-white sticky top-0">
                <tr>
                    <th scope="col" class="p-4">Feature</th>
                    <th scope="col" class="p-4">Drug Modality (The Tool)</th>
                    <th scope="col" class="p-4">Disease Target (The Lock)</th>
                    <th scope="col" class="p-4">Drug Delivery System (DDS) (The Package)</th>
                    <th scope="col" class="p-4">Indication (The Goal)</th>
                </tr>
            </thead>
            <tbody class="divide-y divide-gray-200 text-gray-700">
                <!-- Row 1: Core Definition -->
                <tr class="hover:bg-green-50 transition duration-150">
                    <td class="p-4 font-semibold text-green-700">Core Definition</td>
                    <td class="p-4">The <strong>molecular or cellular class</strong> of the therapeutic tool. Defines the drug&#8217;s fundamental <strong>structure</strong> and <strong>mechanism of action</strong>.</td>
                    <td class="p-4">The specific <strong>biological component</strong> the Modality interacts with to initiate a therapeutic effect.</td>
                    <td class="p-4">The protective vehicle or formulation designed to transport, stabilize, and maximize the Modality&#8217;s exposure at the site of action.</td>
                    <td class="p-4">The specific disease or condition the drug is approved or being developed to treat.</td>
                </tr>
                <!-- Row 2: Analogy -->
                <tr class="hover:bg-green-50 transition duration-150">
                    <td class="p-4 font-semibold text-green-700">Analogy</td>
                    <td class="p-4">The <strong>Specific Tool</strong> (Wrench, Scalpel, Blueprint).</td>
                    <td class="p-4">The <strong>Specific Nut, Protein, or Cell</strong> that needs to be fixed or blocked.</td>
                    <td class="p-4">The <strong>Container or Transport Vehicle</strong> (Pill bottle, LNP truck, Syringe).</td>
                    <td class="p-4">The <strong>Patient&#8217;s Illness</strong> (e.g., A Fever, a Cancer, a Genetic Defect).</td>
                </tr>
                <!-- Row 3: Examples -->
                <tr class="hover:bg-green-50 transition duration-150">
                    <td class="p-4 font-semibold text-green-700">Key Examples</td>
                    <td class="p-4">Small Molecule (Tylenol), mAb, Nanobody®, mRNA, CAR-T Cell Therapy
<!--
                        <span class="bg-indigo-100 text-indigo-800 text-xs font-medium mr-2 px-2.5 py-0.5 rounded-full">Small Molecule</span> (Tylenol), 
                        <span class="bg-green-100 text-green-800 text-xs font-medium mr-2 px-2.5 py-0.5 rounded-full">mAb</span>, 
                        <span class="bg-yellow-100 text-yellow-800 text-xs font-medium mr-2 px-2.5 py-0.5 rounded-full">Nanobody®</span>, 
                        <span class="bg-red-100 text-red-800 text-xs font-medium mr-2 px-2.5 py-0.5 rounded-full">mRNA</span>, 
                        <span class="bg-purple-100 text-purple-800 text-xs font-medium px-2.5 py-0.5 rounded-full">Cell Therapy</span> (CAR T).
-->
                    </td>
                    <td class="p-4">An <strong>Enzyme</strong> (COX), a <strong>Cytokine</strong> (IL-17), a <strong>Viral Protein</strong> (Spike), or a <strong>Defective Gene</strong>.</td>
                    <td class="p-4">
                        <span class="bg-gray-100 text-gray-800 text-xs font-medium mr-2 px-2.5 py-0.5 rounded-full">LNP</span>, 
                        <span class="bg-gray-100 text-gray-800 text-xs font-medium mr-2 px-2.5 py-0.5 rounded-full">Viral Vector</span> (AAV), 
                        <span class="bg-gray-100 text-gray-800 text-xs font-medium mr-2 px-2.5 py-0.5 rounded-full">Tablet Coating</span>, 
                        Injectable Depot Formulations.
                    </td>
                    <td class="p-4">Hidradenitis Suppurativa (HS), COVID-19, Acute Myeloid Leukemia (AML).</td>
                </tr>
            </tbody>
        </table>
    </div>



        <!-- Spacer -->
        <div class="border-t border-gray-300 my-10"></div>

        <!-- Table 3: Relevance to Clinical Trials and the Biotech Industry -->
        <div class="space-y-4">
            <h1 class="text-3xl font-bold text-gray-800 border-b-2 border-purple-500 pb-2">
                Relevance to Clinical Trials and the Biotech Industry
            </h1>
            <p class="text-gray-600">
                The choice of modality and delivery system fundamentally dictates the risks, costs, and timeline of a biopharma program.
            </p>

            <div class="bg-white shadow-2xl rounded-xl overflow-x-auto">
                <table class="w-full text-left responsive-table">
                    <!-- removed "text-sm" from thead -->
                    <thead class="uppercase bg-purple-600 text-white sticky top-0">
                        <tr>
                            <th scope="col" class="p-4">Area</th>
                            <th scope="col" class="p-4">Impact of Drug Modality</th>
                            <th scope="col" class="p-4">Impact of Drug Delivery System (DDS)</th>
                        </tr>
                    </thead>
                    <tbody class="divide-y divide-gray-200 text-gray-700">
                        <!-- Row 1: Clinical Trial Design -->
                        <tr class="hover:bg-purple-50 transition duration-150">
                            <td class="p-4 font-semibold text-purple-700">Clinical Trial Design</td>
                            <td class="p-4">
                                Dictates primary endpoints, necessary safety monitoring (e.g., immunogenicity checks), and potential for curative (one-and-done) or chronic dosing.
                            </td>
                            <td class="p-4">
                                Affects dosing frequency, patient compliance (oral vs. injection), and the need for specialized administration protocols (e.g., surgery for implanted devices).
                            </td>
                        </tr>
                        <!-- Row 2: Regulatory Strategy -->
                        <tr class="hover:bg-purple-50 transition duration-150">
                            <td class="p-4 font-semibold text-purple-700">Regulatory Strategy</td>
                            <td class="p-4">
                                Newer modalities (Cell/Gene Therapy) often lack regulatory precedence, requiring adaptive strategies and extensive CMC (Chemistry, Manufacturing, and Controls) packages.
                            </td>
                            <td class="p-4">
                                Must prove the safety and long-term biocompatibility of the delivery vehicle. Can enable accelerated approval pathways by significantly reducing off-target toxicity.
                            </td>
                        </tr>
                        <!-- Row 3: Biotech Industry Focus -->
                        <tr class="hover:bg-purple-50 transition duration-150">
                            <td class="p-4 font-semibold text-purple-700">Biotech Industry Focus</td>
                            <td class="p-4">
                                Defines a company&#8217;s core platform (e.g., &#8220;a protein degrader company&#8221;). High IP value focused on the mechanism of action.
                            </td>
                            <td class="p-4">
                                Creates specialized platform technology focused on overcoming natural biological barriers (e.g., blood-brain barrier). High value in improving bioavailability of existing modalities.
                            </td>
                        </tr>
                        <!-- Row 4: Manufacturing -->
                        <tr class="hover:bg-purple-50 transition duration-150">
                            <td class="p-4 font-semibold text-purple-700">Manufacturing</td>
                            <td class="p-4">
                                Requires highly specialized and often extremely complex processes (e.g., live cell culture, viral vector purity/titer). High production cost per dose.
                            </td>
                            <td class="p-4">
                                Focuses on large-scale, consistent, and sterile production of the formulation components (e.g., LNP formulation consistency and stability). Often easier to scale than the modality itself.
                            </td>
                        </tr>
                    </tbody>
                </table>
            </div>
        </div>


        <!-- Table 4: Application: COVID-19 Vaccine Technologies -->
        <h1 class="text-3xl font-bold text-gray-800 mb-6 border-b-2 border-indigo-400 pb-2">
            COVID-19 Vaccine Technologies: A Modality Comparison
        </h1>
        <p class="text-gray-600 mb-8">
These three vaccines illustrate two different modality and delivery approaches to achieving the same goal: immunity against the SARS-CoV-2 Spike (S) protein.
        </p>

        <!-- Main Comparison Table Card -->
        <div class="bg-white shadow-2xl rounded-xl overflow-x-auto">
            <table class="w-full text-left responsive-table">
                <!-- removed "text-sm" from thead -->
                <thead class="uppercase bg-indigo-600 text-white sticky top-0">
                    <tr>
                        <th scope="col" class="p-4">Vaccine (Company)</th>
                        <th scope="col" class="p-4">Modality (What is the drug?)</th>
                        <th scope="col" class="p-4">Delivery System/Vehicle</th>
                        <th scope="col" class="p-4">Route of Administration</th>
                        <th scope="col" class="p-4">Drug Target/Mechanism</th>
                    </tr>
                </thead>
                <tbody class="divide-y divide-gray-200 text-gray-700">
                    <!-- Pfizer/BioNTech Row -->
                    <tr class="hover:bg-indigo-50 transition duration-150">
                        <td class="p-4 font-semibold text-indigo-700">
                            Comirnaty <span class="text-sm font-normal text-gray-500">(Pfizer/BioNTech)</span>
                        </td>
                        <td class="p-4 font-medium">
                            <span class="inline-block bg-blue-100 text-blue-800 text-xs font-semibold px-2.5 py-0.5 rounded-full">mRNA</span>
                            Nucleoside-Modified Messenger RNA
                        </td>
                        <td class="p-4">
                            Lipid Nanoparticle (LNP)
                        </td>
                        <td class="p-4">
                            Intramuscular (IM) Injection
                        </td>
                        <td class="p-4">
                            Instructions for human cells to produce the SARS-CoV-2 Spike (S) protein.
                        </td>
                    </tr>
                    <!-- Moderna Row -->
                    <tr class="hover:bg-indigo-50 transition duration-150">
                        <td class="p-4 font-semibold text-indigo-700">
                            Spikevax <span class="text-sm font-normal text-gray-500">(Moderna)</span>
                        </td>
                        <td class="p-4 font-medium">
                            <span class="inline-block bg-blue-100 text-blue-800 text-xs font-semibold px-2.5 py-0.5 rounded-full">mRNA</span>
                            Nucleoside-Modified Messenger RNA
                        </td>
                        <td class="p-4">
                            Lipid Nanoparticle (LNP)
                        </td>
                        <td class="p-4">
                            Intramuscular (IM) Injection
                        </td>
                        <td class="p-4">
                            Instructions for human cells to produce the SARS-CoV-2 Spike (S) protein.
                        </td>
                    </tr>
                    <!-- Novavax Row -->
                    <tr class="hover:bg-indigo-50 transition duration-150">
                        <td class="p-4 font-semibold text-indigo-700">
                            Nuvaxovid <span class="text-sm font-normal text-gray-500">(Novavax)</span>
                        </td>
                        <td class="p-4 font-medium">
                            <span class="inline-block bg-green-100 text-green-800 text-xs font-semibold px-2.5 py-0.5 rounded-full">Protein Subunit</span>
                            Recombinant Protein Subunit
                        </td>
                        <td class="p-4">
                            Nanoparticles with a Saponin-based Adjuvant (Matrix-M)
                        </td>
                        <td class="p-4">
                            Intramuscular (IM) Injection
                        </td>
                        <td class="p-4">
                            The actual, pre-made SARS-CoV-2 Spike (S) protein (antigen) is delivered directly to trigger an immune response.
                        </td>
                    </tr>
                </tbody>
            </table>
        </div>

        <!-- Key Takeaways -->
        <div class="mt-8 p-6 bg-white rounded-xl shadow-lg border-l-4 border-indigo-500">
            <h2 class="text-xl font-semibold text-indigo-700 mb-3">Key Takeaways on Modality</h2>
            <ul class="space-y-2 text-gray-600 list-disc list-inside">
                <li><strong class="text-gray-800">mRNA Vaccines (Pfizer/Moderna):</strong> Use genetic instructions (mRNA) wrapped in fat bubbles (LNP) to teach the body&#8217;s cells how to make the antigen.</li>
                <li><strong class="text-gray-800">Protein Subunit Vaccine (Novavax):</strong> Uses traditional vaccine technology by injecting a non-infectious, pre-made piece of the virus (the spike protein), boosted by an adjuvant.</li>
                <li><strong class="text-gray-800">Route of Delivery:</strong> All three utilize the same common route: an intramuscular injection.</li>
            </ul>
        </div>

    </div>

</div>]]></content:encoded>
					
					<wfw:commentRss>https://codesport.io/biotech/biotech-foundational-concepts-modality-delivery-disease-target/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
