DB Performance

Business Values: Simplicity

2021-06-15T11:29:00.006+01:00

For the longest time, I felt that simplicity was either underrated or entirely unmentioned in business's mission and value statements.

The first person I saw promoting simplicity was Steve Jobs. He was known for meticulously designing Apple products and expecting them to be made as simple as humanly possible. However, even after nearly everyone has had an Apple product in their pocket at some point in their life, businesses still find the concept of simplicity so elusive, or perhaps vague, that it somehow fails to be considered an important value.

I would like to take this opportunity to explain, conceptually, why I think simplicity should be a value for your business and what a focus on simplicity could result in.

In the early 2000s, I heard a line from a conference about software development: Simple isn't Easy. I took that statement to mean that making something complex into something simple is hard. You need to take the time and mental effort to turn a complex concept into something that is understandable and digestible.

The result should be that you and others only need to comprehend the essence of the concept and then can competently use it.

It is a considerable task, but one that moves us forward in our complex world. We need to simplify complexity just so that we can see the way forward. Once the path is clear, things are... well, simple. And we, as humans, like simple.

In fact, we find a high degree of complexity destructive. The archeologist Joseph Tainter put forward a thesis in his book “The Collapse of Complex Societies”. He argues that societies like the Ancient Romans, Egyptians and Mayans collapsed due to complexities that could not be addressed at the time, given the lack of people who could read, do math and keep up with complex laws.

Complex product design can kill too. I recall reading at University about a medical radiation machine that had such a complex array of buttons on its interface that operators gave patients 300 times more radiation than was required (the Therac-25 in 1986).

So what we need to do is remove complexity where we can - certainly not add it - and abstract complexity where we cannot.

Unfortunately, we sometimes do this process wrong. In that same period in the early 2000s, there was a movement in the software development community that preached 'optimising for developer happiness'.

During this time, some innovations came around that abstracted away - a form of simplification - the developer's need to understand what the underlying hardware was doing. This let the industry focus on simplifying concepts in the software layer itself. Developers would only need to focus on recurring patterns that solve familiar business or software requirements and could do away with considering how that software would eventually run on the hardware.

While this abstraction and simplification did optimise for developer happiness, the resulting software was not optimised to run on hardware. Some new developers did not even learn how the hardware works “under the hood”.

This led to many years of lost productivity spent fixing (now legacy) applications to work better with hardware, all the while competing for time with higher business priorities such as releasing new product features.

Fortunately, today - and only recently - software developers are realising that in order to simplify a complex concept, you need to consider all its relevant moving parts. There is a new movement using a new language that is considered to have a 'hardware/software co-design'. It has shown that once you really understand a concept and its scope, you can come up with innovative (“zero cost”) abstractions that make the entire industry move forward.

Lastly, I cannot end a post that started with Steve Jobs without talking about simplifying your focus. Jobs was known to relentlessly discard to-dos and ideas that were not relevant to the most essential and beneficial thing he could do. My understanding is that he practiced this, and encouraged his colleagues to do so as well, until it was painful. Once you and your team are focused on a very narrow and simple outcome, the chances of achieving it increase dramatically.

So, in conclusion and in keeping with the theme:

Simple isn't easy.
Complexity kills.
Zero cost abstractions.
Simplify your focus.

Reducing RDS costs

2021-06-11T10:35:00.006+01:00

I wrote an article about reducing RDS costs. It is mainly aimed at MySQL RDS, but much of it applies more generally. Hopefully, you will find it useful.

How to Evaluate which Concepts in Tech are Good

2020-05-25T21:34:00.000+01:00

For a new concept in tech to be considered 'good', in my opinion, it would have to have the following characteristics:

Helps the Developers

The concept has to help the developers of tech systems do their job faster, smoother and/or easier. "Optimize for developer happiness".

Helps the Hardware

The concept has to help the underlying hardware complete the work it needs to do faster, smoother and/or easier, with fewer resources. You cannot ignore reality, and whatever runs on the hardware is reality.

Helps the Business

The concept has to help the business run faster, smoother and/or easier. This can also be seen as the end customer getting better value.

Helps Make Things Simpler

The concept has to thoughtfully remove complexity from tech systems. If it cannot remove the complexity fully, then it needs to abstract it with zero or few costs and few interface points. "Simple is not easy".

Integrates with Other Concepts

The concept does not contradict other existing concepts. If it does, ask whether those other concepts are 'good' and on what premises they were formed.

How to Make MySQL Cool Again v3.0

2020-02-01T20:08:00.000+00:00

In-built columnar storage engine that works next to InnoDB on the same DB server.
Materialized views - async and semi-immediate
Recommended indexes based on query usage
Rename CREATE PROCEDURE to CREATE DATABASE API. Yes, really.. just rename it.

How Do Software Systems Become Complex and What Can You Do to Prevent it

2019-10-14T16:12:00.001+01:00

Why does everything have to be so complicated?
Wouldn't it be nice for once, to have a simple and clean system that you can run or make small changes to?

But how is it that systems get complex to begin with and how can we avoid it reaching that stage?

In order to answer that, I need us to have a common language, so let me explain a concept called "crow epistemology". Epistemology is the branch of philosophy concerned with the acquisition of knowledge. The crow part refers to an experiment done with crows many decades ago:

The experiment was conducted to ascertain the extent of the ability of birds to deal with numbers. A hidden observer watched the behavior of a flock of crows gathered in a clearing in the woods. When a man came into the clearing and went on into the woods, the crows hid in the tree tops and would not come out until he returned and left the way he had come. When three men went into the woods and only two returned, the crows would not come out: they waited until the third one had left. But when five men went into the woods and only four returned, the crows came out of hiding. Apparently, their power of discrimination did not extend beyond three units--and their perceptual-mathematical ability consisted of a sequence such as: one-two-three-many.

We humans are also limited in the number of things we can hold in our head at any one time. Here lies the (human) issue with complexity. For us to make computer systems less complex, we need to take steps so that only a small number of things or concepts take up space in our brain at any one time.

Let's look at a few types of complexities:

Cyclomatic complexity

Cyclomatic Complexity is a quantitative measure of the number of linearly independent paths through a program's source code. It was developed by Thomas J. McCabe, Sr. in 1976.
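
To make the measure a little more concrete, here is a minimal sketch that approximates it for Python source code. It is not McCabe's full graph-based definition: it simply counts decision points and adds one, which is a common shortcut. The function name and the sample snippet are made up for illustration.

```python
import ast

# Node types treated as decision points in this simplified count.
_DECISION_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def approx_cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity as 1 + number of decision points."""
    tree = ast.parse(source)
    complexity = 1  # straight-line code has exactly one path
    for node in ast.walk(tree):
        if isinstance(node, _DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra short-circuit branches
            complexity += len(node.values) - 1
    return complexity


# Hypothetical function used only as input for the counter.
SAMPLE = """
def shipping_cost(order):
    if order.express and order.weight > 10:
        return 30
    elif order.express:
        return 20
    return 5
"""

print(approx_cyclomatic_complexity(SAMPLE))  # prints 4
```

The sample scores 4: one for the base path, two for the if/elif and one for the "and". Even such a small function is already close to the number of things most of us can hold in our head.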

IF statements. Too many IF statements make an application complex. Each 'path' of the IF statement needs to be 'rendered' in your brain to have an overview of what the application will do. In extreme cases, you can reach the infamous 'pyramid of doom'. If your brain can only hold (on average) 5-7 things, then you are looking at two, or at most three, boolean variables in your IF statements (2 options - true/false - to the power of 3 is 8 paths). Anything more than that can be considered complex and will force people to stare at the screen for many minutes whenever they want to go over that code.

Decision Tables. An exception to this may be a decision table, where those paths are 'pre-rendered' and are therefore slightly easier to understand. But even with decision tables, give it too many options and you find yourself going down one path at a time, tracing the screen with your finger.
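
As a rough illustration of 'pre-rendering' the paths, the sketch below replaces nested IF statements with a lookup table. The login rule and all names in it are hypothetical; the point is only that three booleans produce 2 ** 3 = 8 paths and that a table lets you see (and check) all of them at once.

```python
from itertools import product

# Hypothetical rule: what to do with a login attempt.
# Keys are (is_admin, has_2fa, is_trusted_ip).
DECISION_TABLE = {
    (True, True, True): "allow",
    (True, True, False): "allow",
    (True, False, True): "challenge",
    (True, False, False): "deny",
    (False, True, True): "allow",
    (False, True, False): "challenge",
    (False, False, True): "challenge",
    (False, False, False): "deny",
}


def decide(is_admin: bool, has_2fa: bool, is_trusted_ip: bool) -> str:
    """Look the outcome up instead of walking nested IF statements."""
    return DECISION_TABLE[(is_admin, has_2fa, is_trusted_ip)]


# Three booleans give 2 ** 3 = 8 combinations; check that none is missing.
all_paths = list(product([True, False], repeat=3))
assert len(all_paths) == 8
assert set(all_paths) == set(DECISION_TABLE)

print(decide(True, False, False))  # prints deny
```

Adding a fourth boolean would double the table to 16 rows, which is about where the finger-tracing starts.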

Error Handling. A subset of IF statements can be included in Error Handling. Usually, you have the default way a class or function expects to get and process requests and when you include error handling into it, you get too many 'paths' and the code becomes messy.

Personally, I am interested to see how 'design by contract' works for this use case, by off-loading error handling into other parts of the code. If my predictions are right, this could be what replaces a large chunk of Unit Tests in the future.
It is also philosophically more in line with the original intent of Object Oriented programming. A human can 'run' and 'eat', but a human also has limits on what it can eat and on what surfaces it can run. Specifying those constraints helps reduce complexity in other parts of the application.
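
Python has no built-in contract syntax, so the following is only a sketch of the idea using plain checks. The Human class, its limits and the ContractError name are all assumptions made up for this example. The constraints are stated once, at the boundary of the object, so callers do not need their own IF statements for them.

```python
class ContractError(ValueError):
    """Raised when a caller breaks a stated precondition."""


class Human:
    # Hypothetical constraints, for illustration only.
    RUNNABLE_SURFACES = {"grass", "road", "track"}
    MAX_MEAL_GRAMS = 1500

    def __init__(self) -> None:
        self.stomach_grams = 0

    def run(self, surface: str) -> str:
        # Precondition: a human can only run on certain surfaces.
        if surface not in self.RUNNABLE_SURFACES:
            raise ContractError(f"cannot run on {surface!r}")
        return f"running on {surface}"

    def eat(self, grams: int) -> None:
        # Precondition: a positive amount that still fits.
        if grams <= 0 or self.stomach_grams + grams > self.MAX_MEAL_GRAMS:
            raise ContractError(f"cannot eat {grams} g right now")
        self.stomach_grams += grams
        # Postcondition: the limit holds after every meal.
        assert 0 < self.stomach_grams <= self.MAX_MEAL_GRAMS


person = Human()
print(person.run("grass"))  # prints running on grass
person.eat(500)
```

Calling person.run("lava") raises ContractError at the boundary, rather than somewhere deeper in the application.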

A function has too many lines of code. It is simply difficult to understand what is going on. Maybe the original person who wrote it can understand it, but not anyone else who needs to make changes to that code.

A class has too many functions.

A class has too much logic in it. Try using a Decision Object

A class has too many dependencies. This overloads your cognition in a similar way to IFs, because you need to 'render' the dependencies to get an overview of what is going on, in your brain.

Too many parts in your system. As a general rule, too many moving parts make it difficult to figure out where issues are.

Too many options for communicating with an API, interface or CLI. This isn't very obvious, but too many options are difficult to develop and maintain, and also make it difficult for the user to understand how to use it. They also make it a more complicated dependency to interact with and test for.

Too many buttons on your website. A corollary of the previous point is a busy website that is too complicated to understand how to use.

As a side note, trying to solve a problem that has too many possible decisions to make also counts as complexity for your brain. In cognitive science (but more math, really), it is called combinatorial explosion.

Castles on Quicksand

Now that we have covered what humans might consider complexity, let us consider other forms of non-human complexity. Apart from making our code 'clean and simple', we sometimes need to factor in more parts of the terrain. Specifically, we cannot limit ourselves to just making the code aesthetically pleasing and not question how the code would run on the metal underneath. How do the physics of it work, at least in principle? How do we move 0s and 1s as fast as possible without causing bottlenecks?

In philosophy, we call this 'evasion of reality' and it's becoming more and more common in the age of cloud computing - although, to be fair, the cloud pretty much plays a 'cha-ching' sound whenever you do this.

Let's take a couple of examples:

Flooding your database with single insert connections, instead of batching writes to it. Batching is the multi-threading of databases.
Array of Structs versus Struct of Arrays for performance. The gaming industry takes a more data-oriented design approach to get better performance as well as to work on less powerful machines like mobile phones. Using Structs of Arrays, they are able to render more moving units on a screen with far fewer CPU cache misses.
The Rust programming language uses various principles for memory management instead of a garbage collector. Rust uses innovative methods that help humans code without a garbage collector while making it a lot easier to manage lifetimes and avoid data races.
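
To illustrate the first example, here is a minimal sketch of single inserts versus batched writes. It uses Python's built-in sqlite3 module with an in-memory database only so that it is self-contained; the events table, the function names and the batch size of 1000 are assumptions for the example. Against a networked database such as MySQL, the gap would typically be larger because every statement and commit is a round trip.

```python
import sqlite3


def insert_one_by_one(conn, rows):
    """One statement and one commit per row: many small round trips."""
    for row in rows:
        conn.execute("INSERT INTO events (id, payload) VALUES (?, ?)", row)
        conn.commit()


def insert_batched(conn, rows, batch_size=1000):
    """Group rows into batches: one executemany and one commit per batch."""
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        conn.executemany("INSERT INTO events (id, payload) VALUES (?, ?)", batch)
        conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
rows = [(i, f"event-{i}") for i in range(5000)]
insert_batched(conn, rows)
print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # prints 5000
```

With 5000 rows, the batched version issues 5 commits instead of 5000.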

I would like to focus on the last example: in order to build efficient and clean computer systems, we need to use the principles discussed to make our code less complex AND integrate them with principles about how computers work best under the hood. Similar to how Rust does it - integration is the key.

Once you have principles that consider both code complexity and system performance, you develop much faster, your code is simpler and you do not need to revisit it in the future to make 50 more commits just to make it go fast. After 50 additional commits, nothing looks simple anyway.

Pragmatic Entropy

Lastly, we need to make sure that our system has integrity. By this, I mean that it was built right, on core values and principles, and that those were not deviated from during the development process. In other words, do not let too many cooks spoil the broth, in particular with exciting spices from online blogs or conferences.

If you do not keep a watchful eye out for this, your system can get complex and random very quickly. If you have ever attended a meeting where this was raised: "why don't we do this? it's what all the cool kids use", then you will know what I mean.

The problem with these situations, is that if you compromise (be pragmatic), you have already lost. You try to rush things out without hurting other people's feelings and soon you will get those annoying conversations with QA or the people who review your code that go "why didn't you just do X instead? It is a lot simpler".

The other cooks need you to compromise or they can't do anything. What do I mean?

If you were to say "this is a bad idea, I won't do it." - they have lost and the integrity of the software system is safe (assuming you are following the right values and principles).
If you were to say "I agree to do this idea, but tell me exactly how to implement it" - they have lost, because they have no idea how. They need to persuade you to do it and for you to implement their idea.
If you compromise, they have fully won. They get the latest technology into the system and you have to implement it, test it and maintain it going forward. You have at that point, introduced complexity into the system.

If you ever read someone's code and ask "why did they do it this way?" or the more common "wtf", it was probably because someone compromised.

In Conclusion

For those of you who may have missed it, we have covered complexity in computer systems, but we have also done so in a complete philosophical framework. We have Epistemology (Cyclomatic Complexity), Metaphysics (Castles on Quicksand) and we have Ethics (Pragmatic Entropy) - we also have a slight mention of aesthetics.

I hope you have enjoyed it and that it helps you in your implementations.

Top 5 Reasons why you need a Data Expert in your Agile/Cloud-only Company

2018-11-19T11:46:00.001+00:00

Setup blue/green database environments to streamline database deployments
Help your team transition from a monolithic database to a microservice environment
Setup testing environments with actual data to test against (eg, docker database containers with anonymised data)
Improve the performance of your production databases
Setup and populate reporting Data-warehouses/Data-lakes for fast and readily available analytics

Worst 5 Things that can happen if you don’t have a Data Expert in your Agile/Cloud-only company

Your production databases slow down your website/mobile apps so much that the developers want to take 3 months off to migrate sensitive parts of your application to 3 new database technologies. You now not only need to maintain unfamiliar database technologies, you also need to fix support tickets like “why does it say the total is 50 here, but the total is 43 over there? Which one is right?”.
Every time you deploy database changes after a few extra weeks of testing, everyone still holds their breath while it deploys.
After a couple of years of hard work trying to move your monolith to microservices, only 5-10% of the database has been migrated. The momentum has stopped, because no one is willing to risk their positions and break key parts of the system.
Your developers still need to test against the staging environment as that is the one that has data in it, but by then, all the bad database queries have already been written.
After having 1 monolithic database and 20 silo’d databases sitting inside microservices, you now need to hire a data expert to go over all the mess and create a single place to run reports from. Otherwise, no one has an idea what’s going on.

Simple Database Design Rules

2018-10-16T14:30:00.001+01:00

Remove aggregated data from non-aggregated tables.
Normalise and only denormalise with parent-keys (key enrichment)
Consolidate new and small DB instances to a large DB instance in separate schemas
Use DB Views for cross service data reading
Use a compound primary key when an auto_inc id is not needed.
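
A small sketch of how some of these rules can look in practice, run through Python's sqlite3 module so it is self-contained. The order_items table and order_totals view are hypothetical names, and SQLite is standing in for MySQL here, so take the SQL as illustrative rather than MySQL-exact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical link table: the pair itself identifies the row,
    -- so no separate auto-increment id is needed.
    CREATE TABLE order_items (
        order_id   INTEGER NOT NULL,
        product_id INTEGER NOT NULL,
        quantity   INTEGER NOT NULL,
        PRIMARY KEY (order_id, product_id)
    );

    -- The aggregate lives in a view that other readers can query,
    -- not as a stored column on a non-aggregated table.
    CREATE VIEW order_totals AS
        SELECT order_id, SUM(quantity) AS total_quantity
        FROM order_items
        GROUP BY order_id;

    INSERT INTO order_items VALUES (1, 10, 2), (1, 11, 3), (2, 10, 1);
""")

totals = conn.execute(
    "SELECT order_id, total_quantity FROM order_totals ORDER BY order_id"
).fetchall()
print(totals)  # prints [(1, 5), (2, 1)]
```

The compound key also rejects a second row for the same order and product, which is a constraint you would otherwise have to enforce in application code.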

How to Implement Technical Change in an Organisation

2018-05-01T11:28:00.000+01:00

Your time is short and valuable, so I will not waste it on fluff. You would, however, need to take on these points and do more research on them, in the event that you like and agree with them.

You are in a company and you see a glaring issue that you would like to solve. You have an idea or some experience that suggests your solution might be helpful. However, it's frequently difficult to implement change in companies (there are entire books and MBA courses on it). Sticking to IT/technical problems, here are 3 points that can help you implement technical change in an organisation.

Social Capital

This could be 'understand politics' or it could mean 'being friendly' or it could mean 'being consistently reliable and hardworking'. Either way, you can implement change by spending your social capital. Please note: you will not get this social capital back, even if your idea works amazingly well.

Expertise

If you are the expert in the area and you want to implement technical change, the resistance to your idea may be greatly reduced. Personally, I like giving presentations to people inside the company. Then if I want to make a change, the conversation would usually go this way: "Hey, do you remember that presentation I did a while back? I was thinking of implementing one of the points I had in there. It won't take too long and I will make sure it works." "What presentation? oh that one.. yeah, yeah.. sure. Just let me know when X is done and Y is finished".

De-Risk

Usually, a lot of the resistance to change is due to the risk of something going wrong. If your idea carries an unacceptable or high risk, try to reduce the scope of the change or remove some of the moving parts. The idea is to still implement the 'core' of the technical suggestion, or to break it down into steps - where if the first step succeeds, then the second step would be less risky.

MySQL Compression Olympics

2018-04-04T09:22:00.000+01:00

And the results are in:

InnoDB (no compression/original) - 660GB
RocksDB - 209GB
TokuDB (snappy) - 144GB
TokuDB (LZMA) - 67GB

Benchmark performance with mysqlslap on production sample queries:
(8-9 Very quick SELECTs + 1-2 medium SELECTs)

InnoDB (original)
Benchmark
Avg: 0.100 seconds
Min: 0.091 seconds
Max: 0.182 seconds
Total: 5.101s

TokuDB (snappy)
Benchmark
Avg: 0.100 seconds
Min: 0.089 seconds
Max: 0.183 seconds
Total: 5.106s

RocksDB
Benchmark
Avg: 0.113 seconds
Min: 0.104 seconds
Max: 0.164 seconds
Total: 5.730s

TokuDB (LZMA)
Benchmark
Avg: 0.099 seconds
Min: 0.090 seconds
Max: 0.155 seconds
Total: 5.037s

Testing Platform:
Platform | Linux
Release | CentOS release 6.8 (Final)
Kernel | 2.6.32-642.11.1.el6.x86_64
Architecture | CPU = 64-bit, OS = 64-bit
Threading | NPTL 2.12
Compiler | GNU CC version 4.4.7 20120313 (Red Hat 4.4.7-17).
SELinux | Disabled
Virtualized | VMWare
Processors | physical = 2, cores = 4, virtual = 4, hyperthreading = no
Speeds | 4x2299.998
Models | 4xIntel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
Caches | 4x46080 KB
Memory | 15.6G

Observations - Load during conversion:
TokuDB snappy - Load 1.07, IO (around) 30MB/s
RocksDB - Load 1.09, IO (around) 50-70MB/s
(There seems to be a data load round and then a second round of compression afterwards)
TokuDB LZMA - Load 3-4, IO (around) 7MB/s

Competency, Simplicity and Transparency - Pattern

2018-02-20T12:57:00.000+00:00

Abstract

A large number of IT organisations today are monodisciplinary. This detracts from their ability to provide well-crafted products and to use best practices that exist in other disciplines. Wheels need to be reinvented, and messy workarounds seem to trudge the IT organisation along.

There is a historical reason that IT organisations have ended up this way; departmental fiefdoms, communication issues and bureaucratic red tape.

The solution to this would be to bring in experts from other disciplines and set a framework that highlights competency, simplicity and transparency to integrate all the expertise and produce high quality products.

Inspired by the philosophy of John Ruskin and the Guild of St. George.

1. Introduction

My name is Jonathan. I have been working for 11 years, trying to improve the performance of systems that use databases. Through that experience (and with observing leading people in my industry), I have developed a knack for viewing everything as a system and then identifying bottlenecks within that system.

As of the middle of last year, I have started to use this knack and apply it to human systems at work. I have also studied intensively some concepts from: psychology, philosophy, political theory, social systems, economics and business strategy.

After noticing some shortcomings that began to increasingly frustrate me at work, and in the spirit of 'don't just complain, try to fix it', I have come up with a system of organising work in IT organisations that I have given a lot of thought to.

In this post (or white paper), I plan to explain some shortcomings of our current way of working in IT and a possible future for, or improvement to, those systems.

2. In the Beginning

IT organisations or the IT department within organisations, typically used to look like the diagram above. You would have Developers, QA, Database Administrators, System Administrators and Network Administrators. Some companies still have this same structure with slightly different divisions.

Over time, problems with this structure emerged. The main one, I would say, is that the objectives of the different teams diverged from those of the overall company towards the priorities of each team. Meaning, they became fiefdoms or tribes and started warring with each other.

Not physically warring with each other. More like a sort of:

Territorial protectionism: "This falls into our areas and we will decide whether to do it or not"
Resource allocation: "Team X needs us to do Y. It will take a lot of work and I can't be bothered with it now. I'll just tell them to write me a ticket and I'll put it in the backlog for a while"
Communication process creep: "I know that the ticket was sent 2 months ago, but I have not received the detailed documentation of what to do, nor do I have written authorisation from manager X and head of Y"

If you look at the above chart as a hierarchy or a social system, it would look like Feudalism.

2.1 Story: The Consultant

A Java consultant once joined a company with a similar feudalistic structure for a 6 month contract. He asked the DBA team to give him an Oracle dev database so that he could develop what was asked of him. He wrote up a ticket and waited. After a while of not getting the database, he continued with other things and tried to compensate with what he had available. There was some back and forth between the heads of his department and he did mention the lack of a dev database in meetings.
However, the contract finished at the end of 6 months and he left the company. 1 month later, he received an email that his Oracle dev database was ready for him to use.

3. Rise of the Developers

Around the beginning of the first dot-com boom, small start-ups became quite popular. In those start-ups, it was expected that developers set up the entire system - what we call full stack developers today. As those companies succeeded and grew, some chose not to split off responsibilities along the lines of the feudalist model, but instead decided to add more multi-skilled developers.

This produced the following and arguably the current model for small to mid-sized companies:

Now what you have is what I call a developer-centric IT company and if I were to pick a hierarchical structure for it, I would say Monarchy.

There are two phenomena that I can see that got us here: job compression and automation.

3.1 Job Compression

Job compression means that a company decided to restructure its processes to have fewer stages which reduces the need for wait time between stages.

The example above shows a mortgage approval process. There are 4 stages. Each stage is a person with different expertise and different authority. Between each stage, the 'work request' sits in that person's inbox until they can get to it. The combined processing time and queuing time is 18 days.

Job compression would give 1 person enough authority and expertise to make a decision on the approval process.

You have now reduced the time it takes to approve a mortgage from 18 days to 7 days. Note that this was largely accomplished by reducing the overall queue time.

3.2 Automation

As more developers needed to take care of more areas of expertise, they did so by applying certain developer philosophies to solve problems, in this case automation. This brought about certain innovations like Puppet, Chef and Ansible, alongside previous SysAdmin innovations like virtualisation and, later, cloud computing.

You can now, using code, boot up a container of a web server with all the files, scripts and images and run a slew of black box tests against it to see if it fully works.

Accordingly, developers now take on several roles in the IT organisation:

Development
Business Analysis
Quality Assurance
Database Administration
System Administration (now DevOps)
Security
Data Engineering

However, it is difficult to hold all that information inside one's head and developers are using these automations as a crutch to progress with their original work. For example, you can download a few Puppet modules and install as well as begin monitoring a new high availability database. Unfortunately, you have now lost the expertise (in the company) of what is going on under the hood and how to fix issues when they occur.

Very few innovations have been made in the areas outside the realm of pure developing as there are fewer experts in companies to make those innovations.

For example, while we have automated processes for storing and managing database schema changes, we have not had any innovations with deploying dev/test/staging databases that contain actual data to test against. Nor can we use existing automated systems for managing schema changes when our production databases become too big.

There is a general 'uneasy' feeling when needing to make changes to systems we don't fully understand. This negates the 'safe to fail' environments which we use today to make innovations. We also tend to take 'philosophies' that work in one area and apply them to another. This is sometimes helpful, but other times detrimental.

3.3 Story: API vs Batch Process

I was involved in a data batching process that required roughly 200 million items to be processed through an existing API. Had that process gone through the usual way, it would have taken 64 days, with the usual chance of crashing along the way.

The idea to improve this process was to add more web servers and parallelise the work into as many threads as possible. This is a common philosophy that developers have picked up due to limitations with the speed of cores on CPUs. As core speeds have not improved in 7 years, the only option to improve performance would be to split the work across a number of threads.

I identified that the API spent the majority of its time making database calls and that ultimately, the bottleneck would be the hard disk IO and certain mutexes.

I recommended offloading part of the work to the database. This involved loading 200 million items to a temporary table in the database that took 7.5 minutes, using a single thread. The rest of the work still needed to go through the API and took 8 hours to complete. Had the whole process been applied against the database in an efficient manner, I would assume it would take up to 45 mins.
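
The figures in this story are easier to compare as rates. The short calculation below uses only the numbers quoted above (200 million items, 64 days, 7.5 minutes, 8 hours); the variable names are mine.

```python
ITEMS = 200_000_000
SECONDS_PER_DAY = 24 * 60 * 60

api_only_seconds = 64 * SECONDS_PER_DAY   # original estimate through the API
bulk_load_seconds = 7.5 * 60              # single-threaded temporary-table load
api_remainder_seconds = 8 * 60 * 60       # work that still went through the API

api_only_rate = ITEMS / api_only_seconds
bulk_load_rate = ITEMS / bulk_load_seconds
speedup = api_only_seconds / (bulk_load_seconds + api_remainder_seconds)

print(f"API only:  {api_only_rate:,.0f} items/s")
print(f"Bulk load: {bulk_load_rate:,.0f} rows/s")
print(f"Overall speed-up: {speedup:,.0f}x")
```

This works out to roughly 36 items per second through the API alone, against about 444,000 rows per second for the single-threaded bulk load, and an overall speed-up of about 189 times for the process as it was actually run.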

3.4 Story: Spread Out vs Push Down

A company had a batch process that took around 2 hours and had a detrimental effect on the website during that time. I configured the database to handle such loads better and brought the time down to 30 mins using 6 application servers. I rewrote the batch process to be more 'database friendly' (push down work to the database) and reduced the time down to 3 minutes and 1 application server.
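
As a rough sketch of what 'push down' can mean, the example below computes the same per-customer totals in two ways: once by pulling every row into the application, and once by letting the database aggregate and write in a single statement. The tables, data and function names are hypothetical, and sqlite3 is used only to keep the example self-contained; it is not the actual batch process from the story.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer_id INTEGER, amount INTEGER);
    CREATE TABLE customer_totals (customer_id INTEGER PRIMARY KEY, total INTEGER);
    INSERT INTO orders VALUES (1, 10), (1, 15), (2, 7), (3, 1), (3, 2);
""")


def totals_spread_out(conn):
    """Pull every row into the application and aggregate there."""
    totals = {}
    for customer_id, amount in conn.execute("SELECT customer_id, amount FROM orders"):
        totals[customer_id] = totals.get(customer_id, 0) + amount
    for customer_id, total in totals.items():
        conn.execute(
            "INSERT OR REPLACE INTO customer_totals VALUES (?, ?)",
            (customer_id, total),
        )
    conn.commit()


def totals_pushed_down(conn):
    """Let the database aggregate and write in a single statement."""
    conn.execute("""
        INSERT OR REPLACE INTO customer_totals (customer_id, total)
        SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id
    """)
    conn.commit()


totals_pushed_down(conn)
result = conn.execute(
    "SELECT customer_id, total FROM customer_totals ORDER BY customer_id"
).fetchall()
print(result)  # prints [(1, 25), (2, 7), (3, 3)]
```

Both functions produce the same table, but the pushed-down version moves no rows across the connection, which is why it needs fewer application servers.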

4. Competency, Simplicity and Transparency - Pattern

So far, we have had a feudalistic hierarchy with warring fiefdoms fighting over company resources. We then gave all the resources to one entity - a monarchy - but we lost expertise and reduced innovation in certain areas.

How can we leverage more advanced governing systems like democracy and capitalism?

How can we move to an organisational environment where more individualism is valued and where people are able to thrive and do better work?

4.1 Competency

Skill is the unified force of experience, intellect and passion in their operation.

- John Ruskin

One element of Capitalism is about accepting Pareto’s principle about how expertise is distributed in a population within one type of hierarchy. Instead of going against it (socialism), Capitalism is designed to create new hierarchies, more areas of expertise, so that more people can be at the top of different hierarchies.

This lends towards the idea of craftsmanship as well.

What could happen in the future is that IT companies structure their teams based on competency-based hierarchies. Meaning, areas of specific expertise and philosophies which are exclusive to one particular domain, thus maximising results for the whole IT company.

Another benefit from expertise and craftsmanship can be found in economics. Economies of Scope is a term from the world of business. You have probably heard of Economies of Scale, where you have a few products and you try to have bigger factories and bigger machines to pump out the same product in large quantities which would mean cheaper costs.

For example, you can have a factory that makes 3 types of sandwiches. You purchase bigger machines and improve your processes as much as possible to make those 3 sandwiches as fast as possible and remove all possible waste.

Economies of Scope, on the other hand, is a system where you try to produce different and varied products at a cheaper price. For example, take Subway. You can go into one and get a high variety of sandwiches at a slightly higher price than if you bought a prepackaged sandwich in a shop.

The idea with Economies of Scope is to break down the process of creating new products into sub-processes that have a very defined scope and then set up communication systems to co-ordinate between those defined processes as well as have some synergy between them.

4.2 Simplicity

A complex system is difficult to work with. It is also difficult to work in a mess. Now complexity doesn't exactly equal a mess, but both of them are not an ordered and organised system. So (complexity or mess) is Chaos and not Order, in this context.

For us to get to order, we need to simplify the system by organising the mess with rules. Too many rules lead to complexity, so once there, we need to either remove unneeded rules or find patterns or philosophies in the rules and use those to simplify the system.

Art is not a study of positive reality, it is the seeking for ideal truth.

- John Ruskin

Once your system is simple - not a mess, not complicated and not complex - it has a 'clean' and 'this just looks right' feeling to it. This might be called the aesthetics of simplicity.

Similar to 'clean code' and 'clean architecture' this philosophy of aesthetics has an innate feeling in it that something is beautiful and right.

The Tea Room

I would like to include diagrams in this aesthetic. Systems diagrams, network diagrams, database diagrams, business logic/rules diagrams - these need to be included in the art of 'clean and simple'.

When those objectives are reached, the systems, network, databases and business logic/rules may also be clean and simple - to understand, use, operate and make changes to. Please give it a try and see if it instinctively makes sense to you.

4.3 Transparency

To see clearly is poetry, prophecy and religion all in one.

- John Ruskin

Transparency is ultimately the best way to prevent fiefdoms from occurring. Fiefdoms usually silo information and present it to other parts of the company in ways that benefit themselves.

For example, let's say an unethical manager would like a talented individual to stay in their division. That manager can simply not promote that individual and even give negative reviews to keep them where they are.

If, however, HR had access to objective metrics about all the employees, they could see that the person produced good work and has been in their position for some years. They could then promote that person before they move to another company.

Some metrics that can help with Transparency:

Time until first 100 lines of code (gitprime.com)
Complexity rating of class (PMD)
95th percentile API response time
Average time for SEV2 tickets resolution
Orders per week
Website feature usage (clicks) per week
Usefulness of App feature - survey
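As a rough illustration of how one of these metrics could be produced, here is a minimal sketch that computes a percentile response time from raw request timings using the nearest-rank method. The function name and the sample timings are made up for this example; a monitoring tool would normally do this for you.

```python
import math

def percentile_nearest_rank(samples, pct):
    """Return the sample at the given percentile using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank into the sorted list
    return ordered[rank - 1]

# Hypothetical API response times in milliseconds.
response_times_ms = [120, 95, 101, 87, 2300, 110, 99, 105, 93, 98]
p95 = percentile_nearest_rank(response_times_ms, 95)
p50 = percentile_nearest_rank(response_times_ms, 50)
```

The point of looking at a high percentile rather than the average is that a single very slow request (2300 ms here) is visible instead of being smoothed away.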

5. New Roles

This framework has a definition for an old role, Managers, and for a new role that I felt should be included, which I call the Technical Business Analyst. Both are very important for the framework, so I will explain them now.

5.1 Technical Business Analyst

Business Analysts seem to be something that only large companies have and there has been some huge innovation in documenting and expressing business knowledge in the last 5 years. We all need to start using this skill set to explain and diagram requirements and business knowledge, no matter the company size.

Business Process Model and Notation (BPMN) 2.0 and Decision Model and Notation (DMN) could well be the next innovation in bridging the dialog between business and IT.

5.1.1 Story: Requirements Diagram

I was trying out decision tables as a way to document requirements. I talked with the Product Manager and asked her to give it a try. She took a ticket that a developer had quoted as taking 5-8 days to implement. She went over the requirements and built a decision table in Excel. She then showed it to the original developer, who said: "If this is all that is required, then it should take 1-2 days to implement".

5.1.2 Story: Pyramid of Doom

I was working on a way to document technical processes. I went over some code and found an if-then-else "pyramid of doom" in it. I then tried to put the conditions from the code into a decision table. After I was finished, I showed it to the original developer and he instantly understood it and made a correction to the table. I then told the business analysts in the company, who were extremely impressed that the developer understood it so quickly. Apparently, they had had difficulties communicating business requirements to him before.
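To give a feel for what that exercise looks like, here is a minimal sketch with made-up discount rules (they are not the rules from the story). The first function is the nested if-then-else style; the second expresses the same logic as a table of rows that both a developer and a business user can read.

```python
# Hypothetical discount rules; the statuses, thresholds and rates are illustrative only.

def discount_nested(customer_status, order_total):
    """The 'pyramid of doom' style: nested conditionals."""
    if customer_status == "Active":
        if order_total >= 1000:
            return 0.10
        else:
            if order_total >= 500:
                return 0.05
            else:
                return 0.0
    else:
        if customer_status == "VIP":
            if order_total >= 500:
                return 0.15
            else:
                return 0.10
        else:
            return 0.0

# The same logic as a decision table: one row per rule, first matching row wins.
# Columns: customer status, minimum order total, discount.
DISCOUNT_TABLE = [
    ("Active", 1000, 0.10),
    ("Active", 500, 0.05),
    ("VIP", 500, 0.15),
    ("VIP", 0, 0.10),
]

def discount_from_table(customer_status, order_total):
    """Look up the discount by scanning the table from top to bottom."""
    for status, min_total, discount in DISCOUNT_TABLE:
        if customer_status == status and order_total >= min_total:
            return discount
    return 0.0  # default when no rule matches
```

The table version is shorter, and a correction (like the one the developer made) becomes a change to one row rather than a change to nested branches.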

In the old way BPMN 1.0, mapping a process would look something like this:

I am sure everyone has run into something like this glued to a wall in an office. It's not very clear what is going on.

What happens in BPMN 2.0 and DMN is as follows:

Decision Table - Discount Decision

And then, the process mapping is simplified:

BPMN 2.0 - Notice the small square/hash icon in the discount decision

The magic happens in three different ways:

The business logic is captured in an easy-to-understand way for the business user (notice, it's in Excel).
That same decision table is understood by the developer.
The process mapping is now easy to understand, which makes it easier to understand more parts of the system.

We've gone over the business side, but we can go a bit further and apply this same process mapping to the technical side:

DMN for a Technical Process

So when you go into the 'Process Order' task from the diagram above, you would go to the technical process diagram shown below:

DMN for a Technical Process

Technical Business Analysts should be the ones to go over both sides and create both of these types of diagrams and tables. This should achieve a couple of things:

Provide a counter-balance and due diligence to new business requirements: "I understand you would like this new feature. Could you please explain to me in detail what it is that you need?"
Reduce the time groups of developers spend next to whiteboards.
Reduce risk by using decision tables to notice scenarios that were not considered: "We have Active for CustomerStatus, but I don't see a scenario where the OrderStatus is suspended."
Reduce the meetings between developers and business users.
Reduce the scope that developers need to work on and increase focus on a specific task.
Create a system of business and technical documentation.
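The point about noticing scenarios that were not considered can be sketched in a few lines. This is a minimal example, assuming a hypothetical decision table with two condition columns; the values and actions are invented to mirror the CustomerStatus/OrderStatus example above.

```python
from itertools import product

# Hypothetical condition columns and the values each one can take.
CONDITIONS = {
    "CustomerStatus": ["Active", "Suspended"],
    "OrderStatus": ["Open", "Suspended"],
}

# Hypothetical decision table: each rule maps a combination of inputs to an action.
RULES = {
    ("Active", "Open"): "process order",
    ("Suspended", "Open"): "hold order",
    ("Suspended", "Suspended"): "reject order",
}

def missing_scenarios(conditions, rules):
    """Return every combination of condition values that no rule covers."""
    all_combinations = product(*conditions.values())
    return [combo for combo in all_combinations if combo not in rules]

gaps = missing_scenarios(CONDITIONS, RULES)
```

Here `gaps` contains the Active customer with a Suspended order, which is exactly the kind of question a TBA would take back to the business before any code is written.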

TBAs should spend time going over the backlog of tickets. This should increase the velocity of the team if the tickets are very well defined.

When a new ticket is taken on by the team, a developer and a QA engineer should pick up the same ticket: the QA engineer should start writing functional tests based on the scenarios in the decision table, and the developer should write the code and test it against those tests.

This role should cover the following points from 'Boehm's Top 10 Software Defect Reduction list':

Finding and fixing a software problem after delivery is often 100 times more expensive than finding and fixing it during the requirements and design phase.
Current software projects spend about 40 to 50 percent of their effort on avoidable rework.
About 80 percent of avoidable rework comes from 20 percent of the defects.

In addition, this role should also prevent or at least greatly reduce cancelled projects or priority changes. I understand that these are extremely demoralising for developers.

Let us finish up by going over the framework values with this role:

Competency: This is a new role for most small-to-medium companies. It should streamline the development process by adding an expert into the right area and reducing the scope of work for other people in the company.
Simplification: Having easy to understand diagrams and documentation simplifies development work. TBAs should also identify parts of the system that could be simplified (value stream mapping) and suggest very specific and narrow work for technical debt.
Transparency: TBAs should make the whole system easy to understand for both IT and business users outside of it.

5.2 Managers

I would like to start off by saying that managers do not equal team leaders. In developer-centric companies, there are very few managers and there are mainly team leaders: developers that have been promoted to lead other developers.

Dilbert.com

It is no secret that people do not like managers that have no idea about their technical role. In addition, there was a study that determined that 65% of managers actually produced negative value for the company. On the other side, good managers produce huge value (Pareto Principle) for the company and that should not be something we write off.

Currently, with the lack of managers in IT companies, there is a reliance on hiring someone who 'is the right fit', which basically outsources the need to manage to the individual. If they don't work well, then the assumption is that there is something wrong with them.

In the context of a Capitalistic/Democracy, what role would managers play?
Well, in a Democracy, there is a need for Law-makers to make systems that let people interact in a way that is helpful to society. There is also a need for Courts for dispute resolution.

Managers should think of systems inside the company that promote honesty, tolerance and freedom of speech. Managers should also resolve disputes in the company and look for workplace complications before they escalate into full-blown warring tribes. Bear in mind that this framework encourages experts and experts usually have opinions.

Following the values of the framework, let's go over what a manager should do:

Competency: The manager should be competent enough at coming up with social systems that are effective for that specific company culture. The idea is that the cogs turn smoothly.
Simplification: The manager should set out rules in those systems, but set out very few rules and then enforce them. With regards to communication, less is more. The manager should make sure that a group can handle things within its own expertise and scope and try to reduce communication dependencies.
Transparency: The manager should implement metrics gathering, both to know how the IT company is performing and to be transparent to stakeholders outside IT and build trust with them.

6. Applying the Pattern

Let's take three measures of the output of a system to see how these philosophies could work: Speed, Control and Quality.

6.1 Speed

Competency: If we have experts, then we can make the best choices to build the products instead of trying out many choices until we reach the right one.
Simplification: If we simplify the system as much as we can, we can both integrate new systems faster as well as produce easy to use systems. In a lot of ways, simplifying equals business agility as it helps you change the business faster to meet the needs of the marketplace.
Transparency: If we have metrics that show us where the bottlenecks are in the system, we can make those systems as fast as possible.

6.2 Control

Competency: If we have a high degree of competency for a defined scope and area, then we have a high degree of control over the system.
Simplification: If the system is simplified, it is easy to use it.
Transparency: If the movement of work is transparent, we can monitor the time it takes to exchange communication and complete work in the system. Another way of looking at it is that we can spot when one cog is moving slower and is slowing the system down. Ultimately, this is where a manager would need to step in.

6.3 Quality

Competency: If we have craftsmen, the cogs they produce are of high quality.
Simplification: If the products we deliver have been simplified, it provides an easy to use product for the customer (perceived quality).
Transparency: If we have metrics to see how popular the new product is and how it is used, we can improve the quality of that product. Ultimately, this will need direction from 'the business' and would require interaction with Technical Business Analysts (TBAs in the diagram).

Quality is never an accident. It is always the result of intelligent effort.

- John Ruskin

7. F.A.Qs

Is this system a replacement for Agile?

No, it's completely complementary to it and would probably better serve the principle of having 'multi disciplinary teams'.

How do you prioritise or expedite work in this system?

That would be up to the manager. Technically, if you would like the option of expediting, you would need to leave some spare capacity in the teams.

What if there is not enough skill in house?

If you don't have the skills you need in the company, then consider bringing in an outside consultant - even if it's for a few days. You will not gain new innovations, but you will gain from other companies' experience.

What would happen if there isn't enough work to justify a new field?

It could be very possible to let one person in the company have a dual-role and still have time to try and innovate in this new field.

How can I split up an area of expertise without it leading to a huge overhead of communication?

That would really depend on you and your needs. You need to find a balance of 'less is more' with regards to communication, but also have enough work concentrated in front of an expert for them to recognise patterns and generate innovation.

My MySQL Linux Tuning Checklist

2018-02-05T12:33:00.000+00:00

Things I look for when optimising or debugging a Linux OS:

I/O scheduler (noop or deadline)
Linux Kernel > 3.18 (multi queuing)
IRQbalance > 1.0.8
File System: noatime, nobarrier

ext4: data=ordered
xfs: 64k
logfiles in different partition (if possible)

Swappiness (0 or 1, depending)
Jemalloc (if needed)
Transparent hugepages - disabled
Ulimit (open files) >1000
Security

IPtables
PAM security

Raid Controller/Smart HBA

write-cache enabled
battery backed
For HP servers: hpssacli controller all show (detail|status)
Tweak cache-ratio to 50/50 or 75/25 (and test)
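A few of the items above can be read straight from /proc and /sys. Below is a minimal sketch of how I might script those checks; the device name `sda` is an assumption, the helper names are my own, and on some kernels the scheduler file may not use the bracketed format at all, in which case the sketch simply reports None.

```python
from pathlib import Path

def active_option(sysfs_text):
    """Return the bracketed (active) option from text like 'always madvise [never]'."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None

def read_text(path):
    """Read a small /proc or /sys file, returning None if it is missing or unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None

def report(device="sda"):
    """Collect a few checklist values; 'sda' is an assumed block device name."""
    scheduler = read_text(f"/sys/block/{device}/queue/scheduler")
    thp = read_text("/sys/kernel/mm/transparent_hugepage/enabled")
    swappiness = read_text("/proc/sys/vm/swappiness")
    return {
        "io_scheduler": active_option(scheduler) if scheduler else None,
        "transparent_hugepages": active_option(thp) if thp else None,
        "swappiness": int(swappiness) if swappiness else None,
    }

if __name__ == "__main__":
    for name, value in report().items():
        print(f"{name}: {value}")
```

This only reads values; it does not change anything on the server.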

A DBA Analyses 'The Phoenix Project'

2018-01-04T15:05:00.000+00:00

Last year, I read 'The Phoenix Project'. I liked it and as an IT manager in the past, I did experience high blood pressure during the SEV1 scenarios in the book.

I also liked the way DevOps methodology helped solve issues with IT as well as help the company succeed overall.

As a DBA, however, I did have some things that didn't make sense to me about this story.

Bear in mind that the two major incidents in the book were database related. So in this post, I would like to jot down some things I have noticed and how they could have been solved by looking at them through a different lens.

Caution, Spoiler Alert

Incident No.1 - Tokenisation

In the first incident, a 3rd party supplier ran a script against the database to tokenise some personal data. This was related to an issue that information security highlighted, but had the result of affecting HR and accounting.

In the book, there is a complaint that there was no test environment to see if this script would have any negative effects on other parts of the organisation.

Now to me, this does make sense and at the same time, makes no sense at all.

If you meant, that back in the day, it was hard to get full environments setup to test changes on your application servers, then you would probably be right. Today, perhaps based on the methodology that this book introduces, you probably do have those environments setup: either virtualised or in a container.

Testing Database

What doesn't make sense to me is not having a test database. Reading through the book, there are mentions of an Oracle database and some MS SQL databases. As a mainly MySQL DBA, I have not always worked on those databases, but I have worked next to people who have. My observation is, if you were to have an Oracle database, you would almost certainly have other dev/test/UAT/staging/pre-prod database servers as well.

Why do I think this? If you can afford to pay for an Oracle database, you would probably get more testing databases under the same license. License being the most expensive part when using Oracle.

So a testing database to test things that may affect the precious and expensive database server is almost a certainty.

DBA as a Gatekeeper

Now it seems shocking to me that the DBA had not been involved in the process to validate this 3rd party script. Old school Oracle DBAs are involved in everything that happens on their servers.

Need a user on the database? Go to the DBA.

Need a database server for a new app? Please fill these in, in triplicate, detailing what the projected usage would be for the next 5 years.

In most companies, Oracle DBAs may even set up integration between other products like Oracle HR and finance.

So how you could have run something that significant against the database without their knowledge is beyond me.

Assuming that a database field had in fact been messed up, then Oracle DBAs have a TON of really enviable backup and restore features.

They can query a table to view all the backups that are available to restore from and choose the point-in-time that is closest to what they need. A DBA could simply restore the database, fetch the specific table that had its column changed and apply it to the production database.

It's more than one table? Restore the database, go over the changes in the logs up to a point-in-time and skip the parts the conversion script applied.

It seems to me that the authors wrote the book based on their own experiences, but those experiences occurred in companies that had no DBAs. Not having a DBA is a product of start ups, not old school 1500-person car-parts manufacturers.

Incident No.2 - Conversion

There was a crippling database issue to do with a database conversion that was needed alongside a new code rollout. The issue caused a 2 day - break out the hand held receipt machine - downtime to the system.

Works on My Laptop

During the initial investigation, a developer said something along the lines of 'it worked fine on my laptop' when describing the performance of the database conversion scripts. The problem was that on production, it was x1000 slower. I have written before about how to not be the one that deploys that slow query to production, and this really illustrates that situation. Apparently, they still didn't have a database testing environment to test it against.

However, on the topic above of 'DBA as a gatekeeper':

Why didn't the DBA review the conversion scripts, and why weren't they involved in the code review process for SQL statements?

It could be that there wasn't one in the company.

Another point was that they couldn't cancel the conversion after they started and noticed how slow it was. If this was within a transaction or a single ALTER TABLE statement, why not?

If too many things have changed, could they not restore the database to a point-in-time before the changes were made?

Was the conversion x1000 slower instead of maybe x10 slower, because of a foreign key check that could have been turned off?

A DBA would have given you those options.

Project Unicorn

After the hero turns things around and things begin to pick up, they decide to start a separate project to add predictive features to the main project. In it, they decided to bypass seeking permission for database changes and create a new database, into which they copied production data from several locations. I very much like this approach and it falls in line with the reactive micro services pattern.

This would make this book ahead of its time. Instead of managing one main database for the website (although they did mention in the book that they had a couple of dozen database servers), they can break it up into several database servers, based on functionality. What is required is to use tools - and I believe that in 2012 they meant ETL tools - to migrate the needed data into these new database servers.

This would still need a DBA though or at the very least, a data engineer with an ops background, as you now need to:

Data model new environments based on data from old ones
Create and maintain data pipelines
Monitor for errors and fix data that didn't make it
Fix data drift and re-sync data across servers

In addition, you now need to back up these additional database servers and monitor their availability and performance.

So while it adds complexity to the backend and you are now moving from simple database maintenance to a more data architecture role, it is the way forward. Certainly the only way to have proper micro services with their own single-purpose and loosely coupled data stores.

it might have been better if they just hired a DBA to solve thier DB issues.
— GuybrushThreepwoodⓋ (@jonathan_ukc) January 6, 2017

;)
— Kevin Behr (@kevinbehr) January 8, 2017

Top 4 Reasons Companies Won't Fix Their Database Issues

2018-01-03T10:00:00.000+00:00

When I consult at a company, I aim to identify issues with their database and give options on how to solve them.
However, sometimes implementing those solutions may be a more lengthy process than it needs to be and sometimes they may not be implemented at all. During my career, I have observed some reasons as to why that might happen within organizations.

Obviously, the following observations will never happen at your company. I am just writing about them so that you might notice them in other places.

1. Legacy code

People don't like to have anything to do with legacy code. It’s painful. It’s difficult. It’s risky to change. It runs business critical functions. Worst of all, they didn’t write it. This can be a problem as often, the most crippling database issues require changes to legacy code.

2. New Technologies or Methods

People don’t like you to introduce any new technologies they don’t want to learn and maintain. Not even different methods in technologies already being used. No fancy upgrades to the DB server, no new load balancers and certainly don’t start using SQL statements in the code over their existing ORM.

3. Old Technologies or Methods

In complete contrast, people in tech organisations don’t like you to introduce boring technologies. What would be the point of introducing boring (yet tested) technologies when they could be playing around with shiny new ones? There is a caveat to this - groups prefer it when other groups they depend on (let’s say developers depend on ops) choose to use boring and tested technologies. Just not for themselves. And vice versa.

4. Management Involvement

Last, but certainly not least, no one from upper management will get involved in resolving these issues and push forward solutions. No project/product manager/agile-coach will be assigned to chase up issues. As far as they are concerned, this is an engineering issue and as engineers, you need to sort it out yourselves. Only 'change requests' from the business have managers around them.

Final Thoughts

After some years of analysing database systems for performance issues, I am finally realising that I should also analyse human systems for performance issues.

Setting Up Databases in your Development Environment

2017-12-27T11:32:00.001+00:00

Setting up databases in development environments can be challenging.

What I usually see is some automated process for setting up empty databases with up-to-date data structures. This is helpful for integration testing, but is tricky for actual development as well as performance testing.

For example:

It is difficult to conceptually get your head around writing a query when you cannot see any data in your tables
You cannot possibly know if your query is slow before deploying it to production without running it against 'some' data.

Relevant Post: How to Not be the One that Deploys that Slow Query to Production

In addition, there can be a strict requirement to not let sensitive customer data be available outside certain secure environments and certainly not available to development environments.

Step 1

What you would need to do is go over your database and separate the elements into different criteria:

Data Structure
User Management
Referential Tables
Primary Tables
Child Tables
Mapping Tables
Sensitive Data

(explanation below)

Data structure management and user management should be, by now, a solved problem. You have systems like Liquibase and Flyway that manage this for you. Essentially, you can use these systems to automatically generate containers which your developers can then use, or to set up empty local databases on developer machines using SQL scripts.

For user management, MySQL has a PAM plugin to manage users via LDAP, but you can manage this through scripts as well.

Referential tables (tables that contain data such as id = 1, status = 'DONE') should also be small enough to be included in this stage as well. You need to identify which tables contain this very basic data and add it to the data structure repository or SQL file.

Step 2

This is where things get a little bit tricky: You need to identify which tables are your Primary 'feed data' tables or Object tables. You then need to identify which tables are the Child tables of those Primary tables. Lastly, you need to identify which tables Map keys across different tables - either Primary to Child (as in many-to-many relationships) or Primary to Primary.

Once you have identified these tables, you can discern how much data you would like to keep in your development databases. My recommendation would be to go in these three directions:

Specify a set number of keys in the Primary tables (a fixed count of rows) and then get the data from the Child and Mapping tables based on those keys.
Specify a specific set of keys from the Primary tables (a hand-picked list) and then get the data from the Child and Mapping tables based on those keys.
Keep data by a date range for the primary table and then use its keys to populate the Child and Mapping tables.
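As a minimal sketch of the first direction, the example below picks a set number of Primary keys and then pulls only the Child rows that reference them. It uses an in-memory SQLite database purely so the example is self-contained; the `customers` and `orders` tables and their columns are hypothetical stand-ins for your own Primary and Child tables.

```python
import sqlite3

# sqlite3 is used only to keep the sketch self-contained; table names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")  # Primary table
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)  # Child table
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(i, f"customer{i}") for i in range(1, 11)],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, (i % 10) + 1, 10.0 * i) for i in range(1, 31)],
)

def sample_subset(conn, number_of_keys):
    """Take the newest N Primary keys, then only the Child rows that reference them."""
    keys = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM customers ORDER BY id DESC LIMIT ?", (number_of_keys,)
        )
    ]
    placeholders = ",".join("?" for _ in keys)
    children = conn.execute(
        f"SELECT id, customer_id, total FROM orders "
        f"WHERE customer_id IN ({placeholders}) ORDER BY id",
        keys,
    ).fetchall()
    return keys, children

keys, children = sample_subset(conn, 3)
```

The other two directions follow the same shape: only the query that selects the Primary keys changes (a fixed list of ids, or a date range), while the Child and Mapping queries stay keyed on the result.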

Make sure that the amount of data is adequate for your needs: not too small and not too large.

Step 3

This separation of table types can now help us with identifying sensitive data. Data structure and Referential tables should not contain sensitive data. Neither should Mapping tables. What would have sensitive data are Primary and Child tables.

Identify the columns where sensitive data may be kept in those tables and either:

Empty that data
Give it a default value (all emails will be test@email.com)
Obfuscate those values in some way

You can change the data either by outputting it with those changes into an SQL file, or by dumping that data into a staging database, changing the sensitive data there and then dumping it into an SQL file with a tool.
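The three options above can be sketched in a few lines. This is only an illustration: the column names, the salt and the helper names are assumptions of mine, and hashing with a fixed salt is a simple form of obfuscation rather than strong anonymisation, so it should still go through the review step described below.

```python
import hashlib

def obfuscate_email(email, salt="dev-only-salt"):
    """Replace a real email with a stable placeholder derived from a salted hash."""
    digest = hashlib.sha256((salt + email.strip().lower()).encode("utf-8")).hexdigest()
    return f"user_{digest[:12]}@example.com"

def scrub_row(row, sensitive_columns):
    """Return a copy of the row with sensitive columns emptied or obfuscated."""
    cleaned = dict(row)
    for column in sensitive_columns:
        if column == "email" and cleaned.get(column):
            cleaned[column] = obfuscate_email(cleaned[column])
        elif column in cleaned:
            cleaned[column] = None  # simplest option: empty the value
    return cleaned

# Hypothetical row from a Primary table.
row = {"id": 7, "email": "Jane.Doe@Company.com", "phone": "555-0100", "status": "DONE"}
clean = scrub_row(row, sensitive_columns=["email", "phone"])
```

Keeping the obfuscated email stable for the same input means joins and uniqueness checks in development still behave roughly like production, which a single default value for everyone would not give you.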

Ideally, this stage needs to go through a QA process/person before the company releases sensitive data to generally available containers or repositories which keep history of changes.

Conclusion

By taking the time to separate the different elements in a database, you can make it less complicated and you would then be more able to automate parts of the database into your CI/CD process.

Data Modelling: Counter Table

2017-12-08T10:18:00.000+00:00

A counter table is a table that keeps counts of particular items or for certain keys. This can range from page views on your blog to keeping track of a limit on how much of a particular item or service a user is allowed.

Usually, a counter table would be better kept in something like Memcached or Redis as frequent increment updates would be better suited to those in-memory systems.

MySQL, and InnoDB in particular, has many stringent systems to make sure that your data has been reliably written to disk. Just going through those systems alone can make a counter table unsuitable, even before considering the time it takes to update the actual table.

However, sometimes there is a need for certain assurances in failure scenarios for which in-memory systems may not be suitable: when they crash, the data kept in memory is cleared out.

In those cases, may I recommend that you use what I call an 'aggregate counter table'. The idea here is to replace lots of increment updates with a periodic count of the original base table you are interested in having counts for.

In short, instead of:

INSERT INTO base_table;
UPDATE counter_table SET counts = counts + 1 WHERE key_id = key1;

You would do

INSERT INTO base_table;
On interval (like 1 to 5 seconds):
- INSERT INTO counter_table (key_id, counts, lastprimary_id)
- SELECT key1, COUNT(*), MAX(primarykey) FROM base_table
- WHERE primarykey > last_count_position
- GROUP BY key1
- ON DUPLICATE KEY UPDATE counts = counts + VALUES(counts), lastprimary_id = VALUES(lastprimary_id);

In order to aggregate the base_table correctly, you need to keep some sort of record of the last time or position you read from the base table. What I recommend you consider is either the primary key, assuming it's an integer, or a last_updated timestamp column.

Below is an example of a counter table that keeps the last id of the primary key it counted from the base table:

CREATE TABLE counter_table (
  key_id int(10) unsigned NOT NULL,
  counts int(10) unsigned DEFAULT '0',
  lastprimary_id int(10) unsigned DEFAULT '0',
  PRIMARY KEY (key_id),
  KEY idx_camp (lastprimary_id)
) ENGINE=InnoDB;

In order to run your 'refresh' query, you would first need to query the counter_table like this:

SELECT max(lastprimary_id) from counter_table;

Then populate the counter table by including in your above INSERT INTO SELECT statement a:

WHERE base_table.primarykey > lastprimary_id

This should be very fast and will prevent the many 'database-attacking update queries' that can become a serious bottleneck to your performance in the long run.
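To show the whole refresh cycle end to end, here is a minimal runnable sketch. It uses an in-memory SQLite database as a stand-in for MySQL so that it is self-contained, which means the upsert is written with SQLite's `ON CONFLICT ... DO UPDATE` (available from SQLite 3.24) where MySQL would use `ON DUPLICATE KEY UPDATE`. The `base_table` layout and the helper names are assumptions for the example.

```python
import sqlite3

# SQLite stands in for MySQL here; counter_table mirrors the example above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE base_table (id INTEGER PRIMARY KEY AUTOINCREMENT, key_id INTEGER NOT NULL)"
)
conn.execute(
    "CREATE TABLE counter_table ("
    " key_id INTEGER PRIMARY KEY,"
    " counts INTEGER DEFAULT 0,"
    " lastprimary_id INTEGER DEFAULT 0)"
)

def add_rows(conn, key_ids):
    """Simulate the application inserting rows into the base table."""
    conn.executemany("INSERT INTO base_table (key_id) VALUES (?)", [(k,) for k in key_ids])

def refresh_counters(conn):
    """Count only the rows added since the last refresh and fold them into counter_table."""
    last_id = conn.execute(
        "SELECT COALESCE(MAX(lastprimary_id), 0) FROM counter_table"
    ).fetchone()[0]
    conn.execute(
        """
        INSERT INTO counter_table (key_id, counts, lastprimary_id)
        SELECT key_id, COUNT(*), MAX(id) FROM base_table
        WHERE id > ?
        GROUP BY key_id
        ON CONFLICT(key_id) DO UPDATE SET
            counts = counts + excluded.counts,
            lastprimary_id = excluded.lastprimary_id
        """,
        (last_id,),
    )
    conn.commit()

add_rows(conn, [1, 1, 2])
refresh_counters(conn)
add_rows(conn, [1, 2, 2])
refresh_counters(conn)
totals = dict(conn.execute("SELECT key_id, counts FROM counter_table ORDER BY key_id"))
```

Each refresh touches the counter table once per key that has new rows, rather than once per inserted row, which is where the saving comes from.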

Downsides

This method doesn't account for rows in the base table being UPDATE'd or DELETE'd. It just counts the number of rows. If this is a requirement, you can fall back on UPDATE statements such as:

UPDATE counter_table SET counts = counts - 1 WHERE key_id = key1;

with the understanding that this will happen infrequently.

You also now need to maintain a procedure and monitor that it is running at the intervals you need. Fortunately, MySQL has scheduled Events to help with that.

Archiving for a Leaner Database

2017-11-30T22:15:00.003+00:00

There is an idea that data is sacred and needs to be stored forever. However, if you keep your data forever, you will, sooner or later, have a very large database.

In order to keep operations running smoothly, it would be useful to allocate data that is used more frequently to certain tables and keep data that is used less frequently in archive tables.

Some examples

You have a large table that stores transactions and its size is 200GB. It is that way because your company has been around for 5 years, but in the last year, your company has been successful at acquiring new users and your data has doubled.

Congratulations.

In your database, you now have a table that has 5 years' worth of data, but your application usually only needs about the last 1-3 months. There may be a use case where someone might require data about a customer for a period starting a year ago and there may also be a reporting request to aggregate data for the last 3 years. Therefore, to play it safe, we need everything in one table.

However, this greatly affects performance. It would be more helpful to try and separate those 3 concerns into 3 different tables:

A table for a 3 month period for frequently used data
An archive table that keeps all old and infrequently used data
A summary table for reporting

With these, we are complying with the principle of Single-Responsibility and greatly improve performance for each purpose.

Having a 'main' table with only the last 3 months' worth of data makes it much easier to scale.
For example, even if your data doubles every year for the next 3-5 years, you still only have to manage a subset of that data. So if those 3 months once took 20GB to store, the following year they would take 40GB and the year after that 80GB. These sizes are still very manageable by today's standards.
In addition, hardware and software improve over time, so there can be a legitimate expectation that simply by upgrading and updating, you can keep humming along.

Taking the effort to identify 'hot' and 'cold' data and allocating it to the right tables can mean that your scalability concerns will be addressed for the long term.

How to implement Archiving?

Table _archive

One way to implement archiving, is by having a table that ends with _archive.

To enable this, you will need to be able to redirect your queries (mainly from your code, or via a proxy that can do that) to the main or the archive table, based on particular criteria.

For example, if the date is less than today's date minus 30 days, then send it to the archive table, if not, then the main table.

Another example may be, if the status column equals 'inactive' send to the archive table.

You would largely need to dig through your code for every query that uses that table and wrap it with an IF statement to send it to the right table.
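A minimal sketch of such an IF statement in application code could look like this. The table names, the function name and the 30-day cutoff are assumptions for illustration, not part of any particular codebase.

```python
from datetime import date, timedelta

MAIN_TABLE = "transactions"              # hypothetical table names
ARCHIVE_TABLE = "transactions_archive"


def choose_table(row_date, today=None, cutoff_days=30, status=None):
    """Return the table a query for this row should be sent to."""
    today = today or date.today()
    if status == "inactive":
        return ARCHIVE_TABLE
    if row_date < today - timedelta(days=cutoff_days):
        return ARCHIVE_TABLE
    return MAIN_TABLE


today = date(2021, 6, 15)
print(choose_table(date(2021, 6, 1), today))                     # transactions
print(choose_table(date(2021, 1, 1), today))                     # transactions_archive
print(choose_table(date(2021, 6, 1), today, status="inactive"))  # transactions_archive
```

Keeping the rule in one function means the cutoff only has to change in one place, rather than in every query that touches the table.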

You would also need a data process that migrates data from the main table over to the archive table when it gets old or becomes cold.
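Such a data process could be sketched as follows. This is an assumption-laden illustration: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL so that it runs anywhere, and the table and column names are hypothetical.

```python
import sqlite3


def archive_old_rows(conn, cutoff):
    """Move rows created before `cutoff` from the main table to the archive.

    Both statements run in one transaction, so a row is never lost or
    duplicated if the process fails half way.
    """
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO transactions_archive "
            "SELECT * FROM transactions WHERE dateCreated < ?",
            (cutoff,),
        )
        moved = conn.execute(
            "DELETE FROM transactions WHERE dateCreated < ?", (cutoff,)
        ).rowcount
    return moved


conn = sqlite3.connect(":memory:")
columns = "(id INTEGER PRIMARY KEY, dateCreated TEXT, sometext TEXT)"
conn.execute("CREATE TABLE transactions " + columns)
conn.execute("CREATE TABLE transactions_archive " + columns)
conn.executemany(
    "INSERT INTO transactions (dateCreated, sometext) VALUES (?, ?)",
    [("2017-01-10", "old"), ("2017-02-20", "old"), ("2017-10-30", "recent")],
)
conn.commit()

print(archive_old_rows(conn, "2017-09-30"))  # 2
```

On a large production table you would probably want to move the rows in smaller batches rather than in one big transaction.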

Partitioning by Date

While this is not a different physical data model, it does help split the table into a few smaller tables and achieves the desired purpose without application code changes.

It is very common to partition your table by date, so that old data is allocated to its own partition.

mysql> CREATE TABLE `largetable` (
->   `id` bigint unsigned NOT NULL AUTO_INCREMENT,
->   `dateCreated` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
->   `status` int default 1,
->   `sometext` text,
->   PRIMARY KEY (`id`,`dateCreated`)
-> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.03 sec)

mysql> alter table largetable partition by RANGE(YEAR(dateCreated)) (
-> PARTITION p2016 VALUES LESS THAN (2017), 
-> PARTITION p2017 VALUES LESS THAN (2018), 
-> PARTITION p2018 VALUES LESS THAN (2019), 
-> PARTITION p2019 VALUES LESS THAN (2020), 
-> PARTITION p2020 VALUES LESS THAN (2021), 
-> PARTITION pmax VALUES LESS THAN MAXVALUE);
Query OK, 0 rows affected (0.05 sec)
Records: 0  Duplicates: 0  Warnings: 0

The above example allocates data by the year in which the row was created. Please note that after 2020, this sort of manual partitioning will require manually adding new years to this table. If you do it in advance, this can be done without disrupting operations.
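One way to add new years is to split the catch-all pmax partition with ALTER TABLE ... REORGANIZE PARTITION. The helper below is a hypothetical sketch that only builds the statement as text; you would still review it and run it against MySQL yourself. It assumes the partition naming used in the example above.

```python
def add_year_partitions_sql(table, first_year, last_year, catch_all="pmax"):
    """Build a MySQL statement that splits the catch-all partition into yearly ones.

    The names are interpolated directly, so only pass trusted identifiers.
    """
    parts = [
        f"PARTITION p{year} VALUES LESS THAN ({year + 1})"
        for year in range(first_year, last_year + 1)
    ]
    parts.append(f"PARTITION {catch_all} VALUES LESS THAN MAXVALUE")
    return (
        f"ALTER TABLE {table} REORGANIZE PARTITION {catch_all} INTO (\n  "
        + ",\n  ".join(parts)
        + "\n);"
    )


print(add_year_partitions_sql("largetable", 2021, 2022))
```

Running this while pmax is still empty keeps the operation cheap, which is the reason for doing it in advance.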

Partitioning by Status

You can also partition on a status column (as mentioned above) with active/inactive values. Simply by using UPDATE to change the value, MySQL will move that row over to the right partition. REPLACE or INSERT + DELETE will work as well.


mysql> CREATE TABLE `largetable` (
->   `id` bigint unsigned NOT NULL AUTO_INCREMENT,
->   `dateCreated` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
->   `status` int default 1, -- default active
->   `sometext` text,
->   PRIMARY KEY (`id`,`status`)
-> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.02 sec)

mysql> alter table largetable partition by list(status) (
-> partition pactive values in (1), -- active 
-> partition pinactive values in (2) -- inactive
-> ); 
Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> select * from largetable partition (pactive);
Empty set (0.00 sec)

mysql> select * from largetable partition (pinactive);
Empty set (0.00 sec)

mysql> insert into largetable(sometext) values ('hello');
Query OK, 1 row affected (0.01 sec)

mysql> select * from largetable partition (pinactive);
Empty set (0.00 sec)

mysql> select * from largetable partition (pactive);
+----+---------------------+--------+----------+
| id | dateCreated         | status | sometext |
+----+---------------------+--------+----------+
|  1 | 2017-10-30 10:04:03 |      1 | hello    |
+----+---------------------+--------+----------+
1 row in set (0.00 sec)

mysql> update largetable set status = 2 where id =1 ;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select * from largetable partition (pactive);
Empty set (0.00 sec)

mysql> select * from largetable partition (pinactive);
+----+---------------------+--------+----------+
| id | dateCreated         | status | sometext |
+----+---------------------+--------+----------+
|  1 | 2017-10-30 10:04:03 |      2 | hello    |
+----+---------------------+--------+----------+
1 row in set (0.00 sec)

Partitioning by ID

And lastly, you can partition on the sequence of your auto incrementing id key.


mysql> CREATE TABLE `largetable` (
->   `id` bigint unsigned NOT NULL AUTO_INCREMENT,
->   `dateCreated` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
->   `status` int default 1,
->   `sometext` text,
->   PRIMARY KEY (`id`)
-> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.02 sec)

mysql> alter table largetable partition by RANGE(id) (
-> PARTITION p1 VALUES LESS THAN (500000000), 
-> PARTITION p2 VALUES LESS THAN (1000000000), 
-> PARTITION p3 VALUES LESS THAN (1500000000), 
-> PARTITION p4 VALUES LESS THAN (2000000000), 
-> PARTITION p5 VALUES LESS THAN (2500000000), 
-> PARTITION pmax VALUES LESS THAN MAXVALUE);
Query OK, 0 rows affected (0.06 sec)
Records: 0  Duplicates: 0  Warnings: 0

The above example specifies which partition the row should go to based on the range that the id number falls in. This example is more useful if your system does a lot of primary key lookups. It also helps with distributing the partition sizes more equally when compared to dates, as you can have more data in recent years.

A word of caution

Partitioning on the right key is absolutely crucial and not easy. You need to analyse the queries that the application sends to that specific table and come up with a partitioning key(s) that works well and does not slow down the table - at least not the top 80% of the slowest queries.

The partitioning key needs to be part of the PRIMARY KEY. In order for the optimiser to send you to the right partition, that key should ideally also be included in the WHERE clause of all SELECT/UPDATE/DELETE queries. Otherwise, your query would run sequentially through each partition in that table.

How to Not be the One that Deploys that Slow Query to Production

2017-11-25T15:58:00.001+00:00

Have you ever deployed code to production that slowed the database and the entire site down?
Did you get a feeling of anxiety when you just read that?

Well, it could be to do with your test environment.

Most places nowadays have test suites that will check for integration issues. But very few will check for performance.

The issue lies with how developers are set up to develop code. They have their code on their local computer with a local copy of an empty database that they develop against. That setup will not give you any useful feedback about how your code will perform once it's run against the production database.

How do you get Performance Feedback for your Queries?

Whenever you log into your database, let's say MySQL, and run a query, you get 3 types of feedback:

Your result set
The number of rows
The time it took the query to run

(Postgres, by the way, has \timing.)

In order to get the right time for your query, you need to run it on a database that is similar to your production database in hardware, parameters and, more importantly, database size.

Here is an example: you take a SELECT query that you wrote that has 3 JOINs and you run it on a read-only slave DB server. You choose some decent sample variables from the existing data and you get a result in 0.3 seconds. Now, barring exceptions such as deadlocking, conflicts or server-wide slowdown, there is a very high chance that the query will take 0.3 seconds when it's run against the production database.

Once you have an environment to test against, you can run EXPLAIN on your query and make improvements till you are happy with it.
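The loop of "time it, look at the plan, improve, repeat" can be sketched like this. To keep the example runnable anywhere, it uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN as stand-ins; MySQL's EXPLAIN output looks different, and the table, index and helper names here are made up.

```python
import sqlite3
import time


def timed_query(conn, sql, params=()):
    """Run a query and return (rows, seconds elapsed)."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    return rows, time.perf_counter() - start


def query_plan(conn, sql, params=()):
    """Return the plan steps SQLite reports (the last column is the description)."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, i * 0.5) for i in range(100_000)],
)

sql = "SELECT COUNT(*) FROM orders WHERE customer_id = ?"
plan_before = query_plan(conn, sql, (42,))
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, sql, (42,))
rows, seconds = timed_query(conn, sql, (42,))

print(plan_before)  # a full scan of the table
print(plan_after)   # a search that uses idx_orders_customer
print(rows, f"{seconds:.4f}s")
```

The timings only mean something when the data volume is close to production, which is the whole point of the section above.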

But what do I do if I use an ORM?

Well, if you can output the query that the ORM will use and run that against a database, you will know how long it takes. Hopefully, you will be able to make improvements to your query through the ORM.

Conclusion

Make sure that in your company, you have a database to test against that is similar to production. If that is not available, see if you can get access to a read-only DB (sometimes there is one for backup purposes) and at least test your SELECT queries against it.

You can then deploy to live with relative confidence. At the very least, with a lot less stress.

Top 5 Ways to Overcome Database Skill Shortages

2017-11-14T14:40:00.002+00:00

In every organisation and in particular new ones, there seems to be a lack of experience and knowledge around databases.

Our experience shows that there is a huge shortage in skills around managing databases, database performance engineering, developing scalable backend database interactions and designing physical data modelling for performance.

Organisations will typically spend huge amounts of money and time to circumvent these shortages until they become simply too expensive to ignore.

StackOverFlow Developer Hiring Trends 2017

Do your users complain that your system is slow and your developers seem to deploy software releases less and less frequently?

These symptoms could be a result of your company databases becoming more difficult to manage and more cumbersome to work with, making the company spin its wheels while competitors gain ground.

Here are some suggestions to help you overcome lack of skills in this area:

Adopt database management best practice. Industry best practices are not easy to come by. The “not invented here” approach can lead to ignoring best practices entirely. But databases have been around for decades, so it’s likely that industry best practices exist that can solve most problems, offering your business the best and quickest route from where you are now to where you need to get to.

Instil a “look under the hood” culture. Nowadays, so much is hidden away from us. In most cases, we prefer this as we have too many other day-to-day problems to solve. However, learning how databases work under the hood can provide the skills to troubleshoot when things go wrong.

Find the ‘Top 3’ reasons that are holding your system back. People need to keep in mind that nowadays they almost always work with complex systems. Such systems rarely have just one root cause for any problem. It would be better to focus on the top 3 root causes that may cause severe performance issues and which cannot be explained when looking for a single root cause.

Database performance monitoring that makes sense. Monitoring that doesn’t give you the information you need to help maintain the system is basically noise. You need a combination of metrics and logs to understand how the system is managing under load when your application uses it, to identify bottlenecks and to determine the changes that will result in faster database performance.

If you can’t find the answer, seek help. There are experts available to help you with your specific database issues. It would be better to consult with one, rather than look to other products which may be more expensive down the line to move to and maintain. Installing a different product, learning how to use it, discovering its quirks and how much work is involved to move to it, will be more expensive and time consuming than bringing in an expert, who can advise on the original problem at a relatively small fixed cost.

Top Slowest Queries and their Associated Tables in MySQL

2017-10-17T11:53:00.000+01:00

The following query gets data from performance_schema in MySQL and attempts to regex the digest to the list of tables in the same schema.

 SELECT d.*,  
  (SELECT group_concat(distinct TABLE_NAME) FROM information_schema.TABLES   
 WHERE table_schema = d.schema_name and d.digest_text regexp table_name) table_name  
  FROM performance_schema.events_statements_summary_by_digest d  
 WHERE d.DIGEST_TEXT regexp "^(SELECT|UPDATE|DELETE|REPLACE|INSERT|CREATE)"  
 and d.LAST_SEEN >= curdate() - interval 7 day  
 ORDER BY d.SUM_TIMER_WAIT DESC limit 10\G

Top 3 Reasons Why SQL is Faster than Java

2017-10-16T16:16:00.002+01:00

I had a discussion with a colleague the other day. He was trying to write some SQL to use for a less-than-optimal data structure and was getting frustrated that it was looking "cumbersome". He wanted some advice, but was keen to simply write it with a mix of a few light SQL statements and some Java.

I would like to explain why this option would be slower than using "ugly looking" and "cumbersome" SQL:

1) Disk I/O

If you were to use Java, you would probably need to get a larger dataset from the database, process it in some way and output the results. This would mean that the database would need to fetch that larger dataset for you, which would mean more (sometimes much more) I/O.

If you were to use SQL, you are leaving the fetching operation to the database's optimiser, which, with the help of indexes, may not need to fetch as much as it would for the Java approach.

In short, you are allowing the database to reach the right data and filter what not to fetch - for you.
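The difference can be shown with a small sketch. Python and its built-in sqlite3 module stand in here for the Java application and the production database, purely so that the example is self-contained; the table and function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO payments (customer_id, amount) VALUES (?, ?)",
    [(i % 100, i) for i in range(10_000)],
)


def total_in_application(conn, customer_id):
    """Fetch every row, then filter and aggregate in application code."""
    rows = conn.execute("SELECT customer_id, amount FROM payments").fetchall()
    total = sum(amount for cid, amount in rows if cid == customer_id)
    return total, len(rows)


def total_in_sql(conn, customer_id):
    """Let the database filter and aggregate; only one row comes back."""
    rows = conn.execute(
        "SELECT SUM(amount) FROM payments WHERE customer_id = ?", (customer_id,)
    ).fetchall()
    return rows[0][0], len(rows)


print(total_in_application(conn, 7))  # (495700, 10000)
print(total_in_sql(conn, 7))          # (495700, 1)
```

Both functions return the same total, but the first one pulls all 10,000 rows out of the database to get it, while the second one receives a single row.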

2) Network

For the reason above, the large dataset normally has to travel over a network. This is unless the Java app server is located on the same machine as the database, which is not very common nowadays.

That network overhead can become more pronounced in a virtual or containerised environment where network issues can be a headache. (*note: I am not an expert, just observing from a distance)

In addition, needing to pass data through a network can be an additional overhead in parallel systems where the data needs to travel to each machine before starting an operation.

3) Java's Garbage Collection

It may not be known to most people, but Java adds quite the memory overhead for objects and some data structures. You can sometimes get a x100 difference. This does not mean that you need x100 available memory, but it would mean that the GC would work extra hard with more CPU cycles to clean up the extra memory churn.

Bonus: SQL takes far less code than it would do in Java

While SQL can be an ugly string in your code sometimes, doing it in Java can take between x30-x100 more lines of code including tests. You may also need to test that your code does what SQL already does well such as JOINs and aggregate functions.

Caveat: When is it a good idea to use Java over SQL?

For processing a lot of data, Databases have the following concept:
Row vs Chunk vs Too Big

'Row' would be the slowest way of getting and processing data - unless you need to guarantee some level of data quality which requires it.
'Chunk', or a set of rows, is just right. Usually this would mean querying a large table by using 2 or 3 keys and getting a result set that the database can handle well.
'Too Big' is a case where the database cannot handle the number of rows well and you would need to split your SQL into 'Chunks' using Java and process it that way.
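A rough sketch of splitting a 'Too Big' read into 'Chunks' is shown below. It is written in Python with the built-in sqlite3 module rather than Java and MySQL, only to keep it short and runnable; the table name, chunk size and helper name are assumptions.

```python
import sqlite3


def process_in_chunks(conn, chunk_size, handle_chunk):
    """Walk a table in primary-key order, one chunk of rows at a time."""
    last_id = 0
    while True:
        chunk = conn.execute(
            "SELECT id, amount FROM payments WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, chunk_size),
        ).fetchall()
        if not chunk:
            break
        handle_chunk(chunk)
        last_id = chunk[-1][0]  # continue after the last id seen


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany("INSERT INTO payments (amount) VALUES (?)", [(i,) for i in range(1, 2501)])

chunk_sizes = []
process_in_chunks(conn, 1000, lambda chunk: chunk_sizes.append(len(chunk)))
print(chunk_sizes)  # [1000, 1000, 500]
```

Each query uses the primary key to pick up where the previous chunk ended, so the database only ever has to return a result set of a size it handles well.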

Please also check out this book that I found useful in this matter:

What is a Good Data Model

2017-10-06T09:21:00.000+01:00

This is an excerpt from something I am working on about physical data modelling.

A good data modelled table should

Be able to retrieve data quickly
Be able to store data quickly
Be clear and easy to work with

A good data modelled table should not

Store unneeded data
Need to change its rows very often **
Need too many JOINs to get you the data that you need

Purposes of a Data Model

A good data model should serve a specific and narrow set of purposes.

The more purposes the table serves the:

More indexes it would need
More cumbersome it would be to store and keep in memory
More overhead it would be to write to
More likely it would be a single point of failure
More likely it would have locks and deadlocks
More likely it would be to hold unneeded data
More difficult it would be to make changes to your application if you needed to make changes to the table

If you notice a pattern here, it is that a high degree of reusability may hinder the performance of a database. To be effective, data models need a balance between reusability and single-responsibility.

** A table that has data that needs to change often and is transient may be better suited to a cache. If the data needs to be saved and transactional, then consider a smaller table that records the state of certain keys or values, combined with a log that stores how it got that way, if that is needed.

MariaDB's Columnar Store

2016-12-30T12:52:00.000+00:00

I have been keeping an eye on MariaDB's Columnar store progress for a bit longer than half a year.

MariaDB chose to take the InfiniDB code after Calpont closed shop about two years ago and implemented it into their product. I was a bit wary about InfiniDB too, as it was a columnar store without compression that had mixed reviews on Hacker News.

However, it seems like MariaDB have pulled it off. They have added the InfiniDB engine to MariaDB with all its quirks and extra commands and they have added Snappy compression as well. This is truly a huge win for them and their users, specifically in the area of reporting and analytics.

Here are two real-life examples of getting data ready for reporting that are currently happening in the wild:
1) MySQL -> Sqoop -> Hadoop - where you would need a) 5-6 additional servers, b) someone to set those servers up in a Hadoop cluster and then c) someone to monitor the daily data transfer.
2) MySQL -> CDC -> Kafka -> Hadoop - a) more technologies to master, b) a few more servers and c) some more monitoring. But this time, it's streaming.

To set all of this up could take from a couple of months to a year.

Now with MariaDB, you have:
1) MariaDB + some SQL scripts - such as INSERT INTO datawarehouse.facttable SELECT ... FROM site.table1 WHERE date >= curdate() - interval 1 day;
2) MariaDB -> Pentaho/Talend -> MariaDB - Could be a bit slower, but with a GUI and really a lot of monitoring out of the box.

As you can see, there are a lot fewer technologies, a lot fewer complexities and it is a lot more straightforward to develop.

It is also very important to add that no one other than MariaDB is doing this. The closest you have is TokuDB, which is great and can also last you a while, but a Columnar store is definitely more suited for this type of task.

So once again, congratulations to MariaDB for offering such a huge benefit to its users.

Using a Generated Column to help with date lookups

2016-12-22T12:53:00.000+00:00

I have a table that has two columns: year and month.
While it's OK to search on exact dates or by year, it is harder to search between two dates.

Let's see if we can solve this issue by using a generated column.
(Table taken from https://github.com/jonathanvx/uk_land_registry_paid_dataset)

JSON and MySQL Stored Procedures

2016-12-21T15:50:00.000+00:00

You have probably heard that MySQL 5.7 supports JSON.
But did you know that you can also use JSON with MySQL Stored Procedures - making them very flexible?

Less talk, more code:
(Data was used from the UK Land Registry that I worked on)

Seismic Shock in the Analytics World

2016-06-15T21:47:00.000+01:00

Yandex have released a free columnar store analytical database called ClickHouse.
It seems to be using MySQL from what I can tell (correct me if I am wrong), but it's obviously their own storage engine and they have added some math and aggregation functions.

If it is anything like Infobright then you can expect x50 compression, so for most of us mere mortals that means you can keep everything on one database.

It has (really interesting) replication with ZooKeeper.

It has materialised views which is absolutely huge.

The benchmarks are here.

It is a lot faster than Hadoop/Hive. Not sure about Cloudera/Impala or Spark, but this would potentially be a lot simpler to administer and is free, of course.

It also comes with tons of aggregation functions, geospatial functions, math functions... a treat for any data scientist.

Watch this space.