<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>TwinDB minimalist</title>
	<atom:link href="https://twindb.com/feed/" rel="self" type="application/rss+xml" />
	<link>https://twindb.com</link>
	<description>Backup and Data Recovery Solutions for MySQL</description>
	<lastBuildDate>Sat, 06 Mar 2021 17:57:36 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=5.8.1</generator>

<image>
	<url>https://twindb.com/wp-content/uploads/2020/09/favicon.png</url>
	<title>TwinDB minimalist</title>
	<link>https://twindb.com</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Recovery After DROP TABLE, With innodb_file_per_table ON</title>
		<link>https://twindb.com/recovery-after-drop-table-with-innodb_file_per_table-on/</link>
					<comments>https://twindb.com/recovery-after-drop-table-with-innodb_file_per_table-on/#respond</comments>
		
		<dc:creator><![CDATA[aleks]]></dc:creator>
		<pubDate>Sat, 06 Mar 2021 17:57:36 +0000</pubDate>
				<category><![CDATA[MySQL Data Recovery]]></category>
		<category><![CDATA[c_parser]]></category>
		<category><![CDATA[DROP DATABASE]]></category>
		<category><![CDATA[DROP TABLE]]></category>
		<category><![CDATA[innodb_file_per_table=ON]]></category>
		<category><![CDATA[recovery after drop table]]></category>
		<guid isPermaLink="false">https://twindb.com/?p=319</guid>

					<description><![CDATA[Author Andriy Lysyuk. Introduction In the previous post, we described a situation when the UnDrop For InnoDB toolkit can be used to recover an accidentally dropped table with innodb_file_per_table=OFF.In this post, we’ll show how to recover MySQL tables or databases if innodb_file_per_table is ON. This option tells InnoDB to store each table with a user&#8230; <a class="more-link" href="https://twindb.com/recovery-after-drop-table-with-innodb_file_per_table-on/">Continue reading <span class="screen-reader-text">Recovery After DROP TABLE, With innodb_file_per_table ON</span></a>]]></description>
										<content:encoded><![CDATA[
<p><em>Author <a href="https://www.linkedin.com/in/lysyuk/">Andriy Lysyuk</a>.</em></p>



<h2>Introduction</h2>



<p>In the <a href="https://twindb.com/recovery-after-drop-table-with-innodb_file_per_table-off/" data-type="post" data-id="314">previous post</a>, we described a situation when the <a href="https://twindb.com/undrop-for-innodb/" data-type="page" data-id="262">UnDrop For InnoDB</a> toolkit can be used to recover an accidentally dropped table with <code>innodb_file_per_table=OFF</code>.<br>In this post, we’ll show how to recover MySQL tables or databases if <code>innodb_file_per_table</code> is <code>ON</code>. This option tells InnoDB to store each user table in a separate data file.</p>
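
<p>To confirm which mode the server is running in, you can check the variable before starting (the output below is illustrative):</p>

<pre class="wp-block-code"><code>mysql> SHOW VARIABLES LIKE 'innodb_file_per_table';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| innodb_file_per_table | ON    |
+-----------------------+-------+
1 row in set (0.00 sec)</code></pre>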



<p>For the recovery test, we’ll use the same sakila database that we used in the <a href="https://twindb.com/recovery-after-drop-table-with-innodb_file_per_table-off/" data-type="post" data-id="314">previous post</a>.</p>



<pre class="wp-block-code"><code>root@test:/var/lib/mysql/sakila# ls -la
total 23468
drwx------ 2 mysql mysql     4096 Jul 15 04:26 ./
drwx------ 6 mysql mysql     4096 Jul 15 04:26 ../
-rw-rw---- 1 mysql mysql     8694 Jul 15 04:26 actor.frm
-rw-rw---- 1 mysql mysql   114688 Jul 15 04:26 actor.ibd
-rw-rw---- 1 mysql mysql     2871 Jul 15 04:26 actor_info.frm
-rw-rw---- 1 mysql mysql     8840 Jul 15 04:26 address.frm
-rw-rw---- 1 mysql mysql   163840 Jul 15 04:26 address.ibd
-rw-rw---- 1 mysql mysql     8648 Jul 15 04:26 category.frm
-rw-rw---- 1 mysql mysql    98304 Jul 15 04:26 category.ibd
-rw-rw---- 1 mysql mysql     8682 Jul 15 04:26 city.frm
-rw-rw---- 1 mysql mysql   114688 Jul 15 04:26 city.ibd
-rw-rw---- 1 mysql mysql     8652 Jul 15 04:26 country.frm
-rw-rw---- 1 mysql mysql    98304 Jul 15 04:26 country.ibd
...
-rw-rw---- 1 mysql mysql       36 Jul 15 04:26 upd_film.TRN
root@test:/var/lib/mysql/sakila#</code></pre>



<p>Note the two files related to the table country: <code>country.frm</code>, <code>country.ibd</code>.<br>We will drop this table and try to recover it. First, we preview the records this table contains and take the checksum:</p>



<pre class="wp-block-code"><code>mysql> SELECT * FROM country LIMIT 10;
+------------+----------------+---------------------+
| country_id | country        | last_update         |
+------------+----------------+---------------------+
|          1 | Afghanistan    | 2006-02-15 04:44:00 |
|          2 | Algeria        | 2006-02-15 04:44:00 |
|          3 | American Samoa | 2006-02-15 04:44:00 |
|          4 | Angola         | 2006-02-15 04:44:00 |
|          5 | Anguilla       | 2006-02-15 04:44:00 |
|          6 | Argentina      | 2006-02-15 04:44:00 |
|          7 | Armenia        | 2006-02-15 04:44:00 |
|          8 | Australia      | 2006-02-15 04:44:00 |
|          9 | Austria        | 2006-02-15 04:44:00 |
|         10 | Azerbaijan     | 2006-02-15 04:44:00 |
+------------+----------------+---------------------+
10 rows in set (0.00 sec)
 
mysql> CHECKSUM TABLE country;
+----------------+------------+
| Table          | Checksum   |
+----------------+------------+
| sakila.country | 3658016321 |
+----------------+------------+
1 row in set (0.00 sec)
 
mysql> SELECT COUNT(*) FROM country;
+----------+
| COUNT(*) |
+----------+
|      109 |
+----------+
1 row in set (0.00 sec)</code></pre>



<h2>Accidental Drop</h2>



<p>Now, we drop the table and look for files related to the table. As you can see from the list, files with <strong>country</strong> table data are no longer there:</p>



<pre class="wp-block-code"><code>mysql> SET foreign_key_checks=OFF;
Query OK, 0 rows affected (0.00 sec)
 
mysql> DROP TABLE country;
Query OK, 0 rows affected (0.00 sec)
 
root@test:~# cd /var/lib/mysql/sakila/
root@test:/var/lib/mysql/sakila# ls -la
total 23360
drwx------ 2 mysql mysql     4096 Jul 15 04:33 ./
drwx------ 6 mysql mysql     4096 Jul 15 04:26 ../
-rw-rw---- 1 mysql mysql     8694 Jul 15 04:26 actor.frm
-rw-rw---- 1 mysql mysql   114688 Jul 15 04:26 actor.ibd
-rw-rw---- 1 mysql mysql     2871 Jul 15 04:26 actor_info.frm
-rw-rw---- 1 mysql mysql     8840 Jul 15 04:26 address.frm
-rw-rw---- 1 mysql mysql   163840 Jul 15 04:26 address.ibd
-rw-rw---- 1 mysql mysql     8648 Jul 15 04:26 category.frm
-rw-rw---- 1 mysql mysql    98304 Jul 15 04:26 category.ibd
-rw-rw---- 1 mysql mysql     8682 Jul 15 04:26 city.frm
-rw-rw---- 1 mysql mysql   114688 Jul 15 04:26 city.ibd
-rw-rw---- 1 mysql mysql       40 Jul 15 04:26 customer_create_date.TRN
-rw-rw---- 1 mysql mysql     8890 Jul 15 04:26 customer.frm
-rw-rw---- 1 mysql mysql   196608 Jul 15 04:26 customer.ibd
-rw-rw---- 1 mysql mysql     1900 Jul 15 04:26 customer_list.frm
-rw-rw---- 1 mysql mysql      297 Jul 15 04:26 customer.TRG
-rw-rw---- 1 mysql mysql       65 Jul 15 04:26 db.opt
...
-rw-rw---- 1 mysql mysql       36 Jul 15 04:26 upd_film.TRN
root@test:/var/lib/mysql/sakila#</code></pre>



<h2>Recovery After DROP TABLE</h2>



<p>This situation is a little more complex, since we need to recover a deleted file. If the database server keeps actively writing to the disk, the blocks of the deleted file may be overwritten with other data. Therefore, it’s critical to stop the server and mount the partition read-only. For this test, however, we will just stop the mysql service and continue with the recovery.</p>



<pre class="wp-block-code"><code>root@test:/var/lib/mysql/sakila# service mysql stop
mysql stop/waiting</code></pre>
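
<p>In a real incident, if the data directory lives on its own partition, you could additionally remount it read-only so nothing can overwrite the freed blocks (a hedged example; the mount point depends on your layout):</p>

<pre class="wp-block-code"><code>root@test:~# mount -o remount,ro /var/lib/mysql</code></pre>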



<p>Despite the fact that user data is stored in a separate file per table, the data dictionary is still stored in the ibdata1 file. That’s why we need to run <code>stream_parser</code> on <code>/var/lib/mysql/ibdata1</code>.</p>
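
<p>The invocation is the same as in the previous post; a typical run looks like this:</p>

<pre class="wp-block-code"><code>root@test:~/undrop-for-innodb# ./stream_parser -f /var/lib/mysql/ibdata1</code></pre>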



<figure class="wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<iframe loading="lazy" title="Undrop for InnoDB overview" width="750" height="422" src="https://www.youtube.com/embed/-1LeLhGjAWM?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</div><figcaption>Check out a video with the toolkit overview and usage instructions.</figcaption></figure>



<p>In order to find <code>table_id</code> and <code>index_id</code> for the table country, we will use the dictionary stored in <code>SYS_TABLES</code> and <code>SYS_INDEXES</code>. We will fetch the data from the <code>ibdata1</code> file. The dictionary records are always in <a href="https://dev.mysql.com/doc/refman/5.6/en/innodb-row-format.html#innodb-row-format-redundant">REDUNDANT</a> format, therefore we specify option <code>-4</code>. We assume that the MySQL server has flushed changes to the disk, so we add the <code>-D</code> option, which means “find deleted records”. <code>SYS_TABLES</code> information is stored in the index with <code>index_id=1</code>, which is the file <code>pages-ibdata1/FIL_PAGE_INDEX/0000000000000001.page</code>:</p>



<pre class="wp-block-code"><code>root@test:~/undrop-for-innodb# ./c_parser -4D \
    -f ./pages-ibdata1/FIL_PAGE_INDEX/0000000000000001.page \
    -t ./dictionary/SYS_TABLES.sql \
    | grep country
000000000CDC  62000001960684  SYS_TABLES      "sakila/country"        228     3       1       0       0       ""      88
000000000CDC  62000001960684  SYS_TABLES      "sakila/country"        228     3       1       0       0       ""      88

SET FOREIGN_KEY_CHECKS=0;
LOAD DATA LOCAL INFILE '/home/asterix/undrop-for-innodb/dumps/default/SYS_TABLES' REPLACE INTO TABLE `SYS_TABLES` FIELDS TERMINATED BY '\t' OPTIONALLY ENCLOSED BY '"' LINES STARTING BY 'SYS_TABLES\t' (`NAME`, `ID`, `N_COLS`, `TYPE`, `MIX_ID`, `MIX_LEN`, `CLUSTER_NAME`, `SPACE`);
 
root@test:~/undrop-for-innodb#</code></pre>



<p>We can see that the country table has <code>table_id=228</code>. The next step would be to find the <code>PRIMARY</code> index of the table country. For this purpose, we take the records of the <code>SYS_INDEXES</code> table from the file <code>0000000000000003.page</code> (<code>SYS_INDEXES</code> table contains mapping between <code>table_id</code> and <code>index_id</code>). The structure of <code>SYS_INDEXES</code> is specified with a <code>-t</code> option.</p>



<pre class="wp-block-code"><code>root@test:~/undrop-for-innodb# ./c_parser -4D \
    -f ./pages-ibdata1/FIL_PAGE_INDEX/0000000000000003.page \
    -t ./dictionary/SYS_INDEXES.sql \
    | grep 228
000000000CDC    620000019605A8  SYS_INDEXES     228     547     "PRIMARY"       1       3       88      4294967295
000000000CDC    620000019605A8  SYS_INDEXES     228     547     "PRIMARY"       1       3       88      4294967295

SET FOREIGN_KEY_CHECKS=0;
LOAD DATA LOCAL INFILE '/home/asterix/undrop-for-innodb/dumps/default/SYS_INDEXES' REPLACE INTO TABLE `SYS_INDEXES` FIELDS TERMINATED BY '\t' OPTIONALLY ENCLOSED BY '"' LINES STARTING BY 'SYS_INDEXES\t' (`TABLE_ID`, `ID`, `NAME`, `N_FIELDS`, `TYPE`, `SPACE`, `PAGE_NO`);
 
root@test:~/undrop-for-innodb#</code></pre>



<h2>Almost There!</h2>



<p>We can see that <code>index_id</code> of the dropped table country is 547. The following step differs from the one we took in the case with <code>innodb_file_per_table=OFF</code>. Since there is no data file available, we’ll scan the whole partition as a raw device and look for data that fits the expected structure of InnoDB pages. By the way, you can use the same <a href="https://twindb.com/recover-corrupt-mysql-database/" data-type="post" data-id="271">approach with corrupted data files</a>: if some data is corrupted, the recovery tool can perform partial data recovery. In the tool options, we specify the device name and the device size (which can be approximate).</p>



<pre class="wp-block-code"><code>root@test:~/undrop-for-innodb# ./stream_parser \
    -f /dev/vda \
    -t 20000000k
Opening file: /dev/vda
File information:
 
ID of device containing file:            5
inode number:                         6411
protection:                          60660 (block device)
number of hard links:                    1
user ID of owner:                        0
group ID of owner:                       6
device ID (if special file):         64768
blocksize for filesystem I/O:         4096
number of blocks allocated:              0
time of last access:            1405411377 Tue Jul 15 04:02:57 2014
time of last modification:      1404625158 Sun Jul  6 01:39:18 2014
time of last status change:     1404625158 Sun Jul  6 01:39:18 2014
total size, in bytes:                    0 (0.000 exp(+0))
 
Size to process:               20480000000 (19.073 GiB)
Worker(0): 1.06% done. 2014-07-15 04:57:37 ETA(in 00:01:36). Processing speed: 199.848 MiB/sec
Worker(0): 2.09% done. 2014-07-15 04:57:37 ETA(in 00:01:35). Processing speed: 199.610 MiB/sec
Worker(0): 3.11% done. 2014-07-15 04:59:13 ETA(in 00:03:09). Processing speed: 99.805 MiB/sec
...
Worker(0): 97.33% done. 2014-07-15 04:57:15 ETA(in 00:00:05). Processing speed: 99.828 MiB/sec
Worker(0): 98.35% done. 2014-07-15 04:57:20 ETA(in 00:00:06). Processing speed: 49.941 MiB/sec
Worker(0): 99.38% done. 2014-07-15 04:57:17 ETA(in 00:00:01). Processing speed: 99.961 MiB/sec
All workers finished in 77 sec
root@test:~/undrop-for-innodb#</code></pre>



<p><code>stream_parser</code> stores the resulting page files in the folder <code>pages-vda</code> (the name is derived from the file given to <code>stream_parser</code>). We can see that the necessary index is present in the files.</p>



<pre class="wp-block-code"><code>root@test:~/undrop-for-innodb/pages-vda/FIL_PAGE_INDEX# ls -la | grep 547
-rw-r--r-- 1 root root    32768 Jul 15 04:57 0000000000000547.page
root@test:~/undrop-for-innodb/pages-vda/FIL_PAGE_INDEX#</code></pre>



<p>We will look for the data in the file <code>0000000000000547.page</code>. The <code>c_parser</code> tool fetches records from the page file and prints them in tab-separated values format.</p>



<pre class="wp-block-code"><code>root@test:~/undrop-for-innodb# ./c_parser -6 \
    -f pages-vda/FIL_PAGE_INDEX/0000000000000547.page \
    -t sakila/country.sql \
    | head -5
-- Page id: 3, Format: COMPACT, Records list: Valid, Expected records: (109 109)
000000000C4B    F30000038C0110  country 1       "Afghanistan"   "2006-02-15 04:44:00"
000000000C4B    F30000038C011B  country 2       "Algeria"       "2006-02-15 04:44:00"
000000000C4B    F30000038C0126  country 3       "American Samoa"        "2006-02-15 04:44:00"
000000000C4B    F30000038C0131  country 4       "Angola"        "2006-02-15 04:44:00"
root@test:~/undrop-for-innodb#</code></pre>



<p>The result looks valid, so we prepare files for loading the data back into the database. The <code>LOAD DATA INFILE</code> command with the necessary options is written to STDERR.</p>



<pre class="wp-block-code"><code>root@test:~/undrop-for-innodb# ./c_parser -6 \
    -f pages-vda/FIL_PAGE_INDEX/0000000000000547.page \
    -t sakila/country.sql \
    > dumps/default/country \
    2> dumps/default/country_load.sql</code></pre>
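
<p>The generated <code>country_load.sql</code> should contain a <code>LOAD DATA</code> statement similar to the following (the absolute path and column list depend on your environment and table definition):</p>

<pre class="wp-block-code"><code>SET FOREIGN_KEY_CHECKS=0;
LOAD DATA LOCAL INFILE '/home/asterix/undrop-for-innodb/dumps/default/country'
REPLACE INTO TABLE `country`
FIELDS TERMINATED BY '\t'
OPTIONALLY ENCLOSED BY '"'
LINES STARTING BY 'country\t'
(`country_id`, `country`, `last_update`);</code></pre>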



<h2>Loading Data Back To The Database</h2>



<p>Now, we’re going to load data back to the database. Before loading the data, we create an empty structure of the table country:</p>



<pre class="wp-block-code"><code>root@test:~/undrop-for-innodb# service mysql start

root@test:~/undrop-for-innodb# mysql -uroot -p
 
mysql> use sakila;
mysql> source sakila/country.sql
Query OK, 0 rows affected (0.00 sec)
...
 
Query OK, 0 rows affected (0.00 sec)
 
mysql> show create table country\G
*************************** 1. row ***************************
       Table: country
Create Table: CREATE TABLE `country` (
  `country_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
  `country` varchar(50) NOT NULL,
  `last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`country_id`)
) ENGINE=InnoDB AUTO_INCREMENT=110 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)</code></pre>



<p>And now we are loading the data itself.</p>



<pre class="wp-block-code"><code>root@testB:~/undrop-for-innodb# mysql --local-infile -uroot -p
mysql> USE sakila;
mysql> source dumps/default/country_load.sql
Query OK, 0 rows affected (0.00 sec)
 
Query OK, 327 rows affected (0.00 sec)
Records: 218  Deleted: 109  Skipped: 0  Warnings: 0 </code></pre>



<h2>Checking Data Quality</h2>



<p>The last remaining step is to check the quality of the recovered data. We’ll preview several records, count the total number of records, and calculate the checksum.</p>



<pre class="wp-block-code"><code>mysql> SELECT COUNT(*) FROM country;
+----------+
| COUNT(*) |
+----------+
|      109 |
+----------+
1 row in set (0.00 sec)
 
mysql> SELECT * FROM country LIMIT 5;
+------------+----------------+---------------------+
| country_id | country        | last_update         |
+------------+----------------+---------------------+
|          1 | Afghanistan    | 2006-02-15 04:44:00 |
|          2 | Algeria        | 2006-02-15 04:44:00 |
|          3 | American Samoa | 2006-02-15 04:44:00 |
|          4 | Angola         | 2006-02-15 04:44:00 |
|          5 | Anguilla       | 2006-02-15 04:44:00 |
+------------+----------------+---------------------+
5 rows in set (0.00 sec)
 
mysql> CHECKSUM TABLE country;
+----------------+------------+
| Table          | Checksum   |
+----------------+------------+
| sakila.country | 3658016321 |
+----------------+------------+
1 row in set (0.00 sec)</code></pre>



<p>So, we’re in luck. Despite the fact that we used the system volume for MySQL data (which is not the recommended practice) and did not re-mount the partition read-only (other processes kept writing to the disk), we managed to recover all the records. The checksum calculated after the recovery (<code>3658016321</code>) is equal to the checksum taken before the drop (<code>3658016321</code>).</p>
]]></content:encoded>
					
					<wfw:commentRss>https://twindb.com/recovery-after-drop-table-with-innodb_file_per_table-on/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Recovery After DROP TABLE, With innodb_file_per_table OFF</title>
		<link>https://twindb.com/recovery-after-drop-table-with-innodb_file_per_table-off/</link>
					<comments>https://twindb.com/recovery-after-drop-table-with-innodb_file_per_table-off/#respond</comments>
		
		<dc:creator><![CDATA[aleks]]></dc:creator>
		<pubDate>Sat, 06 Mar 2021 16:51:30 +0000</pubDate>
				<category><![CDATA[MySQL Data Recovery]]></category>
		<category><![CDATA[c_parser]]></category>
		<category><![CDATA[DROP DATABASE]]></category>
		<category><![CDATA[DROP TABLE]]></category>
		<category><![CDATA[innodb_file_per_table=OFF]]></category>
		<category><![CDATA[recovery after drop table]]></category>
		<category><![CDATA[stream_parser]]></category>
		<guid isPermaLink="false">https://twindb.com/?p=314</guid>

					<description><![CDATA[Author Andriy Lysyuk. Introduction Unfortunately, human mistakes are inevitable. That’s how life is. Wrong DROP DATABASE or DROP TABLE may destroy critical data on the MySQL server. Obviously, backups would help, however they’re not always available. This situation is frightening but not hopeless. In many cases it’s possible to recover almost all the data that&#8230; <a class="more-link" href="https://twindb.com/recovery-after-drop-table-with-innodb_file_per_table-off/">Continue reading <span class="screen-reader-text">Recovery After DROP TABLE, With innodb_file_per_table OFF</span></a>]]></description>
										<content:encoded><![CDATA[
<p><em>Author <a href="https://www.linkedin.com/in/lysyuk/">Andriy Lysyuk</a>.</em></p>



<h2>Introduction</h2>



<p>Unfortunately, human mistakes are inevitable. That’s how life is. A wrong <code>DROP DATABASE</code> or <code>DROP TABLE</code> may destroy critical data on the MySQL server. Obviously, backups would help, but they’re not always available. This situation is frightening but not hopeless. In many cases, it’s possible to recover almost all the data that was in the database or table.<br>Let’s look at how we can do it. The recovery plan depends on whether InnoDB kept all data in a single ibdata1 file or each table had its own tablespace. In this post, we will consider the case when <code>innodb_file_per_table=OFF</code>. With this setting, all tables are stored in a common file, usually located at <code>/var/lib/mysql/ibdata1</code>.</p>
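
<p>You can verify the setting and locate the data directory before starting (an illustrative example; the values depend on your configuration):</p>

<pre class="wp-block-code"><code>mysql> SELECT @@innodb_file_per_table, @@datadir;
+-------------------------+-----------------+
| @@innodb_file_per_table | @@datadir       |
+-------------------------+-----------------+
|                       0 | /var/lib/mysql/ |
+-------------------------+-----------------+
1 row in set (0.00 sec)</code></pre>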



<h2>One Wrong Move, And The Table’s Gone</h2>



<p>For our scenario, we use the sakila test database that comes together with the tool.<br>Suppose we drop the table actor by mistake:</p>



<pre class="wp-block-code"><code>mysql> SELECT * FROM actor LIMIT 10;
+----------+------------+--------------+---------------------+
| actor_id | first_name | last_name    | last_update         |
+----------+------------+--------------+---------------------+
|        1 | PENELOPE   | GUINESS      | 2006-02-15 04:34:33 |
|        2 | NICK       | WAHLBERG     | 2006-02-15 04:34:33 |
|        3 | ED         | CHASE        | 2006-02-15 04:34:33 |
|        4 | JENNIFER   | DAVIS        | 2006-02-15 04:34:33 |
|        5 | JOHNNY     | LOLLOBRIGIDA | 2006-02-15 04:34:33 |
|        6 | BETTE      | NICHOLSON    | 2006-02-15 04:34:33 |
|        7 | GRACE      | MOSTEL       | 2006-02-15 04:34:33 |
|        8 | MATTHEW    | JOHANSSON    | 2006-02-15 04:34:33 |
|        9 | JOE        | SWANK        | 2006-02-15 04:34:33 |
|       10 | CHRISTIAN  | GABLE        | 2006-02-15 04:34:33 |
+----------+------------+--------------+---------------------+
10 rows in set (0.00 sec)
mysql> CHECKSUM TABLE actor;
+--------------+------------+
| Table        | Checksum   |
+--------------+------------+
| sakila.actor | 3596356558 |
+--------------+------------+
1 row in set (0.00 sec)
 
mysql> SET foreign_key_checks=OFF;
mysql> DROP TABLE actor;
Query OK, 0 rows affected (0.00 sec)</code></pre>



<h2>Recovery After DROP TABLE From ibdata1</h2>



<p>The table is gone, but the information contained in the table can still be in the database file. The data remains untouched until InnoDB reuses free pages. Hurry up and stop MySQL ASAP!<br>For the recovery, we use the <a rel="noreferrer noopener" href="https://github.com/twindb/undrop-for-innodb" target="_blank">Undrop For InnoDB</a> toolkit.</p>
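
<p>If MySQL is still running, stop it before proceeding so that InnoDB cannot reuse the freed pages (the exact command depends on your init system):</p>

<pre class="wp-block-code"><code>root@test:~# service mysql stop</code></pre>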



<figure class="wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<iframe loading="lazy" title="Undrop for InnoDB overview" width="750" height="422" src="https://www.youtube.com/embed/-1LeLhGjAWM?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</div><figcaption>Check out a video with the toolkit overview and usage instructions.</figcaption></figure>



<p>InnoDB stores all data in B+tree indexes. A table has one clustered index, <code>PRIMARY</code>, where all fields are stored. If the table has secondary keys, each of them has its own index as well. Each index is identified by an <code>index_id</code>.</p>



<p>Consequently, if we want to recover a table, we have to find all pages that belong to a particular index_id.</p>



<p><code>stream_parser</code> reads the InnoDB tablespace and sorts InnoDB pages by type and by index_id.</p>



<pre class="wp-block-code"><code>root@test:~/undrop-for-innodb# ./stream_parser -f /var/lib/mysql/ibdata1
Opening file: /var/lib/mysql/ibdata1
File information:
 
ID of device containing file:        64768
inode number:                      1190268
protection:                         100660 (regular file)
number of hard links:                    1
user ID of owner:                      106
group ID of owner:                     114
device ID (if special file):             0
blocksize for filesystem I/O:         4096
number of blocks allocated:          69632
time of last access:            1404842312 Tue Jul  8 13:58:32 2014
time of last modification:      1404842478 Tue Jul  8 14:01:18 2014
time of last status change:     1404842478 Tue Jul  8 14:01:18 2014
total size, in bytes:             35651584 (34.000 MiB)
 
Size to process:                  35651584 (34.000 MiB)
All workers finished in 0 sec
root@test: ~/undrop-for-innodb#</code></pre>



<p><code>stream_parser</code> saves the data from the database pages to the folder <code>pages-ibdata1</code>:</p>



<pre class="wp-block-code"><code>root@test:~/undrop-for-innodb/pages-ibdata1/FIL_PAGE_INDEX# ls
0000000000000001.page  0000000000000121.page  0000000000000382.page
0000000000000395.page  0000000000000408.page  0000000000000421.page
0000000000000434.page  0000000000000447.page  0000000000000002.page
...
0000000000000406.page  0000000000000419.page  0000000000000432.page
0000000000000445.page  0000000000000120.page  0000000000000381.page
0000000000000394.page  0000000000000407.page  0000000000000420.page
0000000000000433.page  0000000000000446.page
root@test: ~/undrop-for-innodb/pages-ibdata1/FIL_PAGE_INDEX</code></pre>



<p>Now, each index_id from the InnoDB tablespace is saved in a separate file. We can use <code>c_parser</code> to fetch records from the pages, but we need to know which index_id corresponds to the table <code>sakila/actor</code>. We can acquire this information from the dictionary – <code>SYS_TABLES</code> and <code>SYS_INDEXES</code>.</p>



<h2>Dictionaries</h2>



<p><code>SYS_TABLES</code> is always stored in the index with <code>index_id=1</code>, which is the file <code>pages-ibdata1/FIL_PAGE_INDEX/0000000000000001.page</code>.<br>Let’s find the table_id of <code>sakila/actor</code>. If MySQL had enough time to flush changes to disk, add the <code>-D</code> option, which means “find deleted records”. The dictionary is always in <a href="https://dev.mysql.com/doc/refman/5.6/en/innodb-row-format.html#innodb-row-format-redundant">REDUNDANT</a> format, so we specify option <code>-4</code>:</p>



<pre class="wp-block-code"><code>root@test:~/undrop-for-innodb# ./c_parser \
    -4D \
    -f pages-ibdata1/FIL_PAGE_INDEX/0000000000000001.page \
    -t dictionary/SYS_TABLES.sql \
    | grep sakila/actor
000000000B28  2A000001430D4D  SYS_TABLES  "sakila/actor"  158  4  1 0   0   ""  0
000000000B28  2A000001430D4D  SYS_TABLES  "sakila/actor"  158  4  1 0   0   ""  0
 </code></pre>



<p>Note number <strong>158</strong> right after the table name. This is <code>table_id</code>.</p>



<p>The next thing to do is to find the index id of the <code>PRIMARY</code> index of the table actor. For this purpose, we fetch records of <code>SYS_INDEXES</code> from the file <code>0000000000000003.page</code> (this table contains the mapping between <code>table_id</code> and <code>index_id</code>). The structure of <code>SYS_INDEXES</code> is passed with the <code>-t</code> option.</p>



<pre class="wp-block-code"><code>root@test:~/undrop-for-innodb# ./c_parser \
    -4D \
    -f pages-ibdata1/FIL_PAGE_INDEX/0000000000000003.page \
    -t dictionary/SYS_INDEXES.sql \
    | grep -w 158
000000000B28    2A000001430BCA  SYS_INDEXES     158     376     "PRIMARY"       1       3       0       4294967295
000000000B28    2A000001430C3C  SYS_INDEXES     158     377     "idx\_actor\_last\_name"        1       0       0       4294967295
000000000B28    2A000001430BCA  SYS_INDEXES     158     376     "PRIMARY"       1       3       0       4294967295
000000000B28    2A000001430C3C  SYS_INDEXES     158     377     "idx\_actor\_last\_name"        1       0       0       4294967295
 </code></pre>



<p>As you can see from the output, the necessary <code>index_id</code> is <strong>376</strong>. Therefore, we look for the actor data in the file <code>0000000000000376.page</code>.</p>



<pre class="wp-block-code"><code>root@test:~/undrop-for-innodb# ./c_parser \
    -6 \
    -f pages-ibdata1/FIL_PAGE_INDEX/0000000000000376.page \
    -t sakila/actor.sql \
    |  head -5
-- Page id: 895, Format: COMPACT, Records list: Valid, Expected records: (200 200)
000000000AA0    B60000035D0110  actor   1       "PENELOPE"      "GUINESS"       "2006-02-15 04:34:33"
000000000AA0    B60000035D011B  actor   2       "NICK"  "WAHLBERG"      "2006-02-15 04:34:33"
000000000AA0    B60000035D0126  actor   3       "ED"    "CHASE" "2006-02-15 04:34:33"
000000000AA0    B60000035D0131  actor   4       "JENNIFER"      "DAVIS" "2006-02-15 04:34:33"
root@test:~/undrop-for-innodb#</code></pre>



<p>The resulting output looks correct, so let’s save the dump to a file. To make loading simpler, <code>c_parser</code> outputs a <code>LOAD DATA INFILE</code> command to <code>STDERR</code>.</p>



<p>We use the default location of these files: <code>dumps/default</code>.</p>



<pre class="wp-block-code"><code>root@test:~/undrop-for-innodb# mkdir -p dumps/default
root@test:~/undrop-for-innodb# ./c_parser \
    -6 \
    -f pages-ibdata1/FIL_PAGE_INDEX/0000000000000376.page \
    -t sakila/actor.sql \
    > dumps/default/actor \
    2> dumps/default/actor_load.sql</code></pre>



<p>And here’s a command to load the table.</p>



<pre class="wp-block-code"><code>root@test:~/undrop-for-innodb# cat dumps/default/actor_load.sql
SET FOREIGN_KEY_CHECKS=0;
LOAD DATA LOCAL 
INFILE '/home/asterix/undrop-for-innodb/dumps/default/actor' 
REPLACE INTO TABLE `actor` 
FIELDS TERMINATED BY '\t' 
OPTIONALLY ENCLOSED BY '"' 
LINES STARTING BY 'actor\t' 
(`actor_id`, `first_name`, `last_name`, `last_update`);
root@test:~/undrop-for-innodb#</code></pre>



<h2>Load Data Back To The Database</h2>



<p>Now, it’s time to recover the data into the database. But before loading the dump we need to create an empty structure of the table actor:</p>



<pre class="wp-block-code"><code>mysql> source sakila/actor.sql
mysql> show create table actor\G
*************************** 1. row ***************************
       Table: actor
Create Table: CREATE TABLE `actor` (
  `actor_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
  `first_name` varchar(45) NOT NULL,
  `last_name` varchar(45) NOT NULL,
  `last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`actor_id`),
  KEY `idx_actor_last_name` (`last_name`)
) ENGINE=InnoDB AUTO_INCREMENT=201 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
mysql></code></pre>



<p>Now the table actor is ready, and we can load the recovered data.</p>



<pre class="wp-block-code"><code>root@test:~/undrop-for-innodb# mysql --local-infile -uroot -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
...
mysql> USE sakila;
mysql> source dumps/default/actor_load.sql
Query OK, 0 rows affected (0.00 sec)
 
Query OK, 600 rows affected (0.01 sec)
Records: 400  Deleted: 200  Skipped: 0  Warnings: 0</code></pre>



<h2>Checking Recovered Data</h2>



<p>And finally, we check the data quality: we count the total number of records, preview several records, and calculate the checksum.</p>



<pre class="wp-block-code"><code>mysql> SELECT COUNT(*) FROM actor;
+----------+
| COUNT(*) |
+----------+
|      200 |
+----------+
1 row in set (0.00 sec)
 
mysql> SELECT * FROM actor LIMIT 5;
+----------+------------+--------------+---------------------+
| actor_id | first_name | last_name    | last_update         |
+----------+------------+--------------+---------------------+
|        1 | PENELOPE   | GUINESS      | 2006-02-15 04:34:33 |
|        2 | NICK       | WAHLBERG     | 2006-02-15 04:34:33 |
|        3 | ED         | CHASE        | 2006-02-15 04:34:33 |
|        4 | JENNIFER   | DAVIS        | 2006-02-15 04:34:33 |
|        5 | JOHNNY     | LOLLOBRIGIDA | 2006-02-15 04:34:33 |
+----------+------------+--------------+---------------------+
5 rows in set (0.00 sec)
 
mysql> CHECKSUM TABLE actor;
+--------------+------------+
| Table        | Checksum   |
+--------------+------------+
| sakila.actor | 3596356558 |
+--------------+------------+
1 row in set (0.00 sec)</code></pre>
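<p>In a scripted recovery this comparison can be automated. A minimal hypothetical helper (the <code>mysql</code> invocation in the comment shows where the live value would come from):</p>

```shell
# compare_checksums: a tiny hypothetical helper. In a real run the second
# value would come from the live server, e.g.:
#   after=$(mysql -N -e "CHECKSUM TABLE sakila.actor" | awk '{print $2}')
compare_checksums() {
    if [ "$1" = "$2" ]; then
        echo "checksum OK"
    else
        echo "checksum MISMATCH"
    fi
}
compare_checksums 3596356558 3596356558   # prints "checksum OK"
```

<p>Saving the pre-accident checksum somewhere outside the database is what makes this check possible at all.</p>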



<p>As you can see, the checksum after recovery is <code>3596356558</code>, which is equal to the checksum taken before the table was accidentally dropped. Therefore, we can be sure that all the data was recovered correctly.<br>In the next posts we will cover other recovery cases.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://twindb.com/recovery-after-drop-table-with-innodb_file_per_table-on/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Verify MySQL Backups With TwinDB Backup Tool</title>
		<link>https://twindb.com/verify-mysql-backups-with-twindb-backup-tool/</link>
					<comments>https://twindb.com/verify-mysql-backups-with-twindb-backup-tool/#respond</comments>
		
		<dc:creator><![CDATA[aleks]]></dc:creator>
		<pubDate>Fri, 25 Sep 2020 17:07:04 +0000</pubDate>
				<category><![CDATA[MySQL Backup]]></category>
		<category><![CDATA[backup]]></category>
		<guid isPermaLink="false">https://twindb.com/?p=286</guid>

					<description><![CDATA[By Maksym Kryva. If you don’t verify backups you may safely assume you don’t have them. It often happens that MySQL backups can be invalid or broken due to a software bug, or some hidden corruption. If you are lucky enough, hours and days will be needed to resurrect a database from a bad backup&#8230; <a class="more-link" href="https://twindb.com/verify-mysql-backups-with-twindb-backup-tool/">Continue reading <span class="screen-reader-text">Verify MySQL Backups With TwinDB Backup Tool</span></a>]]></description>
										<content:encoded><![CDATA[
<p>By <a href="https://www.linkedin.com/in/okadzaki/">Maksym Kryva</a>.</p>



<p>If you don’t verify backups you may safely assume you don’t have them.</p>



<p>It often happens that MySQL backups are invalid or broken due to a software bug or some hidden corruption. If you are lucky, hours or days will be needed to resurrect a database from a bad backup copy. If you ran out of luck quota, you may lose a lot of data. Hence the importance of backup verification. Not many companies take backups, and even fewer verify them. To make the verification problem easier, we have added a <code>verify</code> command to the <a href="https://github.com/twindb/backup">TwinDB Backup Tool</a>.</p>



<p>The command takes a backup copy, restores it, prepares it (applies redo logs, fixes permissions, and so on), and starts a MySQL instance on it. Then it checks whether the recovered database is healthy. You can verify backups either on the same server where the backup was taken or on a dedicated “verification” machine.</p>



<h2>Usage</h2>



<pre class="wp-block-preformatted"># twindb-backup verify mysql --help
Usage: twindb-backup verify mysql [OPTIONS] [BACKUP_COPY]
 
  Verify backup
 
Options:
  --dst TEXT       Directory where to restore the backup copy  [default:
                   /tmp/]
  --hostname TEXT  If backup_copy is latest this option specifies hostname
                   where the backup copy was taken.
  --help           Show this message and exit.</pre>



<ul><li><code>backup_copy</code> is a backup copy name. You can get it from the <code>twindb-backup ls</code> output, or you can pass <code>latest</code> to verify the most recent MySQL backup.</li><li><code>hostname</code> – if you verify the backup on another machine, you have to specify which host the backup was taken from. If you omit it, the hostname of the local machine is used.</li><li><code>dst</code> is the directory for the restored MySQL backup. By default it’s <code>/tmp</code>.</li></ul>



<p>For example:</p>



<pre class="wp-block-preformatted">twindb-backup verify mysql \
    /path/to/twindb-server-backups/master1/hourly/files/_home-2017-11-13_16_43_17.tar.gz \
    --dst /var/lib/mysql</pre>



<p>To verify a backup, <code>twindb-backup</code> gets it from destinations such as <code>S3</code>, <code>SSH</code>, or <code>Local</code>. After this, <code>twindb-backup</code> runs <a href="https://www.percona.com/doc/percona-xtrabackup/">innobackupex</a> to restore the backup from the archive.</p>



<p>This feature works transparently with both full and incremental backups.</p>



<p>Besides, if you configure the twindb-backup tool to export data to DataDog (watch out for our next post :), you can monitor restore time and alert your team about invalid backups, or about restore times that break your SLA.</p>



<p>The TwinDB Backup Tool supports verification starting from version 2.15.0 which you can install from the <a href="https://github.com/twindb/backup">source code</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://twindb.com/verify-mysql-backups-with-twindb-backup-tool/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Recover Table Structure From InnoDB Dictionary</title>
		<link>https://twindb.com/recover-table-structure-from-innodb-dictionary-recover-table-structure-from-innodb-dictionary/</link>
					<comments>https://twindb.com/recover-table-structure-from-innodb-dictionary-recover-table-structure-from-innodb-dictionary/#respond</comments>
		
		<dc:creator><![CDATA[aleks]]></dc:creator>
		<pubDate>Fri, 18 Sep 2020 15:34:32 +0000</pubDate>
				<category><![CDATA[MySQL Data Recovery]]></category>
		<category><![CDATA[DROP DATABASE]]></category>
		<category><![CDATA[DROP TABLE]]></category>
		<category><![CDATA[innodb dictionary]]></category>
		<guid isPermaLink="false">https://twindb.com/?p=281</guid>

					<description><![CDATA[When a table gets dropped, MySQL removes the respective .frm file. This post explains how to recover the table structure if the table was dropped. You need the table structure to recover a dropped table from the InnoDB tablespace. The B+tree structure of the InnoDB index doesn’t contain any information about field types. MySQL needs&#8230; <a class="more-link" href="https://twindb.com/recover-table-structure-from-innodb-dictionary-recover-table-structure-from-innodb-dictionary/">Continue reading <span class="screen-reader-text">Recover Table Structure From InnoDB Dictionary</span></a>]]></description>
										<content:encoded><![CDATA[
<p>When a table gets dropped, MySQL removes the respective .frm file. This post explains how to recover the table structure if the table was dropped.</p>



<p>You need the table structure to recover a dropped table from the InnoDB tablespace. The B+tree structure of the InnoDB index doesn’t contain any information about field types, and MySQL needs that information to access records of an InnoDB table. Normally, MySQL gets the table structure from the .frm file. But when MySQL drops a table, the respective .frm file is removed too.</p>



<p>Fortunately, there’s one more place where MySQL keeps the table structure. It’s the InnoDB dictionary.</p>



<p>The InnoDB dictionary is a set of internal tables where InnoDB keeps information about user tables. I reviewed them in detail in a separate <a rel="noreferrer noopener" href="https://twindb.com/innodb-dictionary/" target="_blank" data-type="post" data-id="73">InnoDB Dictionary</a> post earlier. After a DROP, InnoDB deletes the records related to the dropped table from the dictionary. So, we need to recover the deleted records from the dictionary and then reconstruct the table structure.</p>



<h2>Compiling Data Recovery Tool</h2>



<p>First, we need to get the source code. The code is hosted on <a href="https://github.com/twindb/undrop-for-innodb">GitHub</a>.</p>



<pre class="wp-block-preformatted">git clone <a href="https://github.com/twindb/undrop-for-innodb.git">https://github.com/twindb/undrop-for-innodb.git</a></pre>



<p>To compile it, we need gcc, bison, and flex. Install these packages with your package manager (yum/apt/etc.). Then it’s time to compile.</p>



<pre class="wp-block-preformatted"># make
cc -g -O3 -I./include -c stream_parser.c
cc -g -O3 -I./include -pthread -lm stream_parser.o -o stream_parser
flex sql_parser.l
bison -o sql_parser.c sql_parser.y
sql_parser.y: conflicts: 6 shift/reduce
cc -g -O3 -I./include -c sql_parser.c
cc -g -O3 -I./include -c c_parser.c
cc -g -O3 -I./include -c tables_dict.c
cc -g -O3 -I./include -c print_data.c
cc -g -O3 -I./include -c check_data.c
cc -g -O3 -I./include sql_parser.o c_parser.o tables_dict.o print_data.o check_data.o -o c_parser -pthread -lm
cc -g -O3 -I./include -o innochecksum_changer innochecksum.c</pre>



<h2>Recover The InnoDB Dictionary</h2>



<p>Now, let’s create dictionary tables in the sakila_recovered database. The data recovery tool comes with the structure of the dictionary tables.</p>



<pre class="wp-block-preformatted"># cat dictionary/SYS_* | mysql sakila_recovered</pre>



<p>The dictionary is stored in the ibdata1 file. So, let’s parse it.</p>



<pre class="wp-block-preformatted">./stream_parser -f /var/lib/mysql/ibdata1
&nbsp;
...
&nbsp;
Size to process:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 79691776 (76.000 MiB)
Worker(0): 84.13% done. 2014-09-03 16:31:20 ETA(in 00:00:00). Processing speed: 7.984 MiB/sec
Worker(2): 84.21% done. 2014-09-03 16:31:20 ETA(in 00:00:00). Processing speed: 8.000 MiB/sec
Worker(1): 84.21% done. 2014-09-03 16:31:21 ETA(in 00:00:00). Processing speed: 4.000 MiB/sec
All workers finished in 2 sec</pre>



<p>Now, we need to extract the dictionary records from the InnoDB pages. Let’s create a directory for the table dumps.</p>



<pre class="wp-block-preformatted"># mkdir -p dumps/default</pre>



<p>Now we can generate the table dumps and the <code>LOAD DATA INFILE</code> commands to load them. We also need to pass the <code>-D</code> option to <code>c_parser</code> because the records we need were deleted from the dictionary when the table was dropped.</p>
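<p>The four extractions in the following sections differ only in the dictionary table name and its index page number (SYS_TABLES=1, SYS_COLUMNS=2, SYS_INDEXES=3, SYS_FIELDS=4), so they can be driven by a loop. This sketch only prints the commands for review before running them:</p>

```shell
# Print the four c_parser commands; the table:page pairs mirror the order in
# which InnoDB assigns index ids to its dictionary tables.
for pair in SYS_TABLES:1 SYS_COLUMNS:2 SYS_INDEXES:3 SYS_FIELDS:4; do
    t=${pair%%:*}
    page=$(printf 'pages-ibdata1/FIL_PAGE_INDEX/%016u.page' "${pair##*:}")
    echo "./c_parser -4Df $page -t dictionary/$t.sql > dumps/default/$t 2> dumps/default/$t.sql"
done
```

<p>Drop the <code>echo</code> (or pipe the output to <code>sh</code>) to actually run the extractions.</p>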



<h3>SYS_TABLES</h3>



<pre class="wp-block-preformatted"># ./c_parser -4Df pages-ibdata1/FIL_PAGE_INDEX/0000000000000001.page \
&nbsp;&nbsp;&nbsp;&nbsp;-t dictionary/SYS_TABLES.sql \
&nbsp;&nbsp;&nbsp; &gt; dumps/default/SYS_TABLES \
&nbsp;&nbsp;&nbsp;&nbsp;2&gt; dumps/default/SYS_TABLES.sql</pre>



<h3>SYS_INDEXES</h3>



<pre class="wp-block-preformatted"># ./c_parser -4Df pages-ibdata1/FIL_PAGE_INDEX/0000000000000003.page \
&nbsp;&nbsp;&nbsp;&nbsp;-t dictionary/SYS_INDEXES.sql \
&nbsp;&nbsp;&nbsp; &gt; dumps/default/SYS_INDEXES \
&nbsp;&nbsp;&nbsp;&nbsp;2&gt; dumps/default/SYS_INDEXES.sql</pre>



<h3>SYS_COLUMNS</h3>



<pre class="wp-block-preformatted"># ./c_parser -4Df pages-ibdata1/FIL_PAGE_INDEX/0000000000000002.page \
&nbsp;&nbsp;&nbsp;&nbsp;-t dictionary/SYS_COLUMNS.sql \
&nbsp;&nbsp;&nbsp; &gt; dumps/default/SYS_COLUMNS \
&nbsp;&nbsp;&nbsp;&nbsp;2&gt; dumps/default/SYS_COLUMNS.sql</pre>



<h3>SYS_FIELDS</h3>



<pre class="wp-block-preformatted"># ./c_parser -4Df pages-ibdata1/FIL_PAGE_INDEX/0000000000000004.page \
&nbsp;&nbsp;&nbsp;&nbsp;-t dictionary/SYS_FIELDS.sql \
&nbsp;&nbsp;&nbsp; &gt; dumps/default/SYS_FIELDS \
&nbsp;&nbsp;&nbsp;&nbsp;2&gt; dumps/default/SYS_FIELDS.sql</pre>



<p>With the generated <code>LOAD DATA INFILE</code> commands it’s easy to load the dumps.</p>



<pre class="wp-block-preformatted"># cat dumps/default/*.sql | mysql sakila_recovered</pre>



<p>Now we have the InnoDB dictionary loaded into normal InnoDB tables.</p>



<h2>Compiling sys_parser</h2>



<p><code>sys_parser</code> is a tool that reads the dictionary from tables stored in MySQL and generates a <code>CREATE TABLE</code> statement for a table.</p>



<p>To compile it, we need the MySQL libraries and development files. Depending on the distribution, they may be in a -devel or -dev package. On RedHat-based systems, you can find the right one with the command <code>yum provides "*/mysql_config"</code>. On my server it was the mysql-community-devel package.</p>



<p>If all necessary packages are installed, the compilation boils down to a simple command:</p>



<pre class="wp-block-preformatted"># make sys_parser
/usr/bin/mysql_config
cc `mysql_config --cflags` `mysql_config --libs` -o sys_parser sys_parser.c</pre>



<h2>Recover Table Structure</h2>



<p>Now <code>sys_parser</code> can do its magic. Just run it to get the <code>CREATE</code> statement in the standard output.</p>



<pre class="wp-block-preformatted"># ./sys_parser
sys_parser [-h &lt;host&gt;] [-u &lt;user&gt;] [-p &lt;password&gt;] [-d &lt;db&gt;] databases/table</pre>



<p>In our example, it will use &#8220;root&#8221; as the username to connect to MySQL and &#8220;qwerty&#8221; as the password. The dictionary is stored in the <code>SYS_*</code> tables in the <code>sakila_recovered</code> database. What we want to recover is <code>sakila.actor</code>. InnoDB uses a slash &#8220;/&#8221; as the separator between database name and table name, and so does <code>sys_parser</code>.</p>



<pre class="wp-block-preformatted"># ./sys_parser -u root -p qwerty -d sakila_recovered sakila/actor
CREATE TABLE `actor`(
`actor_id` SMALLINT UNSIGNED NOT NULL,
`first_name` VARCHAR(45) CHARACTER SET 'utf8' COLLATE 'utf8_general_ci' NOT NULL,
`last_name` VARCHAR(45) CHARACTER SET 'utf8' COLLATE 'utf8_general_ci' NOT NULL,
`last_update` TIMESTAMP NOT NULL,
PRIMARY KEY (`actor_id`)
) ENGINE=InnoDB;</pre>



<pre class="wp-block-preformatted"># ./sys_parser -u root -p qwerty -d sakila_recovered sakila/customer
CREATE TABLE `customer`(
`customer_id` SMALLINT UNSIGNED NOT NULL,
`store_id` TINYINT UNSIGNED NOT NULL,
`first_name` VARCHAR(45) CHARACTER SET 'utf8' COLLATE 'utf8_general_ci' NOT NULL,
`last_name` VARCHAR(45) CHARACTER SET 'utf8' COLLATE 'utf8_general_ci' NOT NULL,
`email` VARCHAR(50) CHARACTER SET 'utf8' COLLATE 'utf8_general_ci',
`address_id` SMALLINT UNSIGNED NOT NULL,
`active` TINYINT NOT NULL,
`create_date` DATETIME NOT NULL,
`last_update` TIMESTAMP NOT NULL,
PRIMARY KEY (`customer_id`)
) ENGINE=InnoDB;</pre>



<p>There are a few caveats, though.</p>



<ol><li>InnoDB doesn’t store all the information you can find in the <code>.frm</code> file. For example, if a field is <code>AUTO_INCREMENT</code>, the InnoDB dictionary knows nothing about it, so <code>sys_parser</code> won’t recover that property. If there were any field- or table-level comments, they’ll be lost.</li><li><code>sys_parser</code> generates a table structure suitable for further data recovery. It does not recover secondary indexes or foreign keys.</li><li>InnoDB stores the <code>DECIMAL</code> type as a binary string and doesn’t store the precision of a <code>DECIMAL</code> field, so that information will be lost.</li></ol>



<p>For example, the <code>payment</code> table uses <code>DECIMAL</code> to store money amounts.</p>



<pre class="wp-block-preformatted"># ./sys_parser -u root -p qwerty -d sakila_recovered sakila/payment
CREATE TABLE `payment`(
        `payment_id` SMALLINT UNSIGNED NOT NULL,
        `customer_id` SMALLINT UNSIGNED NOT NULL,
        `staff_id` TINYINT UNSIGNED NOT NULL,
        `rental_id` INT,
        `amount` DECIMAL(6,0) NOT NULL,
        `payment_date` DATETIME NOT NULL,
        `last_update` TIMESTAMP NOT NULL,
        PRIMARY KEY (`payment_id`)
) ENGINE=InnoDB;</pre>
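<p>Because of these caveats, properties that <code>sys_parser</code> cannot see have to be re-added by hand. A hypothetical fix-up for <code>sakila.actor</code>, matching the original schema as shown by <code>SHOW CREATE TABLE</code> (pipe the output to <code>mysql</code> when you are ready to apply it):</p>

```shell
# Print the ALTER statement that restores the AUTO_INCREMENT property, the
# timestamp defaults, and the secondary index lost during structure recovery.
cat <<'SQL'
ALTER TABLE actor
    MODIFY actor_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
    MODIFY last_update TIMESTAMP NOT NULL
        DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    ADD KEY idx_actor_last_name (last_name);
SQL
```

<p>Comments and <code>DECIMAL</code> precision have to be restored from documentation or application knowledge; the dictionary simply doesn’t have them.</p>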



<p>Fortunately, Oracle is planning to <a rel="noreferrer noopener" href="http://www.tocker.ca/2014/07/30/beyond-the-frm-ideas-for-a-native-mysql-data-dictionary.html" target="_blank">extend the InnoDB dictionary and finally get rid of .frm files</a>. I salute that decision; keeping the structure in two places leads to inconsistencies.</p>



<p>Image credit: <a href="https://www.flickr.com/photos/khajoo/">m.khajoo</a></p>



<p></p>
]]></content:encoded>
					
					<wfw:commentRss>https://twindb.com/recover-table-structure-from-innodb-dictionary-recover-table-structure-from-innodb-dictionary/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Simple shell script is bad idea</title>
		<link>https://twindb.com/simple-shell-script-is-bad-idea/</link>
					<comments>https://twindb.com/simple-shell-script-is-bad-idea/#respond</comments>
		
		<dc:creator><![CDATA[aleks]]></dc:creator>
		<pubDate>Tue, 15 Sep 2020 00:26:57 +0000</pubDate>
				<category><![CDATA[Infrastructure As Code]]></category>
		<guid isPermaLink="false">https://twindb.com/?p=276</guid>

					<description><![CDATA[Internet is full of simple shell backup scripts. Every DBA and system administrators (including myself) has written one. Bash is a bad tool for this job and in this post I will explain why. Let’s find out why the need to write anything arises at first place. There is a lot of great backup tools&#8230; <a class="more-link" href="https://twindb.com/simple-shell-script-is-bad-idea/">Continue reading <span class="screen-reader-text">Simple shell script is bad idea</span></a>]]></description>
										<content:encoded><![CDATA[
<p>The Internet is full of simple shell backup scripts. Every DBA and system administrator (including myself) has written one. Bash is a bad tool for this job, and in this post I will explain why.</p>



<p>Let’s find out why the need to write anything arises in the first place. There are a lot of great backup tools for databases as well as for files. Following the Unix paradigm, they do one thing and do it well. With <a href="https://www.gnu.org/software/tar/" target="_blank" rel="noreferrer noopener">tar</a> I can copy files; with <a href="https://dev.mysql.com/doc/en/mysqldump.html" target="_blank" rel="noreferrer noopener">mysqldump</a> I can copy a MySQL database. But there are other problems to solve: scheduling (although it’s the easiest one), encryption (both in transit and at rest), compression, backup retention, and working with storage. Surely I’ve missed many more smaller but important details. That’s exactly the reason why people write wrappers on top of their favorite backup tools.</p>



<p>Don’t you think it’s too much for a shell script?</p>



<p>At the beginning I also needed something quick and dirty to back up our databases and files. In one evening I wrote a simple script (!) for MySQL and for files. It was a Bash script, and its config was another piece of Bash. I put it in cron and was content for the moment.</p>



<p>Then I tried to reuse the script for our customers. The script wasn’t good enough for them in that state. The databases were bigger, and they needed S3 storage, incremental backups, and other features. That was the moment when the simplicity fired back.</p>



<p>First of all, the shell script had zero test coverage. There were neither unit nor integration tests. If you add even the smallest change, there is no way to guarantee that something else won’t break. The same goes for bug fixes: if you fix a bug once, there is no way to make sure it won’t appear again. I’m not aware of any unit test frameworks for Bash, and I’ve never seen anybody use one. That’s why Bash is a good tool for stuff like “do step 1; do step 2; do step 3; exit”. Anything more complex than that is reckless driving without a fastened seatbelt – if you’re skilled enough, you’ll get to point B, but if anything goes wrong, the consequences are catastrophic.</p>



<p>Another problem with Bash is the lack of means to work with data. Everything that can be elegantly organized in Python classes becomes spaghetti code in Bash. For any complex algorithm this is vitally important.</p>
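<p>A toy illustration of the gap, assuming bash 4 or later: the closest Bash gets to a structured “backup job” object is a flat associative array; everything collapses to strings, and nested settings have to be faked with naming conventions (the job and field names here are hypothetical):</p>

```shell
# A "backup job" faked with a flat associative array -- no nesting, no
# methods, no types; compare with a Python class holding the same data.
declare -A job=(
    [name]=nightly-mysql
    [dest]=s3
    [retention_daily]=7
)
echo "job ${job[name]} -> ${job[dest]}, keep ${job[retention_daily]} daily copies"
```

<p>As soon as a job needs a list of destinations or per-destination settings, this representation falls apart and the string-splitting begins.</p>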



<p>Error handling in Bash is limited and tricky. The best it has is <code>set -o errexit</code>. And I hope you learned about <code>set -o pipefail</code> not the hard way like I did :).</p>
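<p>A self-contained sketch of why <code>pipefail</code> matters; here <code>f</code> is a hypothetical stand-in for a failing dump command:</p>

```shell
# Without pipefail, a pipeline's exit status is that of the LAST command,
# so the failure of the dump stage is silently swallowed:
bash -c 'f() { return 1; }; f | cat >/dev/null; echo $?'   # prints 0

# With pipefail, the pipeline fails if ANY command in it fails:
bash -c 'set -o pipefail; f() { return 1; }; f | cat >/dev/null; echo $?'   # prints 1
```

<p>In a backup script like <code>mysqldump | gzip &gt; dump.sql.gz</code>, the first behavior means a broken dump still produces a “successful” archive.</p>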



<p>Instead of a conclusion, I want to mention that the vast majority of our data recovery cases have a similar story: there was a home-baked script that wasn’t running, was silently failing, or was buggy. When the backup copy was needed, the script failed to deliver.</p>



<p>That happens because backup is not a simple problem, and it cannot be solved by a simple script.</p>



<p>P.S.</p>



<p>I’m writing this post looking at a tower (<a href="https://it.wikipedia.org/wiki/Torre_Grossa" target="_blank" rel="noreferrer noopener">Torre Grossa</a>) built in the 14th century. It wouldn’t have survived this long if the ancient masters hadn’t followed best practices.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://twindb.com/simple-shell-script-is-bad-idea/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Recovering A Corrupt MySQL Database</title>
		<link>https://twindb.com/recover-corrupt-mysql-database/</link>
					<comments>https://twindb.com/recover-corrupt-mysql-database/#respond</comments>
		
		<dc:creator><![CDATA[aleks]]></dc:creator>
		<pubDate>Mon, 14 Sep 2020 23:48:51 +0000</pubDate>
				<category><![CDATA[MySQL Data Recovery]]></category>
		<guid isPermaLink="false">https://twindb.com/?p=271</guid>

					<description><![CDATA[Author Andriy Lysyuk. The unDROP for InnoDB tool can be used to recover corrupt MySQL databases. In this post, we will show how to repair a MySQL database if its files got corrupted and even innodb_force_recovery=6 doesn’t help. The corruption of InnoDB tablespace may be caused by many reasons. A dying hard drive can write&#8230; <a class="more-link" href="https://twindb.com/recover-corrupt-mysql-database/">Continue reading <span class="screen-reader-text">Recovering A Corrupt MySQL Database</span></a>]]></description>
										<content:encoded><![CDATA[
<p></p>



<p><em>Author <a href="https://www.linkedin.com/in/lysyuk/">Andriy Lysyuk</a>.</em></p>



<p></p>



<div class="wp-block-media-text alignwide is-stacked-on-mobile is-vertically-aligned-top"><figure class="wp-block-media-text__media"><img loading="lazy" width="661" height="577" src="https://twindb.com/wp-content/uploads/2014/08/Corruption.gif" alt="" class="wp-image-119"/></figure><div class="wp-block-media-text__content">
<p><a href="https://twindb.com/undrop-for-innodb/" target="_blank" rel="noreferrer noopener">The unDROP for InnoDB</a> tool can be used to recover corrupt MySQL databases. In this post, we will show how to repair a MySQL database if its files got corrupted and even <code>innodb_force_recovery=6</code> doesn’t help.</p>
</div></div>



<p>The corruption of an InnoDB tablespace may be caused by many things. A dying hard drive can write garbage; in that case the page checksum will be wrong, and InnoDB reports to the error log:</p>



<pre class="wp-block-preformatted">InnoDB: Database page corruption on disk or a failed
InnoDB: file read of page 4.</pre>



<p>MySQL is well known for its poor start-up script. A simple upgrade procedure may end up with two mysqld processes writing to the same tablespace, which leads to corruption too. Sometimes a power failure corrupts not only the InnoDB files: the whole file system becomes unusable for the operating system.</p>



<p>InnoDB is very strict when it works with pages. If a checksum doesn’t match or some field in the header carries an unexpected value, InnoDB wisely prefers to crash to avoid further corruption.</p>



<p>The manual suggests starting MySQL with the <code>innodb_force_recovery</code> option. The purpose of this option is to let users dump their data. There are no means to repair the tablespace: users must drop the tablespace, create a new one, and load the data back.</p>



<p><code>innodb_force_recovery</code> accepts values from one to six. <a href="http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html" target="_blank" rel="noreferrer noopener">The higher the value, the more checks InnoDB disables</a>.</p>



<p>In this post, we will assume that MySQL cannot start even with <code>innodb_force_recovery=6</code>.</p>



<p>The recovery toolkit works directly with the InnoDB files, so it can read records from an InnoDB page without MySQL. If some part of the page is damaged, it simply skips that piece and continues to read records further in the page.</p>



<p>So, let’s corrupt some InnoDB file and recover the table.</p>



<h1>InnoDB Corruption</h1>



<p>For the sake of simplicity, we will overwrite part of an&nbsp;<code>.ibd</code> file in the area with user data.<br>In real life, the corruption may be anywhere in the <code>PRIMARY</code> index.<br>In the middle of the <code>PRIMARY</code> index of the&nbsp;<code>sakila.actor</code>&nbsp;table, we will overwrite the data with 128 &#8220;A&#8221; characters:</p>



<pre class="wp-block-preformatted">0000C058&nbsp;&nbsp;00 00 00 02 00 32 01 00 02 00 1C 69 6E 66 69 6D 75 6D 00 05 .....2.....infimum..
0000C06C&nbsp;&nbsp;00 0B 00 00 73 75 70 72 65 6D 75 6D 07 08 00 00 10 00 29 00 ....supremum......).
0000C080&nbsp;&nbsp;01 00 00 00 00 05 1E 9F 00 00 01 4D 01 10 50 45 4E 45 4C 4F ...........M..PENELO
0000C094&nbsp;&nbsp;50 45 47 55 49 4E 45 53 53 43 F2 F5 A9 08 04 00 00 18 00 26 PEGUINESSC.........&amp;
0000C0A8&nbsp;&nbsp;00 02 00 00 00 00 05 1E 9F 00 00 01 4D 01 1A 4E 49 43 4B 57 ............M..NICKW
0000C0BC&nbsp;&nbsp;41 48 4C 42 45 52 47 43 F2 F5 A9 05 02 00 00 20 00 21 00 03 AHLBERGC....... .!..
0000C0D0&nbsp;&nbsp;00 00 00 00 05 1E 9F 00 00 01 4D 01 24 45 44 43 48 41 53 45 ..........M.$EDCHASE
0000C0E4&nbsp;&nbsp;43 F2 F5 A9 05 08 04 00 28 00 27 00 04 00 00 00 00 05 1E 9F C.......(.'.........
0000C0F8&nbsp;&nbsp;00 00 01 4D 01 2E 4A 45 4E 4E 49 46 45 52 44 41 56 49 53 43 ...M..JENNIFERDAVISC
0000C10C&nbsp;&nbsp;F2 F5 A9 0C 06 00 00 30 00 2C 00 05 00 00 00 00 05 1E 9F 00 .......0.,..........
0000C120&nbsp;&nbsp;00 01 4D 01 38 4A 4F 48 4E 4E 59 4C 4F 4C 4C 4F 42 52 49 47 ..M.8JOHNNYLOLLOBRIG
0000C134&nbsp;&nbsp;49 44 41 43 F2 F5 A9 09 05 00 00 38 00 28 00 06 00 00 00 00 IDAC.......8.(......
0000C148&nbsp;&nbsp;05 1E 9F 00 00 01 41 41 41 41 41 41 41 41 41 41 41 41 41 41 ......AAAAAAAAAAAAAA
0000C15C&nbsp;&nbsp;41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAAAAAA
0000C170&nbsp;&nbsp;41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAAAAAA
0000C184&nbsp;&nbsp;41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAAAAAA
0000C198&nbsp;&nbsp;41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAAAAAA
0000C1AC&nbsp;&nbsp;41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAAAAAA
0000C1C0&nbsp;&nbsp;41 41 41 41 41 41 41 41 41 41 41 41 41 41 4E 4B 43 F2 F5 A9 AAAAAAAAAAAAAANKC...
0000C1D4&nbsp;&nbsp;05 09 00 00 58 00 28 00 0A 00 00 00 00 05 1E 9F 00 00 01 4D ....X.(............M
0000C1E8&nbsp;&nbsp;01 6A 43 48 52 49 53 54 49 41 4E 47 41 42 4C 45 43 F2 F5 A9 .jCHRISTIANGABLEC...
0000C1FC&nbsp;&nbsp;04 04 00 00 60 00 22 00 0B 00 00 00 00 05 1E 9F 00 00 01 4D ....`."............M</pre>



<h1>Corrupted InnoDB Table Crashes MySQL</h1>



<p>When MySQL reads a page with the user data, the checksum is wrong and the server crashes.</p>



<pre class="wp-block-preformatted">mysql&gt; SELECT COUNT(*) FROM sakila.actor
+----------+
| COUNT(*) |
+----------+
|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;200 |
+----------+
&nbsp;
ERROR 2013 (HY000) at line 1: Lost connection to MySQL server during query</pre>



<p>Before the crash, MySQL writes to the error log what exactly went wrong and dumps the faulty page:</p>



<pre class="wp-block-preformatted">Version: '5.6.19-67.0'&nbsp;&nbsp;socket: '/var/lib/mysql/mysql.sock'&nbsp;&nbsp;port: 3306&nbsp;&nbsp;Percona Server (GPL), Release 67.0, Revision 618
InnoDB: Database page corruption on disk or a failed
InnoDB: file read of page 4.
InnoDB: You may have to recover from a backup.
2014-07-14 20:18:44 7f060bfff700 InnoDB: Page dump in ascii and hex (16384 bytes):
len 16384; hex 1bce9a5000000004ffffffffffffffff0000000026c3095945bf00000000000000000</pre>



<h1>Recovering InnoDB Table Corruption</h1>



<p>When you see corruption in the InnoDB tablespace, the first thing to try is to start MySQL with the <code>innodb_force_recovery</code> option. It makes sense to try all values, starting from one up to six.</p>
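<p>A hypothetical sketch of that loop; the real mysqld invocation depends on your init system and my.cnf layout, so it is left as a comment:</p>

```shell
# Try increasing innodb_force_recovery levels until mysqld stays up long
# enough to take a dump. The mysqld command below is illustrative only.
for level in 1 2 3 4 5 6; do
    echo "trying innodb_force_recovery=$level"
    # mysqld --innodb-force-recovery="$level" --skip-networking &   # hypothetical
    # ... watch the error log, attempt mysqldump, break on success ...
done
```

<p>Remember that levels 4 and above can permanently change data, so work on a copy of the files whenever possible.</p>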



<p>We assume that MySQL doesn’t start even with <code>innodb_force_recovery=6</code> or it does start, but any SELECT crashes it.</p>



<p>The recovery plan is the following:</p>



<ul><li>Split corrupted InnoDB tablespace into pages; sort the pages by type and index_id.</li><li>Fetch records from the PRIMARY index of the table.</li><li>DROP the corrupted table and create a new one.</li><li>Load records back into MySQL.</li></ul>



<p>We need to parse two tablespaces: <code>ibdata1</code> and <code>actor.ibd</code> (since <code>innodb_file_per_table=ON</code>). The InnoDB dictionary is stored in&nbsp;<code>ibdata1</code>; we need it to learn the index_id of the <code>PRIMARY</code> index of the&nbsp;<code>sakila.actor</code>&nbsp;table.</p>



<h2>Split Corrupted InnoDB Tablespace</h2>



<pre class="wp-block-preformatted"># ./stream_parser -f /var/lib/mysql/ibdata1
Opening file: /var/lib/mysql/ibdata1
File information:
&nbsp;
ID of device containing file:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;64768
inode number:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 8028
protection:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 100660 (regular file)
number of hard links:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;1
user ID of owner:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;106
group ID of owner:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 114
device ID (if special file):&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 0
blocksize for filesystem I/O:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 4096
number of blocks allocated:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;36864
time of last access:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;1406832698 Thu Jul 31 14:51:38 2014
time of last modification:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;1406833058 Thu Jul 31 14:57:38 2014
time of last status change:&nbsp;&nbsp;&nbsp;&nbsp; 1406833058 Thu Jul 31 14:57:38 2014
total size, in bytes:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 18874368 (18.000 MiB)
&nbsp;
Size to process:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;18874368 (18.000 MiB)
All workers finished in 0 sec</pre>



<p>Now <code>actor.ibd</code>‘s turn.</p>



<pre class="wp-block-preformatted"># ./stream_parser -f /var/lib/mysql/sakila/actor.ibd
Opening file: /var/lib/mysql/sakila/actor.ibd
File information:
&nbsp;
ID of device containing file:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;64768
inode number:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 8037
protection:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 100660 (regular file)
number of hard links:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;1
user ID of owner:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;106
group ID of owner:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 114
device ID (if special file):&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 0
blocksize for filesystem I/O:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 4096
number of blocks allocated:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;224
time of last access:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;1406832349 Thu Jul 31 14:45:49 2014
time of last modification:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;1406832300 Thu Jul 31 14:45:00 2014
time of last status change:&nbsp;&nbsp;&nbsp;&nbsp; 1406832300 Thu Jul 31 14:45:00 2014
total size, in bytes:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 114688 (112.000 kiB)
&nbsp;
Size to process:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;114688 (112.000 kiB)
All workers finished in 0 sec</pre>



<h2>Recover InnoDB Dictionary</h2>



<p>We need to know the index_id of the <code>PRIMARY</code> index of the&nbsp;<code>sakila.actor</code>&nbsp;table. Learn more <a rel="noreferrer noopener" href="/innodb-dictionary/" target="_blank" data-type="URL" data-id="/innodb-dictionary/">about InnoDB dictionary</a>. Now, we’ll just get index_id of <code>sakila.actor</code>:</p>



<pre class="wp-block-preformatted"># ./c_parser -4f pages-ibdata1/FIL_PAGE_INDEX/0000000000000001.page -t dictionary/SYS_TABLES.sql |grep actor
000000000504&nbsp;&nbsp;&nbsp;&nbsp;85000001320110&nbsp;&nbsp;SYS_TABLES&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"sakila/actor"&nbsp;&nbsp;13&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;4&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 0&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 0&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ""&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;1
00000000050D&nbsp;&nbsp;&nbsp;&nbsp;8E0000013B0110&nbsp;&nbsp;SYS_TABLES&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"sakila/film\_actor"&nbsp;&nbsp;&nbsp;&nbsp;20&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 0&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 0&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ""&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;8
...
&nbsp;
# ./c_parser -4f pages-ibdata1/FIL_PAGE_INDEX/0000000000000003.page -t dictionary/SYS_INDEXES.sql |grep 13
000000000300&nbsp;&nbsp;&nbsp;&nbsp;810000012D01D3&nbsp;&nbsp;SYS_INDEXES&nbsp;&nbsp;&nbsp;&nbsp; 11&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;13&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"REF\_IND"&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 0&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 0&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 304
000000000504&nbsp;&nbsp;&nbsp;&nbsp;85000001320178&nbsp;&nbsp;SYS_INDEXES&nbsp;&nbsp;&nbsp;&nbsp; 13&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;15&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"PRIMARY"&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3
000000000504&nbsp;&nbsp;&nbsp;&nbsp;850000013201A6&nbsp;&nbsp;SYS_INDEXES&nbsp;&nbsp;&nbsp;&nbsp; 13&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;16&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"idx\_actor\_last\_name"&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 0&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 4
000000000505&nbsp;&nbsp;&nbsp;&nbsp;860000013301CE&nbsp;&nbsp;SYS_INDEXES&nbsp;&nbsp;&nbsp;&nbsp; 14&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;17&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"PRIMARY"&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3
...</pre>



<p>So, the index_id of the <code>PRIMARY</code> index of the&nbsp;<code>sakila.actor</code> table is 15: the table_id 13 (from SYS_TABLES) is the fourth column, and the index_id is the fifth column in the dump.</p>
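<p>If you are recovering many tables, the same lookup can be scripted. Here is a minimal illustrative sketch (not part of the toolkit) that pulls the index_id of the <code>PRIMARY</code> index out of a SYS_INDEXES dump, assuming the tab-separated <code>c_parser</code> output format shown above:</p>

```python
# Find the PRIMARY index_id for a given table_id in a c_parser
# dump of SYS_INDEXES. Tab-separated columns are assumed to be:
# offset, record header, table name, TABLE_ID, ID, NAME, ...

def primary_index_id(dump_lines, table_id):
    for line in dump_lines:
        fields = line.split("\t")
        if len(fields) < 6 or fields[2] != "SYS_INDEXES":
            continue
        if fields[3].strip() == str(table_id) and fields[5].strip('"') == "PRIMARY":
            return int(fields[4])
    return None

# Sample record from the dump above, with tabs between fields:
sample = '000000000504\t85000001320178\tSYS_INDEXES\t13\t15\t"PRIMARY"\t1\t3\t1\t3'
print(primary_index_id([sample], 13))  # 15
```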



<h2>Recover Records From The PRIMARY Index Of The Table</h2>



<p><code>c_parser</code> reads InnoDB pages, matches them with a given table structure, and dumps records in a tab-separated values format.</p>



<p>Unlike InnoDB, when <code>c_parser</code> hits a corrupted area, it skips it and continues reading the page.</p>



<p>Let’s read the records from index_id 15, which is the <code>PRIMARY</code> index according to the dictionary.</p>



<pre class="wp-block-preformatted"># ./c_parser -6f pages-actor.ibd/FIL_PAGE_INDEX/0000000000000015.page \
    -t sakila/actor.sql \
    > dumps/default/actor \
    2> dumps/default/actor_load.sql
 
# cat dumps/default/actor
-- Page id: 3, Format: COMPACT, Records list: Invalid, Expected records: (0 200)
72656D756D07    08000010002900  actor   30064   "\0\0\0\0"      ""      "1972-09-20 23:07:44"
1050454E454C    4F50454755494E  actor   19713   "ESSC▒" ""      "2100-08-09 07:52:36"
00000000051E    9F0000014D011A  actor   2       "NICK"  "WAHLBERG"      "2006-02-15 04:34:33"
00000000051E    9F0000014D0124  actor   3       "ED"    "CHASE" "2006-02-15 04:34:33"
00000000051E    9F0000014D012E  actor   4       "JENNIFER"      "DAVIS" "2006-02-15 04:34:33"
00000000051E    9F0000014D0138  actor   5       "JOHNNY"        "LOLLOBRIGIDA"  "2006-02-15 04:34:33"
00000000051E    9F000001414141  actor   6       "AAAAA" "AAAAAAAAA"     "2004-09-10 01:53:05"
 
00000000051E    9F0000014D016A  actor   10      "CHRISTIAN"     "GABLE" "2006-02-15 04:34:33"
...</pre>



<p>We have identified some valid records, but there is clearly some “garbage” too. Pay attention to the recovered records before “Nick Wahlberg”: there should be a record for “Penelope Guiness”, since we haven’t overwritten that data in the <code>actor.ibd</code> file.</p>



<h2>Filters</h2>



<p>We can improve recovery quality by applying filters on the possible values of certain fields. There are 200 records in the original table, but the first two “garbage” records have some weird identifiers (30064 and 19713). We know that the actor identifier should be in the range [1..300], so we tell the parser to match that condition. For this purpose, we add a hint in the comments of the&nbsp;<code>actor.sql</code> file that defines the actor table. The comment must follow a special format for the parser to recognize it. Here is the relevant part of the <code>actor.sql</code> file (note the comma after the comment!):</p>



<pre class="wp-block-preformatted">CREATE TABLE `actor` (
&nbsp;&nbsp;`actor_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT
&nbsp;&nbsp;&nbsp;&nbsp;/*!FILTER
&nbsp;&nbsp;&nbsp;&nbsp; int_min_val: 1
&nbsp;&nbsp;&nbsp;&nbsp; int_max_val: 300 */,
&nbsp;&nbsp;`first_name` varchar(45) NOT NULL,
&nbsp;&nbsp;`last_name` varchar(45) NOT NULL,
&nbsp;&nbsp;`last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
&nbsp;&nbsp;PRIMARY KEY (`actor_id`),
&nbsp;&nbsp;KEY `idx_actor_last_name` (`last_name`)
) ENGINE=InnoDB AUTO_INCREMENT=201 DEFAULT CHARSET=utf8;</pre>



<p>After applying the filter, recognized records look much better:</p>



<pre class="wp-block-preformatted"># ./c_parser -6f pages-actor.ibd/FIL_PAGE_INDEX/0000000000000015.page \
&nbsp;&nbsp;&nbsp;&nbsp;-t sakila/actor.sql \
&nbsp;&nbsp;&nbsp;&nbsp;&gt; dumps/default/actor \
&nbsp;&nbsp;&nbsp;&nbsp;2&gt; dumps/default/actor_load.sql
&nbsp;
# head -10 dumps/default/actor
-- Page id: 3, Format: COMPACT, Records list: Invalid, Expected records: (0 200)
00000000051E&nbsp;&nbsp;&nbsp;&nbsp;9F0000014D0110&nbsp;&nbsp;actor&nbsp;&nbsp; 1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; "PENELOPE"&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"GUINESS"&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; "2006-02-15 04:34:33"
00000000051E&nbsp;&nbsp;&nbsp;&nbsp;9F0000014D011A&nbsp;&nbsp;actor&nbsp;&nbsp; 2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; "NICK"&nbsp;&nbsp;"WAHLBERG"&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"2006-02-15 04:34:33"
00000000051E&nbsp;&nbsp;&nbsp;&nbsp;9F0000014D0124&nbsp;&nbsp;actor&nbsp;&nbsp; 3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; "ED"&nbsp;&nbsp;&nbsp;&nbsp;"CHASE" "2006-02-15 04:34:33"
00000000051E&nbsp;&nbsp;&nbsp;&nbsp;9F0000014D012E&nbsp;&nbsp;actor&nbsp;&nbsp; 4&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; "JENNIFER"&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"DAVIS" "2006-02-15 04:34:33"
00000000051E&nbsp;&nbsp;&nbsp;&nbsp;9F0000014D0138&nbsp;&nbsp;actor&nbsp;&nbsp; 5&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; "JOHNNY"&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"LOLLOBRIGIDA"&nbsp;&nbsp;"2006-02-15 04:34:33"
00000000051E&nbsp;&nbsp;&nbsp;&nbsp;9F000001414141&nbsp;&nbsp;actor&nbsp;&nbsp; 6&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; "AAAAA" "AAAAAAAAA"&nbsp;&nbsp;&nbsp;&nbsp; "2004-09-10 01:53:05"
00000000051E&nbsp;&nbsp;&nbsp;&nbsp;9F0000014D016A&nbsp;&nbsp;actor&nbsp;&nbsp; 10&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"CHRISTIAN"&nbsp;&nbsp;&nbsp;&nbsp; "GABLE" "2006-02-15 04:34:33"
00000000051E&nbsp;&nbsp;&nbsp;&nbsp;9F0000014D0174&nbsp;&nbsp;actor&nbsp;&nbsp; 11&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"ZERO"&nbsp;&nbsp;"CAGE"&nbsp;&nbsp;"2006-02-15 04:34:33"
00000000051E&nbsp;&nbsp;&nbsp;&nbsp;9F0000014D017E&nbsp;&nbsp;actor&nbsp;&nbsp; 12&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"KARL"&nbsp;&nbsp;"BERRY" "2006-02-15 04:34:33"</pre>



<p>You see, the record for “Penelope Guiness” is back. The only remaining issue is the junk record 6-“AAAAA”-“AAAAAAAAA”. It slipped through because it happens to have an actor_id of 6, which falls within the allowed range, so the filter cannot reject it. Ideally, the dump shouldn’t contain any junk records, so you may try adding more filters on other fields.<br>Alternatively, we can delete these records from the database manually later.</p>



<h2>DROP Corrupted Tables And Create New One</h2>



<p>As soon as we have dumps of all the tables, we need a clean MySQL instance to load them into.</p>



<p>If only a single table is corrupt, it makes sense to start MySQL with <code>innodb_force_recovery=6</code> and DROP the table.</p>



<p>If MySQL can’t even start, try moving the corrupt <code>actor.ibd</code> elsewhere. In recovery mode, <code>DROP TABLE actor</code> removes the table’s record from the dictionary. Remove <code>actor.frm</code> if it still remains.</p>



<p>The point is to get a clean, up-and-running MySQL, ready to import the table dump.</p>



<p>Once MySQL is ready, create an empty table <code>actor</code>:</p>



<pre class="wp-block-preformatted">mysql&gt; CREATE TABLE `actor` (
&nbsp;&nbsp;&nbsp;&nbsp;-&gt;&nbsp;&nbsp;&nbsp;&nbsp; `actor_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
&nbsp;&nbsp;&nbsp;&nbsp;-&gt;&nbsp;&nbsp;&nbsp;&nbsp; `first_name` varchar(45) NOT NULL,
&nbsp;&nbsp;&nbsp;&nbsp;-&gt;&nbsp;&nbsp;&nbsp;&nbsp; `last_name` varchar(45) NOT NULL,
&nbsp;&nbsp;&nbsp;&nbsp;-&gt;&nbsp;&nbsp;&nbsp;&nbsp; `last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
&nbsp;&nbsp;&nbsp;&nbsp;-&gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;PRIMARY KEY (`actor_id`),
&nbsp;&nbsp;&nbsp;&nbsp;-&gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;KEY `idx_actor_last_name` (`last_name`)
&nbsp;&nbsp;&nbsp;&nbsp;-&gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;) ENGINE=InnoDB DEFAULT CHARSET=utf8 ;
Query OK, 0 rows affected (0.01 sec)</pre>



<h2>Load Records Back Into MySQL</h2>



<p>Then, we load information from the recovered dump:</p>



<pre class="wp-block-preformatted"># mysql --local-infile -uroot -p$mypass
Welcome to the MySQL monitor.&nbsp;&nbsp;Commands end with ; or \g.
...
mysql&gt; use sakila;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
&nbsp;
Database changed
mysql&gt; source dumps/default/actor_load.sql
Query OK, 0 rows affected (0.00 sec)
&nbsp;
Query OK, 199 rows affected, 1 warning (0.00 sec)
Records: 198&nbsp;&nbsp;Deleted: 1&nbsp;&nbsp;Skipped: 0&nbsp;&nbsp;Warnings: 1</pre>



<p>The final step is to find out how much data we’ve lost to the corruption.<br><code>c_parser</code> reports the number of records it expected and the number it actually found.<br>At the beginning of each page it prints the number of expected records:</p>



<pre class="wp-block-preformatted">-- Page id: 3, Format: COMPACT, Records list: Invalid, Expected records: (0 200)</pre>



<p>This means 200 records are expected, but the list of records is broken (hence “Records list: Invalid”).</p>



<p>At the end of each page, it prints a summary of what was actually found:</p>



<pre class="wp-block-preformatted">-- Page id: 3, Found records: 197, Lost records: YES, Leaf page: YES</pre>
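<p>With many pages in a dump, you can tally the expected versus found counts from these comment lines. A rough sketch, assuming the comment format shown above:</p>

```python
import re

def recovery_stats(dump_text):
    """Sum expected and found record counts from c_parser page comments."""
    expected = sum(int(m.group(1))
                   for m in re.finditer(r"Expected records: \(\d+ (\d+)\)", dump_text))
    found = sum(int(m.group(1))
                for m in re.finditer(r"Found records: (\d+)", dump_text))
    return expected, found

dump = """-- Page id: 3, Format: COMPACT, Records list: Invalid, Expected records: (0 200)
-- Page id: 3, Found records: 197, Lost records: YES, Leaf page: YES"""
print(recovery_stats(dump))  # (200, 197), i.e. 3 records lost
```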
]]></content:encoded>
					
					<wfw:commentRss>https://twindb.com/recover-corrupt-mysql-database/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Show Differences Found by pt-table-checksum</title>
		<link>https://twindb.com/show-differences-found-by-pt-table-checksum/</link>
					<comments>https://twindb.com/show-differences-found-by-pt-table-checksum/#respond</comments>
		
		<dc:creator><![CDATA[aleks]]></dc:creator>
		<pubDate>Sun, 13 Sep 2020 22:35:48 +0000</pubDate>
				<category><![CDATA[MySQL]]></category>
		<guid isPermaLink="false">https://twindb.com/?p=242</guid>

					<description><![CDATA[First of all, pt-table-checksum&#160;is perfect for checking if the master and its slaves are in-sync. It answers the question “Are the slaves consistent with the&#160;master?”. However, if they’re not, pt-table-checksum doesn’t actually tell you what exactly is wrong. master# pt-table-checksum -q TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE 09-03T22:21:10 0 1 2 1 0&#8230; <a class="more-link" href="https://twindb.com/show-differences-found-by-pt-table-checksum/">Continue reading <span class="screen-reader-text">Show Differences Found by pt-table-checksum</span></a>]]></description>
										<content:encoded><![CDATA[
<p>First of all, <code>pt-table-checksum</code>&nbsp;is perfect for checking if the master and its slaves are in sync. It answers the question “Are the slaves consistent with the&nbsp;master?”. However, if they’re not, pt-table-checksum doesn’t tell you what exactly is wrong.</p>



<pre class="wp-block-preformatted">master# pt-table-checksum -q
TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
09-03T22:21:10 0 1 2 1 0 0.013 mysql.proxies_priv
09-03T22:21:10 0 1 9 1 0 0.010 mysql.user</pre>



<h2>From <code>pt-table-checksum</code>&nbsp;to&nbsp;<code>pt-table-sync</code></h2>



<p>Nevertheless, <code>pt-table-sync</code> may give you a partial answer. It can print the SQL statements needed to&nbsp;sync the replication cluster. Thus, by reading the SQL code, you may guess which records were missing, extra, or different.</p>



<pre class="wp-block-preformatted">master# pt-table-sync --print --replicate percona.checksums localhost
DELETE FROM `mysql`.`proxies_priv` WHERE `host`='slave.box' AND `user`='root' AND `proxied_host`='' AND `proxied_user`='' LIMIT 1 /*percona-toolkit src_db:mysql src_tbl:proxies_priv src_dsn:h=localhost dst_db:mysql dst_tbl:proxies_priv dst_dsn:h=ip-192-168-35-251.ec2.internal lock:1 transaction:0 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:21261 user:root host:master.box*/;
REPLACE INTO `mysql`.`proxies_priv`(`host`, `user`, `proxied_host`, `proxied_user`, `with_grant`, `grantor`, `timestamp`) VALUES ('localhost', 'root', '', '', '1', '', '2016-09-03 20:02:28') /*percona-toolkit src_db:mysql src_tbl:proxies_priv src_dsn:h=localhost dst_db:mysql dst_tbl:proxies_priv dst_dsn:h=ip-192-168-35-251.ec2.internal lock:1 transaction:0 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:21261 user:root host:master.box*/;
REPLACE INTO `mysql`.`proxies_priv`(`host`, `user`, `proxied_host`, `proxied_user`, `with_grant`, `grantor`, `timestamp`) VALUES ('master.box', 'root', '', '', '1', '', '2016-09-03 20:02:28') /*percona-toolkit src_db:mysql src_tbl:proxies_priv src_dsn:h=localhost dst_db:mysql dst_tbl:proxies_priv dst_dsn:h=ip-192-168-35-251.ec2.internal lock:1 transaction:0 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:21261 user:root host:master.box*/;
...</pre>



<p>This is not very convenient. To be fair,&nbsp;<code>pt-table-sync</code> serves its main purpose well: syncing two or more servers. Showing the inconsistencies between two servers was never its goal, yet that is exactly what you&#8217;d like to know.</p>



<h2>Using TwinDB Table Compare</h2>



<p>TwinDB Table Compare is a tool that shows the exact difference between a slave and its master. It’s published on <a rel="noreferrer noopener" href="https://pypi.python.org/pypi/twindb-table-compare" target="_blank">PyPi</a>, therefore, you can install it right away:</p>



<pre class="wp-block-preformatted">slave# pip install twindb-table-compare</pre>



<p>Old versions of <code>twindb-table-compare</code> used MySQLdb, which depended on libmysqlclient. Not anymore: it now uses <code>pymysql</code>, a pure-Python implementation. A little convenience for users.</p>



<p>TwinDB Table Compare works side-by-side with pt-table-checksum. As you know, pt-table-checksum runs against the master and populates the <code>percona.checksums</code> table.</p>



<p><code>twindb-table-compare</code> reads the mismatched chunks from <code>percona.checksums</code>, runs a SELECT for each chunk against both the master and the slave, compares the results, and shows the difference to the user as a diff.</p>
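<p>Conceptually, the last step is an ordinary diff of two result sets. A toy illustration (not the actual <code>twindb-table-compare</code> code, and the row values are made up) using Python’s <code>difflib</code>:</p>

```python
import difflib

# Hypothetical result sets for the same chunk, one row per line.
master_rows = ["393187\t40.79", "393188\t40.79", "393189\t40.79"]
slave_rows  = ["393187\t40.79", "393189\t40.79"]  # 393188 missing on the slave

# unified_diff marks rows present only on the master with "-"
# and rows present only on the slave with "+".
diff = list(difflib.unified_diff(master_rows, slave_rows,
                                 fromfile="master", tofile="slave", lineterm=""))
print("\n".join(diff))
```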



<p>Let&#8217;s see <code>twindb-table-compare</code> in action. To illustrate how it works, I intentionally deleted a record directly from the slave.</p>



<pre class="wp-block-preformatted">slave&gt; delete from t1 where id = 393188;</pre>



<p>Then I ran pt-table-checksum on the master, which found an inconsistency. To check whether there are any mismatching chunks, you can run a query against <code>percona.checksums</code>:</p>



<pre class="wp-block-preformatted">slave# mysql -e "SELECT * FROM percona.checksums WHERE (this_cnt&lt;&gt;master_cnt OR this_crc&lt;&gt;master_crc) AND db='test'\G"
*************************** 1. row ***************************
            db: test
           tbl: t1
         chunk: 3
    chunk_time: 0.202156
   chunk_index: PRIMARY
lower_boundary: 319858
upper_boundary: 393197
      this_crc: 376517ca
      this_cnt: 73339
    master_crc: 0
    master_cnt: 73340
            ts: 2020-09-13 22:31:28</pre>



<p>Now that we know there is an inconsistency, we need to find out what exactly it is.</p>



<pre class="wp-block-preformatted">slave# twindb-table-compare --user dba --password qwerty
...
2020-09-13 22:33:47,642: INFO: compare.get_inconsistencies():543: Executing: SELECT chunk FROM percona.checksums WHERE (this_crc<>master_crc OR this_cnt<>master_cnt) AND db='test' AND tbl='t1'
2020-09-13 22:33:47,643: INFO: compare.get_inconsistencies():554: Found 1 inconsistent chunk
2020-09-13 22:33:47,643: INFO: compare.build_chunk_query():360: # test.t1, chunk 3
2020-09-13 22:33:47,643: INFO: compare.get_chunk_index():63: Executing SELECT chunk_index FROM percona.checksums WHERE db='test' AND tbl='t1' AND chunk = 3
2020-09-13 22:33:47,700: INFO: compare.build_chunk_query():362: # chunk index: PRIMARY
2020-09-13 22:33:47,700: INFO: compare.get_index_fields():86: Executing SELECT COLUMN_NAME FROM information_schema.STATISTICS WHERE TABLE_SCHEMA='test' AND TABLE_NAME='t1' AND INDEX_NAME='PRIMARY' ORDER BY SEQ_IN_INDEX
2020-09-13 22:33:47,702: INFO: compare.get_boundary():129: Executing SELECT lower_boundary, upper_boundary FROM percona.checksums WHERE db='test' AND tbl='t1' AND chunk = 3
2020-09-13 22:33:47,705: INFO: compare.print_horizontal():402: Executing: SELECT id, name FROM test.t1 USE INDEX (PRIMARY) WHERE ( 0 OR ( 1 AND id >= '319858' ) ) AND ( 0 OR ( 1 AND id <= '393197' ) ) 2020-09-13 22:33:52,175: INFO: compare.print_horizontal():424: Executing: SELECT id, name FROM test.t1 USE INDEX (PRIMARY) WHERE ( 0 OR ( 1 AND id >= '319858' ) ) AND ( 0 OR ( 1 AND id <= '393197' ) )
2020-09-13 22:33:56,944: INFO: compare.get_inconsistencies():566: Differences between slave localhost and its master:
@@ -73328,7 +73328,6 @@
393185 40.79178826957176
393186 40.79178826957176
393187 40.79178826957176
-393188 40.79178826957176
393189 40.79178826957176
393190 40.79178826957176
393191 40.79178826957176</pre>



<p>From the output we see that the slave is missing the record with id 393188, the one I deleted.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://twindb.com/show-differences-found-by-pt-table-checksum/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Efficient Use Of Indexes In MySQL</title>
		<link>https://twindb.com/efficient-use-of-indexes-in-mysql/</link>
					<comments>https://twindb.com/efficient-use-of-indexes-in-mysql/#respond</comments>
		
		<dc:creator><![CDATA[aleks]]></dc:creator>
		<pubDate>Sun, 13 Sep 2020 17:41:55 +0000</pubDate>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[innodb]]></category>
		<category><![CDATA[query optimization]]></category>
		<guid isPermaLink="false">https://twindb.com/?p=114</guid>

					<description><![CDATA[These are the slides of the “Efficient Use Of Indexes In MySQL” talk we delivered on the&#160;SFMySQL Meetup. This is an introductory talk for developers on MySQL indexes. In my opinion, it’s quite important to understand how InnoDB organizes data. If you know how MySQL accesses data, it’s easier to write optimal queries. When working&#8230; <a class="more-link" href="https://twindb.com/efficient-use-of-indexes-in-mysql/">Continue reading <span class="screen-reader-text">Efficient Use Of Indexes In MySQL</span></a>]]></description>
										<content:encoded><![CDATA[
<p>These are the slides of the “Efficient Use Of Indexes In MySQL” talk we delivered at the&nbsp;<a href="http://www.meetup.com/sf-mysql/events/223025164/" target="_blank" rel="noreferrer noopener">SFMySQL Meetup</a>.</p>



<p>This is an introductory talk for developers on MySQL indexes. In my opinion, it’s quite important to understand how InnoDB organizes data. If you know how MySQL accesses data, it’s easier to write optimal queries.</p>



<p>When working with queries, I imagine a secondary index as a table with records sorted by the secondary key fields. This is a powerful concept that helps to understand the MySQL logic. It also makes complex optimizations like the <a href="https://dev.mysql.com/doc/refman/5.6/en/group-by-optimization.html#loose-index-scan" target="_blank" rel="noreferrer noopener">loose index scan</a> easier to understand.</p>



<p>For example, for index (last_name, rank) the secondary index table looks like:</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" width="896" height="594" src="https://twindb.com/wp-content/uploads/2015/06/Screenshot-2015-06-25-14.49.11.png" alt="" class="wp-image-135" srcset="https://twindb.com/wp-content/uploads/2015/06/Screenshot-2015-06-25-14.49.11.png 896w, https://twindb.com/wp-content/uploads/2015/06/Screenshot-2015-06-25-14.49.11-300x199.png 300w, https://twindb.com/wp-content/uploads/2015/06/Screenshot-2015-06-25-14.49.11-768x509.png 768w" sizes="(max-width: 896px) 100vw, 896px" /></figure></div>



<p>Enjoy the slides!<br></p>



<figure class="wp-block-embed-slideshare wp-block-embed is-type-rich is-provider-slideshare wp-embed-aspect-1-1 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<iframe loading="lazy" title="Efficient Use of indexes in MySQL" src="https://www.slideshare.net/slideshow/embed_code/key/hhLtnWkFZomWWR" width="427" height="356" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;" allowfullscreen> </iframe> <div style="margin-bottom:5px"> <strong> <a href="https://www.slideshare.net/akuzminsky/indexes-in-my-sql-aleksandr-kuzminsky-https-twindbcom" title="Efficient Use of indexes in MySQL" target="_blank">Efficient Use of indexes in MySQL</a> </strong> from <strong><a href="https://www.slideshare.net/akuzminsky" target="_blank">Aleksandr Kuzminsky</a></strong> </div>
</div></figure>



<p>The talk has been recorded. Unfortunately, the sound is very quiet; perhaps a webinar on the topic will make up for it. Let me know in the comments below if you&#8217;d be interested!</p>



<figure class="wp-block-embed-youtube wp-block-embed is-type-video is-provider-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<iframe loading="lazy" title="How to Efficiently Use Indexes in MySQL" width="750" height="422" src="https://www.youtube.com/embed/ytW1KV_lIbQ?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</div></figure>
]]></content:encoded>
					
					<wfw:commentRss>https://twindb.com/efficient-use-of-indexes-in-mysql/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>InnoDB Dictionary</title>
		<link>https://twindb.com/innodb-dictionary/</link>
					<comments>https://twindb.com/innodb-dictionary/#respond</comments>
		
		<dc:creator><![CDATA[aleks]]></dc:creator>
		<pubDate>Sat, 12 Sep 2020 21:23:22 +0000</pubDate>
				<category><![CDATA[MySQL Data Recovery]]></category>
		<category><![CDATA[innodb dictionary]]></category>
		<category><![CDATA[SYS_COLUMNS]]></category>
		<category><![CDATA[SYS_FIELDS]]></category>
		<category><![CDATA[SYS_INDEXES]]></category>
		<guid isPermaLink="false">https://www.twindb.com/?p=73</guid>

					<description><![CDATA[Why Does InnoDB Need A Dictionary An InnoDB dictionary is a set of internal tables InnoDB uses to maintain various information about user tables. It serves as API between a human and the database. While the humans refer to tables by their names, InnoDB works with integer identifiers. The dictionary stores correspondence between table name&#8230; <a class="more-link" href="https://twindb.com/innodb-dictionary/">Continue reading <span class="screen-reader-text">InnoDB Dictionary</span></a>]]></description>
										<content:encoded><![CDATA[
<h1>Why Does InnoDB Need A Dictionary</h1>



<p>An InnoDB dictionary is a set of internal tables InnoDB uses to maintain various information about user tables. It serves as an API between a human and the database: while humans refer to tables by their names, InnoDB works with integer identifiers. The dictionary stores the correspondence between a table name and its internal identifiers, such as table_id and index_id.</p>



<p>The dictionary tables are normal InnoDB tables, but they’re not visible to a user. However, some versions of MySQL provide read-only access to the dictionary in the information_schema database.</p>



<p>The dictionary is stored in ibdata1. The root page of SYS_TABLES, for example, has page id 8, so it’s the ninth 16&nbsp;KiB page from the beginning of ibdata1 (InnoDB numbers pages from zero).</p>
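<p>With the default 16&nbsp;KiB page size, a page id maps directly to a byte offset in the tablespace file. A small sketch (assuming the default, uncompressed page size) that computes the offset and reads a page by its id:</p>

```python
INNODB_PAGE_SIZE = 16 * 1024  # default InnoDB page size

def page_offset(page_id, page_size=INNODB_PAGE_SIZE):
    """Byte offset of a page inside a tablespace file (pages numbered from 0)."""
    return page_id * page_size

def read_page(path, page_id):
    """Read one raw page from a tablespace file such as ibdata1."""
    with open(path, "rb") as f:
        f.seek(page_offset(page_id))
        return f.read(INNODB_PAGE_SIZE)

print(page_offset(8))  # 131072: where the SYS_TABLES root page starts in ibdata1
```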



<p>The dictionary pages are in REDUNDANT format even if you use MySQL 5.6. I hope to write more about record formats in future posts. For now, it’s enough to mention that REDUNDANT is the oldest record format: it has been available since MySQL 4.0 and was the only format at that time.</p>



<h1>SYS_TABLES</h1>



<pre class="wp-block-preformatted">CREATE TABLE `SYS_TABLES` (
    `NAME` varchar(255) NOT NULL DEFAULT '',
    `ID` bigint(20) unsigned NOT NULL DEFAULT '0',
    `N_COLS` int(10) DEFAULT NULL,
    `TYPE` int(10) unsigned DEFAULT NULL,
    `MIX_ID` bigint(20) unsigned DEFAULT NULL,
    `MIX_LEN` int(10) unsigned DEFAULT NULL,
    `CLUSTER_NAME` varchar(255) DEFAULT NULL,
    `SPACE` int(10) unsigned DEFAULT NULL,
    PRIMARY KEY (`NAME`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1; </pre>



<p>All dictionary tables have predefined, hard-coded index_ids. No surprise, since InnoDB doesn’t have a dictionary for the dictionary :). SYS_TABLES has index_id 0-1, and its root page is page id 8.</p>



<p><strong>NAME</strong> is the human readable table name. Well, the database name and the table name. For example, for table <code>sakila</code>.<code>actor</code> NAME will be “sakila/actor”.</p>



<p><strong>ID</strong> is the table identifier, just an integer number. Other dictionary tables refer to the table by this table_id.</p>



<p><strong>N_COLS</strong> is the number of columns in the table.</p>



<p>I will pretend <strong>TYPE</strong>, <strong>MIX_ID</strong>, <strong>MIX_LEN</strong> and <strong>CLUSTER_NAME</strong> are not important. Actually, I have no idea what these fields are. Anyway, they were never used for data recovery, so let’s leave them for future investigations.</p>



<p><strong>SPACE</strong> is an identifier of a tablespace. The system tablespace (ibdata1, ibdata2, …) is SPACE 0, and every .ibd file has its own SPACE identifier. SPACE tells in which tablespace the table’s indexes are stored.</p>



<p>Sample record for table sakila.actor: </p>



<pre class="wp-block-preformatted"># ./c_parser \
        -4f pages-1402815586/FIL_PAGE_INDEX/0-1 \
        | grep sakila/actor
...
"sakila/actor" 61 4 1 0 80 "" 47</pre>



<p>“sakila/actor” is the table name obviously. 61 is the table id. 4 is the number of fields in the table: </p>



<pre class="wp-block-preformatted">CREATE TABLE `actor` (
  `actor_id` smallint(5) unsigned NOT NULL 
             AUTO_INCREMENT,
  `first_name` varchar(45) NOT NULL,
  `last_name` varchar(45) NOT NULL,
  `last_update` timestamp NOT NULL 
                DEFAULT CURRENT_TIMESTAMP
                ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`actor_id`),
  KEY `idx_actor_last_name` (`last_name`)
) ENGINE=InnoDB 
  AUTO_INCREMENT=201 
  DEFAULT CHARSET=utf8 </pre>



<p>47 is the SPACE identifier. It would be 0 if innodb_file_per_table were disabled; it isn’t in my environment: </p>



<pre class="wp-block-preformatted"> mysql&gt; show variables like 'innodb_file_per_table';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| innodb_file_per_table | ON    |
+-----------------------+-------+
1 row in set (0.00 sec) </pre>



<h1>SYS_INDEXES</h1>



<p>Another important dictionary table is SYS_INDEXES. It stores information about InnoDB indexes.</p>



<p>Its index_id is 0-3 and its root page is page 11. </p>



<pre class="wp-block-preformatted">CREATE TABLE `SYS_INDEXES` (
  `TABLE_ID` bigint(20) unsigned NOT NULL 
             DEFAULT '0',
  `ID` bigint(20) unsigned NOT NULL DEFAULT '0',
  `NAME` varchar(120) DEFAULT NULL,
  `N_FIELDS` int(10) unsigned DEFAULT NULL,
  `TYPE` int(10) unsigned DEFAULT NULL,
  `SPACE` int(10) unsigned DEFAULT NULL,
  `PAGE_NO` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`TABLE_ID`,`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;</pre>



<p>Fields of the table:</p>



<p><strong>TABLE_ID</strong> is the table identifier. It’s one from SYS_TABLES.</p>



<p><strong>ID</strong> is the index identifier. This is a very important field: every InnoDB page stores in its header the index_id of the index it belongs to. So, if we want to recover a table, we need to find all pages that belong to the PRIMARY index of the table, which means we need all pages with a specific index_id.</p>
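<p>This is essentially what stream_parser does: it scans a file page by page and sorts pages by the index_id found in the page header. A simplified sketch of the header lookup, assuming the standard on-disk layout (2-byte page type at byte 24 of the FIL header, 8-byte index_id at byte 66 of an INDEX page):</p>

```python
import struct

FIL_PAGE_INDEX = 17855   # page type value of a B+tree index page
PAGE_TYPE_OFFSET = 24    # 2-byte page type in the FIL header
INDEX_ID_OFFSET = 66     # 8-byte index_id in the index page header

def page_index_id(page):
    """Return the index_id of an INDEX page, or None for other page types."""
    (page_type,) = struct.unpack_from(">H", page, PAGE_TYPE_OFFSET)
    if page_type != FIL_PAGE_INDEX:
        return None
    (index_id,) = struct.unpack_from(">Q", page, INDEX_ID_OFFSET)
    return index_id

# Synthetic 16 KiB page pretending to belong to index_id 15:
page = bytearray(16 * 1024)
struct.pack_into(">H", page, PAGE_TYPE_OFFSET, FIL_PAGE_INDEX)
struct.pack_into(">Q", page, INDEX_ID_OFFSET, 15)
print(page_index_id(bytes(page)))  # 15
```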



<p><strong>NAME</strong> is the name of an index. Table sakila.actor has two indexes, PRIMARY and idx_actor_last_name, so these are the NAME values in SYS_INDEXES.</p>



<p><strong>N_FIELDS</strong> is the number of fields in an index. idx_actor_last_name has one field, so its N_FIELDS in SYS_INDEXES is 1.</p>



<p><strong>TYPE</strong> is not an important field <img src="https://s.w.org/images/core/emoji/13.1.0/72x72/1f642.png" alt="🙂" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>



<p><strong>SPACE</strong> is an InnoDB tablespace identifier showing where the index is stored. Probably, InnoDB developers foresaw the possibility to store indexes in different files, but for current MySQL versions it’s the same as in SYS_TABLES.</p>



<p><strong>PAGE_NO</strong> is the identifier of the root page of the index. As you know, InnoDB stores indexes in a <a href="http://en.wikipedia.org/wiki/B%2B_tree">B+tree data structure</a>; the root node of the B+tree is the page identified by this field.</p>



<figure class="wp-block-image size-large"><img loading="lazy" width="1024" height="471" src="https://twindb.com/wp-content/uploads/2014/06/Bplustree-1024x471.png" alt="" class="wp-image-117" srcset="https://twindb.com/wp-content/uploads/2014/06/Bplustree-1024x471.png 1024w, https://twindb.com/wp-content/uploads/2014/06/Bplustree-300x138.png 300w, https://twindb.com/wp-content/uploads/2014/06/Bplustree-768x354.png 768w, https://twindb.com/wp-content/uploads/2014/06/Bplustree.png 1121w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>
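<p>With the default 16&nbsp;KB page size (an assumption; compressed tablespaces differ), turning PAGE_NO into a byte offset inside the tablespace file is simple arithmetic:</p>

```python
PAGE_SIZE = 16 * 1024  # default InnoDB page size (assumed)

def page_offset(page_no: int, page_size: int = PAGE_SIZE) -> int:
    """Byte offset of a page within its tablespace file."""
    return page_no * page_size

# SYS_INDEXES itself keeps its root in page 11 of the system tablespace:
print(page_offset(11))  # 180224
```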






<h1>SYS_COLUMNS</h1>



<p>SYS_COLUMNS describes individual fields in a table. It is stored in index_id 0-2. </p>



<pre id="block-808c82db-5d9d-46bb-a347-364d444f6dff" class="wp-block-preformatted"> CREATE TABLE `SYS_COLUMNS` (
  `TABLE_ID` bigint(20) unsigned NOT NULL,
  `POS` int(10) unsigned NOT NULL,
  `NAME` varchar(255) DEFAULT NULL,
  `MTYPE` int(10) unsigned DEFAULT NULL,
  `PRTYPE` int(10) unsigned DEFAULT NULL,
  `LEN` int(10) unsigned DEFAULT NULL,
  `PREC` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`TABLE_ID`,`POS`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1; </pre>



<p><strong>TABLE_ID</strong> is the table identifier; it comes from SYS_TABLES.</p>



<p><strong>POS</strong> is a relative position of the field in the table.</p>



<p><strong>NAME</strong> is the name of the field.</p>



<p><strong>MTYPE</strong> and <strong>PRTYPE</strong> encode the MySQL type of the field. InnoDB started its history as a separate product, not related to MySQL, and these two fields are a legacy of that time. There&#8217;s no strict correspondence between MySQL types and InnoDB types; InnoDB uses whichever of its own types suits a particular MySQL type best. Encoding and collation are stored here as well.</p>
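<p>As a sketch, the main MTYPE codes look roughly like this (values taken from InnoDB&#8217;s <code>data0type.h</code>; treat the exact numbers as an assumption to verify against your MySQL version):</p>

```python
# InnoDB "main type" codes from data0type.h (assumed values; a few
# less common codes are omitted)
MTYPE_NAMES = {
    1: "VARCHAR",    # variable-length, latin1
    2: "CHAR",       # fixed-length, latin1
    3: "FIXBINARY",  # fixed-length binary (timestamp, datetime, ...)
    4: "BINARY",
    5: "BLOB",
    6: "INT",        # all integer types, including smallint
    9: "FLOAT",
    10: "DOUBLE",
    11: "DECIMAL",
    12: "VARMYSQL",  # variable-length, multi-byte charset such as utf8
}

# In the sakila.actor dump below, actor_id has MTYPE 6, first_name
# and last_name have 12, and last_update has 3:
print(MTYPE_NAMES[6], MTYPE_NAMES[12], MTYPE_NAMES[3])
```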



<p><strong>LEN</strong> is the length of the field.</p>



<p><strong>PREC</strong> is precision for some MySQL types.</p>



<p>Isn&#8217;t it interesting to see an example of the table for sakila.actor? Here it goes. Remember that its table_id is 61? We know that from SYS_TABLES. </p>



<pre class="wp-block-preformatted">61      0       "actor_id"      6       1794    2       0
61      1       "first_name"    12      2162959 135     0
61      2       "last_name"     12      2162959 135     0
61      3       "last_update"   3       525575  4       0 </pre>



<p>As we see, smallint is two bytes and timestamp is four bytes. And varchar(45) is 135 bytes. This is the maximum size, not the actual size of a field, which differs from record to record. But why 135, not 45? Because the field is utf8, where each character may take up to three bytes, and 3 x 45 adds up to 135.</p>
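<p>The arithmetic above, spelled out (MySQL&#8217;s utf8 is utf8mb3, which stores at most three bytes per character):</p>

```python
UTF8_MAX_BYTES_PER_CHAR = 3  # MySQL utf8 (utf8mb3): up to 3 bytes/char

# Maximum on-disk widths as reported in SYS_COLUMNS.LEN:
smallint_len = 2                                    # actor_id
timestamp_len = 4                                   # last_update
varchar45_utf8_len = 45 * UTF8_MAX_BYTES_PER_CHAR   # first_name, last_name

print(varchar45_utf8_len)  # 135
```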



<h1>SYS_FIELDS</h1>



<p>The last table for today (but not least) is SYS_FIELDS. It&#8217;s stored in index_id 0-4.<br>It lists the individual fields of all indexes. </p>



<pre class="wp-block-preformatted"> CREATE TABLE `SYS_FIELDS` (
  `INDEX_ID` bigint(20) unsigned NOT NULL,
  `POS` int(10) unsigned NOT NULL,
  `COL_NAME` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`INDEX_ID`,`POS`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1; </pre>



<p>The table is quite simple.</p>



<p><strong>INDEX_ID</strong> is the index identifier that we can find in SYS_INDEXES.</p>



<p><strong>POS</strong> is the position of the field in the index.</p>



<p>And <strong>COL_NAME</strong> is the field name.</p>



<h3>Why Do We Care?</h3>



<p>Why do we need an InnoDB dictionary anyway?</p>



<p>There are two reasons. First, to recover table A we need to know the index_id of its PRIMARY index, and the easiest way to find it is the InnoDB dictionary.</p>



<p>Second, as you see, the InnoDB dictionary stores information about field names and types, which means that if we don&#8217;t know the table structure, we can recover it from the dictionary.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://twindb.com/innodb-dictionary/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
