Thursday, November 5, 2015

High Performance MySQL: CH4 Optimizing Schema and Data Types

1- It’s harder for MySQL to optimize queries that refer to nullable columns, because they make indexes, index statistics, and value comparisons more complicated. A nullable column uses more storage space and requires special processing inside MySQL. When a nullable column is indexed, it requires an extra byte per entry.

2- The performance improvement from changing NULL columns to NOT NULL is usually small, so don’t make it a priority

Number types
3- TINYINT, SMALLINT, MEDIUMINT, INT, or BIGINT.
4- MySQL lets you specify a “width” for integer types, such as INT(11). This is meaningless
for most applications: it does not restrict the legal range of values, but simply specifies the number of characters MySQL’s interactive tools (such as the command-line client) will reserve for display purposes. For storage and computational purposes, INT(1) is identical to INT(10)

Real Numbers
1- we have FLOAT, DOUBLE and DECIMAL,
2- DECIMAL is for storing exact fractional numbers (so accurate),
3- DECIMAL is slow as the server itself performs DECIMAL math in MySQL 5.0 and newer, because CPUs don’t support the computations directly. Floating-point math is significantly faster, because the CPU performs the computations natively.

String Types
1- we have CHAR and VARCHAR
2- VARCHAR uses extra bytes to store the value's length

BLOB and TEXT types
1- we have types:TINYTEXT, SMALL TEXT, TEXT, MEDIUMTEXT, and LONGTEXT (TEXT is synonym for SMALLTEXT)
2- we have TINYBLOB, SMALLBLOB, BLOB, MEDIUMBLOB, and LONGBLOB. BLOB is a synonym for SMALLBLOB,
3- The only difference between the BLOB and TEXT families is that BLOB types store binary data with no collation or character set, but TEXT types have a character set and collation
4- Memory storage engine doesn’t support the BLOB and TEXT types, queries that use BLOB or TEXT columns and need an implicit temporary table ( he mentioned a trick to put them in Memory which is using SUBSTRING, check page 122 ).

Using ENUM
1- An ENUM column can store a predefined set of distinct string values.
2- ENUM list is fixed if you want to update you should perform ALERT TABLE request.


Date and Time Types
1- DATETIME: stored like this YYYYMMDDHHMMSS
2- TIMESTAMP type stores the number of seconds elapsed since midnight
3- TIMESTAMP uses only four bytes of storage, so it has a much smaller range than DATETIME
4- TIMESTAMP is timezone aware, if you store or access data from multiple time zones, the behavior of TIMESTAMP and DATETIME will be very different.
5- MySQL currently does not have an appropriate data type for this, but you can use your own storage format

Bit-Packed Data Types
1- we have BIT, SET

General Notes
1- be very careful with completely “random” strings, such as those produced by MD5(), SHA1(), or UUID(). Each new value you generate with them will be distributed in arbitrary ways over a large space, which can slow INSERT and some types of SELECT queries

2- Object-relational mapping (ORM) systems (and the “frameworks” that use them) are another frequent performance nightmare.

3- People often use VARCHAR(15) columns to store IP addresses. However, they are really unsigned 32-bit integers, not strings.

4- ENUM has some performance issues.

5- try not to use NULL, maybe you can use "NO VALUE" or -1 or ...

6- if you have a table with counter value, you may have concurrency issue. You can get higher concurrency by keeping more than one row and updating a random row.

Speeding Up ALTER TABLE
1- In general, most ALTER TABLE operations will cause interruption of service in MySQL.
2- MySQL performs most alterations by making an empty table with the desired new structure, inserting all the data from the old table into the new one, and deleting the old table. This can take a very long time,
3- MySQL 5.1 and newer include support for some types of “online” operations that won’t lock the table for the whole operation. Recent versions of InnoDB also support building indexes by sorting, which makes building indexes much faster and results in a compact index layout.
4- you can use some tools for altering table to swap to a temp server then come back, some of these tools are two. the “online schema change” tools from Facebook’s database operations team (https://launchpad.net/mysqlatfacebook),Shlomi Noach’s openark toolkit (http://code.openark.org/), and Percona Toolkit (http://www.percona.com/software/).
5- Not all ALTER TABLE operations cause table rebuilds. some changes can be applied directly to the table .frm file (this file has information about table format). example of such an update is changing a default value of a column 

Wednesday, November 4, 2015

High Performance MySQL: CH3 Profile Server Performance


1- the author has mentioned Percona Toolkit’s pt-query-digest tools a lot and he recommends to use it for profiling and analyzing as it gives a lot of good profiling information. other mentioned tool is strace.

2- the author advised that profiling should be enabled all the time, it doesnt have to be enable for all requests you can write something like
<?php
$profiling_enabled = rand(0, 100) > 99;
?>

which means record profiling information for 1% of the requests.

3- the author also mentioned New Relic as a tool for profiling.

Profiling MySQL Queries
1- MySQL comes with a tool for query profiling, It’s the so-called slow query log, the overhead of it is negligible .
2- the author also talked about Percona Server which logs significantly more details to the slow query log than MySQL
3- also sometimes you dont have access to the server that you need to profile, Percona Server is gonna help you in this situation, they have couple of scripts to profile the server even if you dont have access to it (this is done by running SHOW FULL PROCESSLIST repeatedly and get information or capturing TCP network traffic)
4- the author gave an example about using pt-query-digest

Profiling a Single Query
1-MySQL provides us with 4 ways to profile a query, SHOW PROFILE, SHOW STATUS, EXPLAIN, Performance Schema
2- in SHOW PROFILE: Every time you issue a query to the server, it records the profiling information in a temporary table and assigns the statement an integer identifier.
you can run a statement like
mysql> SHOW PROFILE FOR QUERY 1;

to get exact execution plan for the query.

3- in SHOW STATUS: They tell you how often various activities took place (e.g. temp table has been created for 3 times in this query)

4- EXPLAIN: it shows an estimate of what the server thinks it will do, it is not useful ass SHOW STATUS and SHOW PROFILE

4- PERFORMANCE SCHEMA: MySQL 5.5 comes with a PERFORMANCE schema which gives you some performance information. example"
mysql> SELECT event_name, count_star, sum_timer_wait
-> FROM events_waits_summary_global_by_event_name
-> ORDER BY sum_timer_wait DESC LIMIT 5;

Profiling Server Wide Problem:
in the previous section we were talking about profiling a single query, however sometimes the issue is global and you need to profile the whole server.
to do this you can use:

1- SHOW GLOBAL STATUS, it will give you some information like number of running threads, number of connected threads
2- SHOW PROCESSLIST: also gives you information about the threads.

Capturing Diagnostic Data
he is talking about how you can capture the diagnostic data, the idea is to wait for a trigger (like a peek in the number of running threads) then when the trigger happens start gathering data.

he mentioned alot of tools that can be used to help you in doing that, all the tools are part of Percona toolkit.

there is a case study as well.

High Performance MySQL: CH2 Benchmarking


Chapter 2 

Capturing System Performance and Status
here the author gave a script that you can use to capture the system performance


notice the use of some important SQL statements "SHOW GLOBAL STATUS ", "SHOW ENGINE INNODB STATUS", "SHOW FULL PROCESSLIST", "SHOW GLOBAL VARIABLES"

and as you can see this scripts run every 5 seconds "INTERVAL =5"

Some Tools for Benchmarking:
1- mysqlslap (http://dev.mysql.com/doc/refman/5.1/en/mysqlslap.html)
2- Percona’s TPCC-MySQL Tool
3- MySQL Benchmark Suite (sql-bench)

the author gave examples how to use these tools.


Tuesday, November 3, 2015

High Performance MySQL: Ch1 MySQL Architecture and History

MySQL’s Logical Architecture


MySQL Server Architecture.

from the top, we have services for clients to connect to MySQL, services like connection handling, authentication and security,

What is inside the box is the brain of MySQL, here we have query parsing, analysis, optimization, caching all the built-in functions (e.g., dates, times, math, and encryption) stored procedures, triggers, and views

The third layer contains the storage engines. They are responsible for storing and retrieving all data stored “in” MySQL (we have different engines like MyISAM, InnoDB). The storage engines don’t parse SQL or communicate with each other; they simply respond to requests from the second layer.

Connection Management
1- Each client connection gets its own thread within the server process.
2- The connection’s queries execute within that single thread, which in turn resides on one core or CPU.
3- The server caches threads, so they don’t need to be created and destroyed for each new connection

Optimization and Execution
1- MySQL parses queries to create an internal structure (the parse tree), and then applies a variety of optimizations. These can include rewriting the query, determining the order in which it will read tables, choosing which indexes to use, and so on.
2- You can pass hints to the optimizer through special keywords in the query, affecting its decision making process.
3-You can also ask the server to explain various aspects of optimization. This lets you know what decisions the server is making and gives you a reference point for reworking queries, schemas, and settings to make everything run as efficiently as possible.
4- Before parsing the query, the server consults the query cache, which can store only SELECT statements, along with their result sets. If anyone issues a query that’s identical to one already in the cache, the server doesn’t need to parse, optimize, or execute the query at all—it can simply pass back the stored result set.

Concurrency Control
1- Concurrency is controlled by locks, These locks are usually known as shared locks and exclusive
locks, or read locks and write locks.
2- Read locks on a resource are shared, or mutually nonblocking: many clients can read from a resource at the same time and not interfere with each other.
3- Write locks, on the other hand, are exclusive—i.e., they block both read locks and other write
locks
4- locks consume resources.
5- In mySQL we have Table locks and Row locks

Isolation Level

READ_UNCOMMITTED: a transaction may read data that is still uncommitted by other transactions which may lead to dirty reads.


READ_COMMITTED: a transaction can't read data that is not yet committed by other transactions. This fixes dirty read, but may lead to nonrepeatable read. (read different value in the course of a transaction).



REPEATABLE_READ: if a transaction reads one record from the database multiple times the result of all those reading operations must always be the same. This eliminates both the dirty read and the non-repeatable read issues, however it creates a Phantom Read issue (a transaction fetched a range of records multiple times from the database and obtained different result sets).



SERIALIZABLE: the most restrictive of all isolation levels. Transactions are executed with locking at all levels (read, range and write locking) so they appear as if they were executed in a serialized way.



Deadlock
Transaction #1
START TRANSACTION;
UPDATE StockPrice SET close = 45.50 WHERE stock_id = 4 and date = '2002-05-01';
UPDATE StockPrice SET close = 19.80 WHERE stock_id = 3 and date = '2002-05-02';
COMMIT;
Transaction #2
START TRANSACTION;
UPDATE StockPrice SET high = 20.12 WHERE stock_id = 3 and date = '2002-05-02';
UPDATE StockPrice SET high = 47.20 WHERE stock_id = 4 and date = '2002-05-01';
COMMIT;

If each transaction executes its first query and update a row of data, locking it in the process. Each transaction will then attempt to update its second row, only to find that it is already locked. The two transactions will wait forever for each other to complete, unless something intervenes to break the deadlock.

handling DeadLock is differ between storage engines.
1- InnoDB storage engine, will notice circular dependencies and return an error instantly, Others will give up after the query exceeds a lock wait timeout,


Transaction Logging
1- Instead of updating the tables on disk each time a change occurs, the storage engine can change its in-memory copy of the data. This is very fast. 
2- The storage engine can then write a record of the change to the transaction log, which is on disk and therefore durable. This is also a relatively fast operation, because appending log events involves sequential I/O in one small area of the disk instead of random I/O in many places. 
3- Then, at some later time, a process can update the table on disk.
4- this is called (write-ahead logging) which actually ends up writing the changes to disk twice.
5- If there’s a crash after the update is written to the transaction log but before the changes are made to the data itself, the storage engine can still recover the changes upon restart. 


Transactions in MySQL
1- MySQL provides two transactional storage engines: InnoDB and NDB Cluster. Several third-party engines are also available; the best-known engines right now are XtraDB and PBXT

2- MySQL operates in AUTOCOMMIT mode by default, which means it automatically executes each query in a separate transaction

Multiversion Concurrency Control
1- Most of MySQL’s transactional storage engines don’t use a simple row-locking mechanism. Instead, they use row-level locking in conjunction with a technique for increasing concurrency known as multiversion concurrency control (MVCC)
2- MVCC has less overhead
3- MVCC has 2 types optimistic and pessimistic
4- Optimistic Locking: when you read a record, take note of a version number) and check that the version hasn't changed before you write the record back.
5- Pessimistic Locking: is when you lock the record for your exclusive use until you have finished with it. 
6- In Hibernate, Optimistic Locking is obtained by using @Version.
7- MVCC works only with the REPEATABLE READ and READ COMMITTED isolation levels

MySQL’s Storage Engines
1- InnoDB is transactional storage engine
2- MyISAM is not transactional
3- MyISAM provides more types of indexes like spatial (GIS)
4- Never go with MyISAM
5- XtraDB is another storage engine for high transaction systems