Thursday, November 5, 2015

High Performance MySQL: CH4 Optimizing Schema and Data Types

1- It’s harder for MySQL to optimize queries that refer to nullable columns, because they make indexes, index statistics, and value comparisons more complicated. A nullable column uses more storage space and requires special processing inside MySQL. When a nullable column is indexed, it requires an extra byte per entry.

2- The performance improvement from changing NULL columns to NOT NULL is usually small, so don’t make it a priority

Number types
3- TINYINT, SMALLINT, MEDIUMINT, INT, or BIGINT.
4- MySQL lets you specify a “width” for integer types, such as INT(11). This is meaningless for most applications: it does not restrict the legal range of values, but simply specifies the number of characters MySQL’s interactive tools (such as the command-line client) will reserve for display purposes. For storage and computational purposes, INT(1) is identical to INT(10).
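
To make the difference between the integer types concrete, here is a small Python sketch that derives each type's range from its storage width. The widths (8, 16, 24, 32 and 64 bits) are the standard MySQL sizes; the helper name int_range is just mine for illustration.

```python
# Storage width in bits for each MySQL integer type.
INT_BITS = {"TINYINT": 8, "SMALLINT": 16, "MEDIUMINT": 24, "INT": 32, "BIGINT": 64}

def int_range(type_name, unsigned=False):
    """Return the (min, max) values a MySQL integer type can hold."""
    bits = INT_BITS[type_name]
    if unsigned:
        return 0, 2**bits - 1
    return -(2**(bits - 1)), 2**(bits - 1) - 1

for name in INT_BITS:
    print(name, int_range(name), int_range(name, unsigned=True))
```

Note that the display width in INT(11) plays no part in this calculation, which is the point of the note above.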

Real Numbers
1- we have FLOAT, DOUBLE and DECIMAL,
2- DECIMAL is for storing exact fractional numbers (so accurate),
3- DECIMAL is slow as the server itself performs DECIMAL math in MySQL 5.0 and newer, because CPUs don’t support the computations directly. Floating-point math is significantly faster, because the CPU performs the computations natively.
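
The exact-versus-approximate distinction is easy to see outside MySQL too. The following is only a Python analogy (Python's decimal module stands in for DECIMAL, and Python floats for DOUBLE), not MySQL's own implementation:

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly.
float_sum = 0.1 + 0.2
# Decimal arithmetic keeps the exact fractional digits.
decimal_sum = Decimal("0.1") + Decimal("0.2")

print(float_sum == 0.3)               # False
print(decimal_sum == Decimal("0.3"))  # True
```

This is why DECIMAL is the safer choice for things like money, at the cost of slower math.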

String Types
1- we have CHAR and VARCHAR
2- VARCHAR uses extra bytes to store the value's length

BLOB and TEXT types
1- we have the types TINYTEXT, SMALLTEXT, TEXT, MEDIUMTEXT, and LONGTEXT (TEXT is a synonym for SMALLTEXT)
2- we have TINYBLOB, SMALLBLOB, BLOB, MEDIUMBLOB, and LONGBLOB. BLOB is a synonym for SMALLBLOB,
3- The only difference between the BLOB and TEXT families is that BLOB types store binary data with no collation or character set, but TEXT types have a character set and collation
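4- The Memory storage engine doesn’t support the BLOB and TEXT types, so queries that use BLOB or TEXT columns and need an implicit temporary table have to use on-disk temporary tables, even for only a few rows (the author mentions a trick to keep them in memory, which is to use SUBSTRING to convert the values to character strings; check page 122).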

Using ENUM
1- An ENUM column can store a predefined set of distinct string values.
2- The ENUM list is fixed; if you want to change it you have to run an ALTER TABLE.


Date and Time Types
1- DATETIME: stored like this YYYYMMDDHHMMSS
2- TIMESTAMP stores the number of seconds elapsed since midnight, January 1, 1970, GMT (the Unix epoch)
3- TIMESTAMP uses only four bytes of storage, so it has a much smaller range than DATETIME: from the year 1970 to partway through the year 2038
4- TIMESTAMP is time zone aware: the displayed value depends on the time zone. If you store or access data from multiple time zones, the behavior of TIMESTAMP and DATETIME will be very different.
5- For date and time values with sub-second resolution, the book notes that MySQL currently does not have an appropriate data type, but you can use your own storage format
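
The 2038 limit follows directly from the four bytes of storage. Here is a quick Python check, assuming the value is a signed 32-bit count of seconds since the Unix epoch (the variable names are mine):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Largest value of a signed 32-bit seconds counter.
MAX_SECONDS = 2**31 - 1

latest = EPOCH + timedelta(seconds=MAX_SECONDS)
print(latest.isoformat())  # 2038-01-19T03:14:07+00:00
```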

Bit-Packed Data Types
1- we have BIT, SET

General Notes
1- be very careful with completely “random” strings, such as those produced by MD5(), SHA1(), or UUID(). Each new value you generate with them will be distributed in arbitrary ways over a large space, which can slow INSERT and some types of SELECT queries

2- Object-relational mapping (ORM) systems (and the “frameworks” that use them) are another frequent performance nightmare.

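3- People often use VARCHAR(15) columns to store IPv4 addresses. However, they are really unsigned 32-bit integers, not strings; MySQL provides the INET_ATON() and INET_NTOA() functions to convert between the two representations.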

4- ENUM has some performance issues.

5- try not to use NULL, maybe you can use "NO VALUE" or -1 or ...

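6- if you have a table that holds a counter value, you may run into concurrency issues, because every transaction that updates the counter has to wait for the same row. You can get higher concurrency by keeping more than one row and updating a random row; the counter's value is then the sum of all the rows.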
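
For note 3 above, this is what the dotted-quad to integer mapping looks like. The sketch uses Python's standard ipaddress module rather than MySQL itself, and the two helper names are mine:

```python
import ipaddress

def ip_to_int(dotted):
    """Convert a dotted-quad IPv4 string to its unsigned 32-bit integer."""
    return int(ipaddress.IPv4Address(dotted))

def int_to_ip(value):
    """Convert an unsigned 32-bit integer back to dotted-quad form."""
    return str(ipaddress.IPv4Address(value))

n = ip_to_int("192.168.1.10")
print(n, int_to_ip(n))  # 3232235786 192.168.1.10
```

Stored this way, an address fits in an INT UNSIGNED column (4 bytes) instead of up to 15 characters.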

Speeding Up ALTER TABLE
1- In general, most ALTER TABLE operations will cause interruption of service in MySQL.
2- MySQL performs most alterations by making an empty table with the desired new structure, inserting all the data from the old table into the new one, and deleting the old table. This can take a very long time,
3- MySQL 5.1 and newer include support for some types of “online” operations that won’t lock the table for the whole operation. Recent versions of InnoDB also support building indexes by sorting, which makes building indexes much faster and results in a compact index layout.
4- there are two general workarounds: run the ALTER on a server that is not in production service and then swap it with the master, or use a “shadow copy” (build a new table with the desired structure next to the existing one, then swap the two). Some tools that help with the shadow copy approach: the “online schema change” tools from Facebook’s database operations team (https://launchpad.net/mysqlatfacebook), Shlomi Noach’s openark toolkit (http://code.openark.org/), and Percona Toolkit (http://www.percona.com/software/).
5- Not all ALTER TABLE operations cause table rebuilds. Some changes can be applied directly to the table’s .frm file (this file holds the table definition). An example of such a change is changing the default value of a column.

Wednesday, November 4, 2015

High Performance MySQL: CH3 Profile Server Performance


1- the author mentions Percona Toolkit’s pt-query-digest tool a lot and recommends using it for profiling and analysis, as it gives a lot of good profiling information. Another tool mentioned is strace.

2- the author advises that profiling should be enabled all the time. It doesn't have to be enabled for every request; you can write something like
<?php
$profiling_enabled = rand(0, 100) > 99;
?>

which means record profiling information for 1% of the requests.

3- the author also mentioned New Relic as a tool for profiling.

Profiling MySQL Queries
1- MySQL comes with a tool for query profiling: the so-called slow query log. Its overhead is negligible.
2- the author also talks about Percona Server, which logs significantly more details to the slow query log than MySQL
3- sometimes you don't have access to the server that you need to profile. Percona Toolkit’s pt-query-digest helps in this situation: it can gather query information without the slow query log, either by running SHOW FULL PROCESSLIST repeatedly or by capturing TCP network traffic
4- the author gave an example about using pt-query-digest

Profiling a Single Query
1-MySQL provides us with 4 ways to profile a query, SHOW PROFILE, SHOW STATUS, EXPLAIN, Performance Schema
2- in SHOW PROFILE: Every time you issue a query to the server, it records the profiling information in a temporary table and assigns the statement an integer identifier.
you can run a statement like
mysql> SHOW PROFILE FOR QUERY 1;

to see the steps the server went through while executing that query and how long each one took.

3- in SHOW STATUS: They tell you how often various activities took place (e.g. temp table has been created for 3 times in this query)

4- EXPLAIN: it shows an estimate of what the server thinks it will do, so it is not as useful as SHOW STATUS and SHOW PROFILE for seeing what actually happened

5- PERFORMANCE SCHEMA: MySQL 5.5 comes with a Performance Schema, which gives you some performance information. Example:
mysql> SELECT event_name, count_star, sum_timer_wait
-> FROM events_waits_summary_global_by_event_name
-> ORDER BY sum_timer_wait DESC LIMIT 5;

Profiling Server Wide Problem:
in the previous section we were talking about profiling a single query, however sometimes the issue is global and you need to profile the whole server.
to do this you can use:

1- SHOW GLOBAL STATUS, it will give you some information like number of running threads, number of connected threads
2- SHOW PROCESSLIST: also gives you information about the threads.

Capturing Diagnostic Data
he is talking about how you can capture diagnostic data: the idea is to wait for a trigger (like a spike in the number of running threads) and then, when the trigger fires, start gathering data.

he mentions a lot of tools that can help you do that; all of them are part of Percona Toolkit.

there is a case study as well.

High Performance MySQL: CH2 Benchmarking


Chapter 2 

Capturing System Performance and Status
here the author gave a script that you can use to capture the system performance


notice the use of some important SQL statements "SHOW GLOBAL STATUS ", "SHOW ENGINE INNODB STATUS", "SHOW FULL PROCESSLIST", "SHOW GLOBAL VARIABLES"

and as you can see, this script runs every 5 seconds "INTERVAL=5"

Some Tools for Benchmarking:
1- mysqlslap (http://dev.mysql.com/doc/refman/5.1/en/mysqlslap.html)
2- Percona’s TPCC-MySQL Tool
3- MySQL Benchmark Suite (sql-bench)

the author gave examples how to use these tools.


Tuesday, November 3, 2015

High Performance MySQL: Ch1 MySQL Architecture and History

MySQL’s Logical Architecture


MySQL Server Architecture.

from the top, we have services for clients to connect to MySQL, services like connection handling, authentication and security,

What is inside the box is the brain of MySQL: here we have query parsing, analysis, optimization, caching, and all the built-in functions (e.g., dates, times, math, and encryption), as well as stored procedures, triggers, and views.

The third layer contains the storage engines. They are responsible for storing and retrieving all data stored “in” MySQL (we have different engines like MyISAM, InnoDB). The storage engines don’t parse SQL or communicate with each other; they simply respond to requests from the second layer.

Connection Management
1- Each client connection gets its own thread within the server process.
2- The connection’s queries execute within that single thread, which in turn resides on one core or CPU.
3- The server caches threads, so they don’t need to be created and destroyed for each new connection

Optimization and Execution
1- MySQL parses queries to create an internal structure (the parse tree), and then applies a variety of optimizations. These can include rewriting the query, determining the order in which it will read tables, choosing which indexes to use, and so on.
2- You can pass hints to the optimizer through special keywords in the query, affecting its decision making process.
3-You can also ask the server to explain various aspects of optimization. This lets you know what decisions the server is making and gives you a reference point for reworking queries, schemas, and settings to make everything run as efficiently as possible.
4- Before parsing the query, the server consults the query cache, which can store only SELECT statements, along with their result sets. If anyone issues a query that’s identical to one already in the cache, the server doesn’t need to parse, optimize, or execute the query at all—it can simply pass back the stored result set.

Concurrency Control
1- Concurrency is controlled by locks. These locks are usually known as shared locks and exclusive locks, or read locks and write locks.
2- Read locks on a resource are shared, or mutually nonblocking: many clients can read from a resource at the same time and not interfere with each other.
3- Write locks, on the other hand, are exclusive, i.e., they block both read locks and other write locks.
4- Locks consume resources.
5- In MySQL we have table locks and row locks.

Isolation Level

READ_UNCOMMITTED: a transaction may read data that is still uncommitted by other transactions which may lead to dirty reads.


READ_COMMITTED: a transaction can't read data that is not yet committed by other transactions. This fixes dirty read, but may lead to nonrepeatable read. (read different value in the course of a transaction).



REPEATABLE_READ: if a transaction reads one record from the database multiple times, the result of all those reads must always be the same. This eliminates both the dirty read and the non-repeatable read issues; however, it still allows phantom reads (a transaction fetches a range of records multiple times and obtains different result sets, because another transaction inserted rows into that range in between).



SERIALIZABLE: the most restrictive of all isolation levels. Transactions are executed with locking at all levels (read, range and write locking) so they appear as if they were executed in a serialized way.



Deadlock
Transaction #1
START TRANSACTION;
UPDATE StockPrice SET close = 45.50 WHERE stock_id = 4 and date = '2002-05-01';
UPDATE StockPrice SET close = 19.80 WHERE stock_id = 3 and date = '2002-05-02';
COMMIT;
Transaction #2
START TRANSACTION;
UPDATE StockPrice SET high = 20.12 WHERE stock_id = 3 and date = '2002-05-02';
UPDATE StockPrice SET high = 47.20 WHERE stock_id = 4 and date = '2002-05-01';
COMMIT;

If each transaction executes its first query and updates a row of data, locking it in the process, each transaction will then attempt to update its second row, only to find that it is already locked. The two transactions will wait forever for each other to complete, unless something intervenes to break the deadlock.

Deadlock handling differs between storage engines.
1- The InnoDB storage engine will notice circular dependencies and return an error instantly; other systems give up after the query exceeds a lock wait timeout.
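
To show what "noticing a circular dependency" means, here is a toy cycle check over a wait-for graph in Python. It is only a sketch of the idea, not InnoDB's actual algorithm, and it assumes each transaction waits on at most one other transaction; the function name and the T1/T2 labels are mine.

```python
def find_deadlock(waits_for):
    """Return a list of transactions forming a cycle, or None.

    waits_for maps each transaction to the one transaction it is waiting on.
    """
    for start in waits_for:
        path = []
        current = start
        # Follow the chain of waits until it ends or revisits a transaction.
        while current in waits_for and current not in path:
            path.append(current)
            current = waits_for[current]
        if current in path:
            return path[path.index(current):]
    return None

# T1 holds stock 4 and wants stock 3; T2 holds stock 3 and wants stock 4.
print(find_deadlock({"T1": "T2", "T2": "T1"}))  # ['T1', 'T2']
print(find_deadlock({"T1": "T2"}))              # None
```

Once a cycle is found, the engine can roll back one of the transactions in it so the other can proceed.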


Transaction Logging
1- Instead of updating the tables on disk each time a change occurs, the storage engine can change its in-memory copy of the data. This is very fast. 
2- The storage engine can then write a record of the change to the transaction log, which is on disk and therefore durable. This is also a relatively fast operation, because appending log events involves sequential I/O in one small area of the disk instead of random I/O in many places. 
3- Then, at some later time, a process can update the table on disk.
4- this is called write-ahead logging, and it actually ends up writing the changes to disk twice.
5- If there’s a crash after the update is written to the transaction log but before the changes are made to the data itself, the storage engine can still recover the changes upon restart. 
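
The steps above can be sketched as a toy key-value store in Python. Everything here is hypothetical (the TinyStore class, and plain lists and dicts standing in for the log file and table file); it only illustrates the order of operations and the recovery replay.

```python
class TinyStore:
    """Toy key-value store with a transaction log (lists/dicts stand in for disk)."""

    def __init__(self, log=None, table=None):
        self.log = log if log is not None else []        # durable, append-only
        self.table = table if table is not None else {}  # durable table data
        self.memory = dict(self.table)                   # in-memory copy
        # Recovery: replay every logged change over the table data.
        for key, value in self.log:
            self.memory[key] = value

    def write(self, key, value):
        self.memory[key] = value       # 1. change the in-memory copy
        self.log.append((key, value))  # 2. sequential append to the log

    def checkpoint(self):
        self.table.update(self.memory)  # 3. later, update the table on disk
        self.log.clear()

store = TinyStore()
store.write("close", 45.50)
# Simulate a crash before checkpoint(): only the log and the table survive.
recovered = TinyStore(log=store.log, table=store.table)
print(recovered.memory)  # {'close': 45.5}
```

The change reaches "disk" twice, once in the log and once in the table, which is the double write mentioned in note 4.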


Transactions in MySQL
1- MySQL provides two transactional storage engines: InnoDB and NDB Cluster. Several third-party engines are also available; the best-known engines right now are XtraDB and PBXT

2- MySQL operates in AUTOCOMMIT mode by default, which means it automatically executes each query in a separate transaction

Multiversion Concurrency Control
1- Most of MySQL’s transactional storage engines don’t use a simple row-locking mechanism. Instead, they use row-level locking in conjunction with a technique for increasing concurrency known as multiversion concurrency control (MVCC)
2- MVCC lets the engines avoid locking at all in many cases, so it can have much lower overhead
3- MVCC works differently in each storage engine; variations include optimistic and pessimistic concurrency control
4- Optimistic locking: when you read a record, take note of a version number, and check that the version hasn't changed before you write the record back.
5- Pessimistic locking: lock the record for your exclusive use until you have finished with it.
6- In Hibernate, Optimistic Locking is obtained by using @Version.
7- MVCC works only with the REPEATABLE READ and READ COMMITTED isolation levels
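
Here is a minimal Python sketch of the optimistic check from note 4. The dict stands in for a table, and StaleVersionError and update_if_unchanged are made-up names. In a real database the compare and the write would have to happen atomically (for example in a single UPDATE whose WHERE clause includes the version), which this toy does not attempt.

```python
class StaleVersionError(Exception):
    pass

def update_if_unchanged(table, key, new_value, expected_version):
    """Write new_value only if the row still has the version we read."""
    value, version = table[key]
    if version != expected_version:
        raise StaleVersionError(f"{key}: expected v{expected_version}, found v{version}")
    table[key] = (new_value, version + 1)

rows = {"stock_4": (45.50, 1)}   # key -> (value, version)
_, seen = rows["stock_4"]        # reader notes version 1
update_if_unchanged(rows, "stock_4", 46.00, seen)      # succeeds, version becomes 2
try:
    update_if_unchanged(rows, "stock_4", 47.00, seen)  # stale: still expects version 1
except StaleVersionError as err:
    print("conflict:", err)
```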

MySQL’s Storage Engines
1- InnoDB is transactional storage engine
2- MyISAM is not transactional
3- MyISAM provides more types of indexes like spatial (GIS)
4- Never go with MyISAM
5- XtraDB is another storage engine for high transaction systems 






Saturday, August 1, 2015

High Performance Web Sites

Chapter A, The Importance of Frontend Performance

"Only 10–20% of the end user response time is spent downloading the HTML document. The other 80–90% is spent downloading all the components in the page"

"This book offers precise guidelines for reducing that 80–90% of end user response time"

Chapter B, HTTP overview

1- Compression:
  • HTTP Request: Accept-Encoding: gzip,deflate
  • HTTP Response: Content-Encoding: gzip
2- Conditional Get Request: 
  • If the browser has a copy of the component in its cache, but isn’t sure whether it’s still valid, a conditional GET request is made. ( basically the browser is not sure if it is still valid because the component doesn't have Expires header)
  • the browser sends a GET request with "If-Modified-Since: Wed, 22 Feb 2006 04:15:54 GMT", the server responds "304 Not Modified" with "Last-Modified: Wed, 22 Feb 2006 04:15:54 GMT" header.
  • if the content has been modified, the server responds "200 OK" with the content.
  • The ETag and If-None-Match headers are another way to make conditional GET requests; will talk about them later.
3- Expires:
  • as mentioned before, the browser sends Conditional GET because the component doesn't have Expires header, add Expires header to save this round trip.
4- Keep-Alive:
  • originally, each HTTP request required opening a new socket connection, which is inefficient when many requests go to the same server
  • browsers can make multiple requests over a single connection by using the Connection: keep-alive header. The server also responds with Connection: keep-alive.
  • The browser or server can close the connection by sending a Connection: close header.
  • in old browsers, the browser sends a request, waits for the response and then sends another request. Pipelining, defined in HTTP/1.1, allows sending multiple requests over a single socket without waiting for a response (better performance, but not supported in old browsers).
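
The conditional GET exchange from point 2 comes down to a date comparison on the server side. Here is a rough Python sketch of that decision; the function name is mine, and a real server would also handle ETags and malformed dates.

```python
from email.utils import parsedate_to_datetime

def conditional_get_status(last_modified, if_modified_since=None):
    """Return 304 if the client's cached copy is still current, else 200."""
    if if_modified_since is None:
        return 200  # plain GET: always send the content
    modified = parsedate_to_datetime(last_modified)
    cached = parsedate_to_datetime(if_modified_since)
    return 304 if modified <= cached else 200

stamp = "Wed, 22 Feb 2006 04:15:54 GMT"
print(conditional_get_status(stamp, stamp))                            # 304
print(conditional_get_status("Thu, 23 Feb 2006 10:00:00 GMT", stamp))  # 200
```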

Chapter 1, Rule 1: Make Fewer HTTP Requests

Make fewer HTTP request by:

1- Image Maps: 

There are drawbacks to using image maps. Defining the area coordinates of the image map, if done manually, is tedious and error-prone

2- CSS Sprites: 
CSS sprites combine multiple images into a single image, an example of such an image:

To use an image from the sprite:


3- Inline Images:

<IMG ALT="Red Star"
SRC="
lvrKy/FvcPewsO9VVfajo+w6O/zl5estLv/8/AAAAAAAAAAAAAAAACH5BAEA
AAsALAAAAAAMAAwAAAQzcElZyryTEHyTUgknHd9xGV+qKsYirKkwDYiKDBia
tt2H1KBLQRFIJAIKywRgmhwAIlEEADs=">

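The data: URL scheme used here is not supported in Internet Explorer (up to and including version 7), there is a limit on the size of the data, and an inline image is not cached on its own (it is sent again with every page that embeds it).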

4- Combined Scripts and Stylesheets
Rather than having multiple CSS files, combine them into one file; same thing for script files. This goes against writing modular code, so keep the files separate in development and combine them in the build process.


Chapter 2, Rule 2: Use a Content Delivery Network

That is it, use CDN to improve static content delivery, the writer hasn't talked much about hosting Dynamic content on CDN.
One experience I had with Akamai was hosting content that was available only to authorized and authenticated users, and it actually helped us cache this type of content efficiently. Here is an article that talks about that:
http://www.akamai.co.jp/enja/dl/feature_sheets/FS_edgesuite_accesscontrol.pdf 


Chapter 3, Rule 3: Add an Expires Header

  1. Use the Expires header so the browser doesn't have to go to the server to fetch unexpired content
    Expires: Thu, 15 Apr 2010 20:00:00 GMT
  2. Because the Expires header uses a specific date, it has stricter clock synchronization requirements between server and client
  3. Cache-Control: max-age was introduced in HTTP/1.1 to overcome this limitation; it takes the freshness lifetime in seconds, e.g. Cache-Control: max-age=315360000 (ten years)
  4. If both, Expires and Cache-Control max-age, are present, the HTTP specification dictates that the max-age directive will override the Expires header
  5. If we configure components to be cached by browsers and proxies, how do users get updates when those components change? To ensure users get the latest version of a component, change the component’s filename in all of your HTML pages. (Another solution is to add a query string with a version number, e.g. xxx.js?v=123, and update the version. However, I found that developers complain that browsers sometimes ignore the query string when it comes to caching, so the safest option is to update the file name.)
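
As a rough illustration of points 1 to 3, this Python helper builds both headers from a single lifetime. The function name and the choice of ten years are just for the example; the headers themselves are the ones discussed above.

```python
from email.utils import formatdate

TEN_YEARS = 10 * 365 * 24 * 60 * 60   # 315360000 seconds

def far_future_headers(now, max_age=TEN_YEARS):
    """Build Expires and Cache-Control headers for a component served at `now`.

    `now` is a Unix timestamp in seconds.
    """
    return {
        # Absolute date: depends on the client's clock agreeing with the server's.
        "Expires": formatdate(now + max_age, usegmt=True),
        # Relative lifetime: no clock synchronization needed.
        "Cache-Control": f"max-age={max_age}",
    }

print(far_future_headers(0))
```

In practice you would pass time.time() as now; sending both headers covers clients that only understand Expires.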

CHAPTER 4, Rule 4: Gzip Components

Use compression to reduce the size of the response
  • Client sends: Accept-Encoding: gzip, deflate
  • Server compresses the response using one of the accepted methods and replies with
    Content-Encoding: gzip
  • There is a cost to gzipping: it takes additional CPU cycles on the server to carry out the compression and on the client to decompress the gzipped file
  • Image and PDF files should not be gzipped because they are already compressed.
  • Generally, it’s worth gzipping any file greater than 1 or 2K
  • Apache 1.3 uses mod_gzip for compressing while Apache 2.x uses mod_deflate
Proxy Caching and Compressing
Imagine the following scenario:
  1. The first request to the proxy for a certain URL comes from a browser that does not support gzip ( so the request doesn't have Accept-Encoding: gzip, deflate ).
  2. the proxy cache is empty
  3. The proxy forwards that request to the web server
  4. The web server’s response is uncompressed ( because the request doesn't have Accept-Encoding: gzip, deflate ).
  5. The response will be cached by the proxy and sent on to the browser
  6. Now, suppose the second request to the proxy for the same URL comes from a browser that does support gzip
  7. The proxy responds with the (uncompressed) contents in its cache, the second request missed  the opportunity to get compressed content.
Now imagine this scenario, the first request is from a browser that supports gzip and the second request is from a browser that doesn’t. In this case, the proxy has a compressed version of the contents in its cache and serves that version to all subsequent browsers whether they support gzip or not.

To solve this problem:
  1. the Web server should tell the Proxy server to save multiple cached responses of the same URL. This happens by using the Vary header in the response (e.g. Vary: Accept-Encoding), this causes the proxy to cache multiple versions of the response, one for each value of the Accept-Encoding request header.
  2. You can prevent Proxy server from keeping a cached copy by setting Cache-Control: private in the response.
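
The effect of Vary: Accept-Encoding on a proxy can be sketched in a few lines of Python. This is a toy model, not how any particular proxy is implemented: VaryingProxyCache and fake_origin are made-up names, and the "origin" only labels the body instead of really compressing it.

```python
class VaryingProxyCache:
    """Toy proxy cache that honours `Vary: Accept-Encoding`."""

    def __init__(self, origin):
        self.origin = origin   # callable: (url, accept_encoding) -> body
        self.cache = {}

    def get(self, url, accept_encoding=""):
        # The varying request header is part of the cache key, so gzip and
        # non-gzip clients never receive each other's cached copy.
        key = (url, accept_encoding)
        if key not in self.cache:
            self.cache[key] = self.origin(url, accept_encoding)
        return self.cache[key]

def fake_origin(url, accept_encoding):
    # Hypothetical origin server: label the body instead of compressing it.
    return ("gzip:" if "gzip" in accept_encoding else "plain:") + url

proxy = VaryingProxyCache(fake_origin)
print(proxy.get("/app.js"))                   # plain:/app.js
print(proxy.get("/app.js", "gzip, deflate"))  # gzip:/app.js
```

Without the header value in the key, the second request would have received the cached plain copy, which is exactly the problem scenario described above.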






CHAPTER 5, Rule 5: Put Stylesheets at the Top

  1. we want the browser to display whatever content it has as soon as possible (load progressively)
  2. putting stylesheets near the bottom of the document prohibits progressive rendering in many browsers as components are (in general) downloaded in the order in which they appear in the document.
  3. Browsers block rendering to avoid having to redraw elements of the page if their styles change (which means browsers wait until all stylesheets are loaded before calculating the style of the loaded elements).
So basically the Rule is : Put your stylesheets in the document HEAD using the LINK tag.


CHAPTER 6, Rule 6: Put Scripts at the Bottom

  1. Experiment: put a script in the middle of a page and program the script to take 10 seconds to load. You will notice the problem: the bottom half of the page takes about 10 seconds to appear.
  2. As mentioned before, with stylesheets, progressive rendering is blocked until all stylesheets have been downloaded. That’s why it’s best to move stylesheets to the document HEAD, so they are downloaded first and rendering isn’t blocked. With scripts, however, progressive rendering is blocked for all content below the script. Moving scripts lower in the page means more content is rendered progressively.
Parallel download
  1.  HTTP/1.1 specification suggests that browsers download two components in parallel per hostname, so if you are downloading from the same hostname you will see something like this


    However, if you are downloading from 2 hostnames, you will find 4 parallel download bars,
  2. Note that for HTTP/1.0, Firefox’s default is to download eight components in parallel; you can change these values in the browser configuration.
  3. To increase the number of parallel downloads, use CNAMEs (DNS aliases) to split your components across multiple hostnames.
  4. Too many parallel downloads can degrade performance. Research at Yahoo! shows that splitting components across two hostnames leads to better performance than using 1, 4, or 10 hostnames
Scripts Block Downloads
  1. Parallel downloading is disabled while a script is downloading—the browser won’t start any other downloads, even on different hostnames
  2. This is to guarantee that the scripts are executed in the proper order. If multiple scripts were downloaded in parallel, there’s no guarantee the responses would arrive in the order specified.
  3. The browser also waits because the script may use document.write to alter the page content, so it has to make sure the page is laid out appropriately.
So if we put the scripts at the top (this is the worst case):
  1. Content below the script is blocked from rendering.
  2. Components below the script are blocked from being downloaded.
But if we put the scripts at the bottom (this is the best case):
  1. The page contents aren’t blocked from rendering
  2. Viewable components in the page are downloaded as early as possible
That's why we should Move scripts to the bottom of the page.


CHAPTER 7, Rule 7: Avoid CSS Expressions

  1. This rule is only for IE browsers as CSS expressions are not available in other browsers.
  2. Example of such an expression:
    background-color: expression( (new Date()).getHours( )%2 ? "#B8D4FF" : "#F08A00" );
    JavaScript is used to write an expression that makes the background color alternate every hour.
  3. CSS expressions are evaluated far more often than we expect: they are reevaluated for various events including resizing, scrolling, and mouse movements.
  4. This may cause a performance issue.
  5. The author mentions a way to overcome this issue, but I believe it is better not to use CSS expressions at all, as the rule says.

CHAPTER 8, Rule 8: Make JavaScript and CSS External

  1. The title advises making JS and CSS external rather than inline. However, this chapter offers several other pieces of advice.
  2. In general, when the JS and CSS are external, you will get the benefit of browser's cache. On the other side, inline JS and CSS will be loaded faster if there is no cache in the browser (i.e. first visit to page).
  3. Think about combining all CSS in one file and all JS in one file.  This has the benefit of subjecting the user to only one HTTP request, but it increases the amount of data downloaded on a user’s first page view.
  4. Think about categorizing your pages into a handful of page types and then create a single script and stylesheet for each one.
  5. some websites' homepages are not used frequently, so having JS and CSS embedded internally could be a good idea ( remember that the browser deletes the long time unused cached content even if it is not expired).
Post-Onload download
  1. On some critical pages, you can inline the JS and CSS and add a JavaScript function that downloads the external files after the page has loaded.
  2. That way, the first page view is served by the inlined JS and CSS, and subsequent page views are served by the external JS and CSS from the browser cache.
  3. there is an example of such a function in the book.

Dynamic Inlining
  1. Another idea, as we don't know what is stored in the browser cache, we can use a cookie as an indicator. 
  2. If the cookie is absent, the JavaScript or CSS is inlined. If the cookie is present, it’s likely the external component is in the browser’s cache and external files are used.
  3. On the first page visit, there will be no cookie, JS and CSS will be inlined and the cookie will be set.
  4. In the subsequent requests, the cookie will be there so the JS and CSS will be rendered as external links and will be served by the browser cache.
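
The cookie check above could look roughly like this on the server side. It is a sketch in Python with made-up names (render_head, the css_cached cookie, /static/site.css). As noted under Post-Onload download, the first response would also need to trigger a download of the external file, otherwise the cookie would be promising a cached copy that is not there.

```python
def render_head(cookies, css_text, css_url="/static/site.css"):
    """Return (head_html, cookies_to_set) for one page view."""
    if cookies.get("css_cached") == "1":
        # Returning visitor: the external file is probably in the browser cache.
        return f'<link rel="stylesheet" href="{css_url}">', {}
    # First visit: inline the CSS and set the cookie for next time.
    return f"<style>{css_text}</style>", {"css_cached": "1"}

head, to_set = render_head({}, "body{margin:0}")
print(head, to_set)
head, _ = render_head(to_set, "body{margin:0}")
print(head)
```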

CHAPTER 9, Rule 9: Reduce DNS Lookups


  1. DNS resolution has a cost. It typically takes 20–120 milliseconds for the browser to look up the IP address for a given hostname.
  2. The browser can’t download anything from this hostname until the DNS lookup is completed.
  3. DNS lookups are cached in different locations for better performance:
    • on a special caching server maintained by the user’s ISP.
    • on the local area network.
    • in the operating system’s DNS cache (the “DNS Client service” on Microsoft Windows).
    • in the browser’s own cache.
Factors Affecting DNS Caching
  1. The DNS record returned from a lookup contains a time-to-live (TTL) value. This tells the client how long the record can be cached.
  2. Operating system caches respect the TTL, 
  3. Browsers often ignore it and set their own time limits.
  4. The Keep-Alive feature of the HTTP protocol, can override both the TTL and the browser’s time limit (i.e. as long as the browser and the web server are communicating and keeping their TCP connection open, there’s no reason for a DNS lookup).
  5. Browsers put a limit on the number of DNS records cached (i.e. earlier DNS records are discarded).
  6. If the browser doesn't have a DNS record, the operating system cache will be checked, if it is not there, the local area network or the ISP cache will be checked.
TTL Values
  1. When the browser does a DNS lookup, the DNS resolver returns the amount of time remaining in the TTL for its record. (that is because the DNS entry has already lived for an amount of time in this DNS resolver).
  2. For example, if the maximum TTL is 5 minutes, the TTL returned by the DNS resolver ranges from 1 to 300 seconds.
DNS From OS and Browser’s Perspective
  1. The DNS cache on Microsoft Windows is managed by the DNS Client service
    • to view the cache : ipconfig /displaydns
    • to flush: ipconfig /flushdns
    • Rebooting clears the DNS Client service cache
  2. Internet Explorer’s DNS cache is controlled by three registry settings:
    • These settings are created in the registry key:
      HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet Settings\
    • DnsCacheTimeout: 30 minutes (i.e. if IE received a TTL value less than 30 minutes from the server, it will be ignored).
    • KeepAliveTimeout: 1 minute (i.e. a persistent TCP connection is used until it has been idle for one minute; during this minute no DNS lookups will happen).
    • ServerInfoTimeOut: 2 minutes (i.e. even without Keep-Alive, if a hostname is reused every two minutes without failure, a DNS lookup is not required).
  3. Firefox  has the following configuration settings:
    • network.dnsCacheExpiration: 1 minute.
    • network.dnsCacheEntries: 20 (this value is too small).
    • network.http.keep-alive.timeout: 5 minutes.
  4. FasterFox ( Firefox add-on for measuring and improving Firefox performance)
    1. network.dnsCacheExpiration: 1 hour.
    2. network.dnsCacheEntries: 512.
    3. network.http.keep-alive.timeout: 30 seconds.
Notes


  1. Reducing the number of unique hostnames in the page reduces the number of DNS lookups (this is true only if the client DNS cache is empty).
  2. However, reducing the number of unique hostnames has the potential to reduce the amount of parallel downloading.
  3. for a good compromise between reducing DNS lookups and allowing a high degree of parallel downloads, the author suggests splitting the components across at least two but no more than four hostnames.
  4. remember that using Keep-Alive reduces DNS lookups.

CHAPTER 10, Rule 10: Minify JavaScript

Minification
  1. Minification is the practice of removing unnecessary characters from code to reduce its size, thereby improving load times.
  2. When code is minified, all comments are removed, as well as unneeded whitespace characters (space, newline, and tab).
  3. JSMin is a good tool for minification.
  4. minification is good for external as well as internal scripts.
  5. sure, you can also minify CSS files.
Obfuscation 
  1. Like minification, it removes comments and whitespace.
  2. It also munges the code: function and variable names are converted into shorter strings, making the code more compact as well as harder to read.
  3. This makes the code difficult to reverse-engineer.
  4. Because obfuscation is complex, it makes the code hard to maintain and debug, and it may introduce bugs.
  5. ShrinkSafe is a good tool for obfuscation.
  6. Minification is preferred over obfuscation.


Chapter 11, Rule 11: Avoid Redirects

Types of Redirects

  1.  300 Multiple Choices: 
    • The server has multiple representations of the requested resource.
    • The client didn’t use the Accept-* headers to specify a representation, or it asked for a representation that doesn’t exist.
    • The server can pick its preferred representation and send it with a 200 (“OK”) status code, or send a 300 response with a list of possible URIs to different representations.
    • If the server has a preferred representation, the server can put the representation URI in the Location header.
    • If the server needs to return a list of representations, the server uses the response body.
  2. 301 Moved Permanently:
    • The server knows which resource the client is trying to access.
    • The server wants to tell the client to stop using this URI to access this resource and use a different URI.
    • The server sends a 301 response with the new URI in the Location header. The client should make a note of it and stop using the old URI.
  3. 303 See Other 
    • Available only in HTTP/1.1.
    • 303 means that the request has been processed, but the server will not send a response document. Instead, the server sends the client a new URI (in the Location header) that points to a response document.
    • If the client wants to download the response document, it can send a GET request to the new URI. Importantly, the client always sends a GET request here, whatever the original method was.
    • Example: the client requests http://www.example.com/software/BuildPdfDocument, and the server replies with 303 and http://www.example.com/software/DownloadPdfDocument?id=123. This means the server has built the PDF document and is telling the client where to download it.
  4. 307 Temporary Redirect
    • Available only in HTTP/1.1.
    • 307 means that the request has not been processed, and the client should resubmit it to another URI. Importantly, if the first request was a POST, DELETE, or PUT, the client should repeat the same method against the new URI, unlike 303, where the client always sends a GET.
  5. 302 Moved Temporarily (a.k.a. Found)
    • 303 and 307 were introduced to resolve the ambiguity of this response: the specification meant the method to be preserved, but many clients changed a POST into a GET when following a 302.
    • 302 should be used like a 307 response.
  6. 304 Not Modified:
    • This is not a redirect.
    • The client asks for a resource with an If-Modified-Since header.
    • The server replies 304 if the resource hasn't been modified.
  7. 305 Use Proxy: Not important 
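The difference between 303 and 307 comes down to which HTTP method the client uses for the follow-up request. The small function below sketches that rule as these notes describe it. The name method_after_redirect is hypothetical, 302 is treated like 307 as the notes recommend (many real clients switch a POST to GET instead), and other status codes are deliberately left out.

```python
def method_after_redirect(status, original_method):
    """Return the HTTP method a client uses for the request to the new URI."""
    if status == 303:
        # The request was processed; the client fetches the result with GET.
        return "GET"
    if status in (302, 307):
        # The request was not processed; repeat it unchanged at the new URI.
        return original_method
    raise ValueError("status %d is not covered by this sketch" % status)
```

So a POST answered with 303 is followed by a GET, while a POST answered with 307 is followed by another POST.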
Notes
  1. The 301 and 302 status codes are the ones used most often.
  2. Neither a 301 nor a 302 response is cached in practice unless additional headers, such as Expires or Cache-Control, indicate that it should be.
  3. Another redirect mechanism is the HTML meta refresh tag:
    <meta http-equiv="refresh" content="0; url=http://google.com">
  4. JavaScript can also be used to redirect users.
  5. Using an HTTP redirect is the recommended approach.
  6. A JavaScript redirect can cause issues with the browser back button: window.location.assign adds an entry to the session history, while window.location.replace overwrites the current entry.
How Redirects Hurt Performance
  1. The requested HTML document cannot be downloaded until the redirect completes, and stylesheets and scripts cannot be downloaded until the HTML document arrives. With a chain of redirects, the user sees nothing on the screen until the last one is done.
  2. The author advises finding other ways to solve problems that are commonly handled with redirects.

CHAPTER 12, Rule 12: Remove Duplicate Scripts

  1. Duplicate scripts hurt performance: unnecessary HTTP requests and wasted JavaScript execution.
  2. Make sure scripts are included only once.
  3. One way is to include scripts through a function that checks for duplicates. Rather than using
    <script type="text/javascript" src="asdf.js"></script>
    to include a script, programmers can use
    <?php insertScript("asdf.js") ?>
  4. insertScript checks whether asdf.js has already been inserted. It also checks whether the script has dependencies, so it can insert them as well.

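<?php
// alreadyInserted, pushInserted, hasDependencies, getDependencies and
// getVersion are helpers assumed to be defined elsewhere.
function insertScript($jsfile) {
    // Skip scripts that are already on the page.
    if (alreadyInserted($jsfile)) {
        return;
    }
    // Record the script before recursing, so it is never emitted twice.
    pushInserted($jsfile);
    // Emit dependencies first, so they load before the script itself.
    if (hasDependencies($jsfile)) {
        $dependencies = getDependencies($jsfile);
        foreach ($dependencies as $script) {
            insertScript($script);
        }
    }
    // getVersion is assumed to return the versioned filename of the script.
    echo '<script type="text/javascript" src="' . getVersion($jsfile) . '"></script>';
}
?>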
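The PHP function above relies on helpers that are not shown. As a self-contained illustration of the same logic, here is a sketch in Python. The DEPENDENCIES and VERSIONS tables, the file names in them, and the render helper are all made up for this example.

```python
# Hypothetical data: which scripts depend on which, and their versioned names.
DEPENDENCIES = {"menu.js": ["events.js", "utils.js"], "events.js": ["utils.js"]}
VERSIONS = {
    "menu.js": "menu_1.0.17.js",
    "events.js": "events_1.4.js",
    "utils.js": "utils_2.1.js",
}


def insert_script(jsfile, inserted, out):
    """Append the tag for jsfile and its dependencies, each at most once."""
    if jsfile in inserted:
        return
    # Mark first, so duplicates (and circular dependencies) are skipped.
    inserted.add(jsfile)
    for dependency in DEPENDENCIES.get(jsfile, []):
        insert_script(dependency, inserted, out)
    src = VERSIONS.get(jsfile, jsfile)
    out.append('<script type="text/javascript" src="%s"></script>' % src)


def render(scripts):
    """Return the script tags for a page that requests the given scripts."""
    inserted, out = set(), []
    for jsfile in scripts:
        insert_script(jsfile, inserted, out)
    return out
```

Calling render(["menu.js", "events.js", "menu.js"]) yields three tags, in the order utils, events, menu, even though menu.js was requested twice and utils.js is a dependency of two scripts.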


CHAPTER 13, Rule 13: Configure ETags

ETags are a mechanism that web servers and browsers use to validate cached components.


Conditional GET Requests
  1. When a cached component does expire (or the user explicitly reloads the page), the browser can’t reuse it without first checking that it is still valid.
  2. The browser sends a conditional GET request to the server to check whether the cached version is still valid.
  3. The server replies “304 Not Modified” if the cached version is still up to date, or "200 OK" with the new version of the content if the component has changed.
  4. There are two ways in which the server determines whether the cached component matches the one on the origin server:
    1. By comparing the last-modified date
    2. By comparing the entity tag
Last-Modified Date
  1. The client sends a GET request.
  2. The server replies with Last-Modified: Tue, 12 Dec 2006 03:03:59 GMT
  3. When the cached component expires, the client sends a GET request with If-Modified-Since: Tue, 12 Dec 2006 03:03:59 GMT
  4. The server replies "304 Not Modified" if the content has not been modified.

Entity Tags
  1. ETags were added to provide a more flexible mechanism for validating entities than the last-modified date. They help, for example, when an entity varies based on permissions or on the User-Agent or Accept-Language headers.
  2. The client sends a GET request.
  3. The server replies with ETag: "10c24bc-4ab-457e1c1f".
  4. When the cached component expires, the client sends a GET request with If-None-Match: "10c24bc-4ab-457e1c1f".
  5. The server replies "304 Not Modified" if the content has not been modified.
The Problem with ETags
  1. They are typically constructed using attributes that make them unique to a specific server hosting a site. In a cluster of servers, ETags won’t match when a browser gets the original component from one server and later makes a conditional GET request that goes to a different server, which means components are reloaded unnecessarily.
  2. Apache (1.3 and 2.x) builds ETags in the form inode-size-timestamp. The inode, which the filesystem uses to store information like file type, owner, group, and access mode, differs from server to server even for identical files.
  3. IIS uses different information, with the same effect across servers.
  4. If both If-None-Match and If-Modified-Since are in the request, the origin server “MUST NOT return a response status of 304 (Not Modified)” unless both conditions are met.
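The rule in note 4 can be sketched as a small server-side check. This is a simplified illustration rather than a full implementation: the function name is_not_modified is hypothetical, If-None-Match is assumed to carry a single strong ETag (no lists, no "*", no weak tags), and dates are assumed to use the HTTP format shown earlier. Later HTTP specifications give If-None-Match precedence over If-Modified-Since, so treat this as a model of the rule as stated in these notes.

```python
from email.utils import parsedate_to_datetime


def is_not_modified(request_headers, current_etag, last_modified):
    """Return True if "304 Not Modified" is appropriate for this request."""
    if_none_match = request_headers.get("If-None-Match")
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_none_match is None and if_modified_since is None:
        return False  # not a conditional request at all
    # Every validator the client sent must agree before a 304 is allowed.
    if if_none_match is not None and if_none_match != current_etag:
        return False
    if if_modified_since is not None:
        changed = parsedate_to_datetime(last_modified)
        seen = parsedate_to_datetime(if_modified_since)
        if changed > seen:
            return False
    return True
```

In a cluster where each server produces a different ETag for the same file, the If-None-Match comparison fails whenever the request lands on a different server, and the component is sent again in full even though it has not changed.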
What to do 
  1. If you have components that have to be validated based on something other than the last-modified date, ETags are a powerful way of doing that.
  2. For a single-server website, you can let the web server (e.g. Apache) generate ETags for you.
  3. For a cluster of servers, configure the ETag header yourself rather than letting the web server generate it.
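One possible way to set the ETag yourself in a cluster is to derive it from the content alone, so that every server computes the same value for the same bytes. This is an assumption of this sketch, not something the notes prescribe, and the name content_etag is hypothetical.

```python
import hashlib


def content_etag(body):
    """Build an ETag from the response bytes only, with no server-specific data."""
    # MD5 serves here as a fingerprint of the content, not as a security measure.
    return '"' + hashlib.md5(body).hexdigest() + '"'
```

Because the value depends only on the bytes served, a conditional GET validates correctly no matter which server in the cluster answers it.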