« stresstest » Feed

6c4a91744a9f857f00ddd8802edb7465

Partitioning MySQL database with high load solutions

2010-11-19 00:35

Write comment

We need to test if MySQL can be reliable with huge data sets, talking of a daily waterfall of about 30 million INSERTs and a few daily analityc big SELECTs too.
So what's the deal ? which database engine ? which platform ? which architecture ? ... and is data secured ? accessible ?
Before choosing MySQL 5.1, i turned around these three databases:

  • Oracle :
    Known as the 'most reliable' database engine in the world, personally i don't believe in that, and i didn't choose it for its cost (Especially that :) ).
  • PostgreSQL :
    A good database engine, it succeeded where older than 5.0 versions of MySQL failed, but now, both engines are mature enough and i'm switching between them on every new projects depending on the specific needs of each project architectures.
    Switching between database engines should be a simple task when the application deploys an abstraction layer on the database.
    So why not PostgreSQL for this project ?
    1. PostgreSQL (the latest stable version to this day) still lacks on partitioning and provide very basic features compared to what MySQL do.
    2. As i know, PostgreSQL has no 'enterprise' version and no paid support, with mission critical projects, we cant rely on the community respondness to resolve issues, we need enterprise-level SLA.
  • MySQL 5.1
    Refering to the new features list of the 5.1 version of MySQL, i felt in love with it.
    Good partitioning (and sub-partitioning), better replication engine .. and the 'enterprise' version is making us comfortable with critical-mission applications.
As for scalability, making a database respond in less than 1 second when stressed with up to 2000 INSERT/sec throughput (with TRIGGER) does not rely on System sizing and Database engine only, these things wont scale if the design is poorly done.

Let's dive into the blue:
We have a sell reporting analytic application, we have about 20 million sells per day, each selling action has a Seller and a Buyer as long as the date and the amount of the bill.
The informations are provided in flat text files, so we use Talend ETL to load the data to the database.
On the output side, we need to get a fast access to these informations:
  • (1) Best sellers of each day having total amount of sells more than a defined limit. (Using the sum of the billing amounts of the day)
  • (2) Occasionally fetch selling details of a Seller for a specific period.
I've deployed two tables, the first is for delivering best sellers/buyers informations (1), the second is for the selling details (2).
Here's the first one:

CREATE TABLE `transaction` (
`seller` int(11) DEFAULT NULL,
`buyer` int(11) DEFAULT NULL,
`amount` decimal(5,2) DEFAULT 0,
`sell_date` date DEFAULT NULL,
`sell_time` time DEFAULT NULL,
PRIMARY KEY(`seller`, `buyer`, `sell_date`, `time`)
) ENGINE=InnoDB
PARTITION BY RANGE ( MONTH(`sell_date`))
SUBPARTITION BY HASH ( `seller` div 1000000)
SUBPARTITIONS 10
(
PARTITION p1 VALUES LESS THAN (1) ENGINE = InnoDB,
PARTITION p2 VALUES LESS THAN (2) ENGINE = InnoDB,
PARTITION p3 VALUES LESS THAN (3) ENGINE = InnoDB,
PARTITION p4 VALUES LESS THAN (4) ENGINE = InnoDB,
PARTITION p5 VALUES LESS THAN (5) ENGINE = InnoDB,
PARTITION p6 VALUES LESS THAN (6) ENGINE = InnoDB,
PARTITION p7 VALUES LESS THAN (7) ENGINE = InnoDB,
PARTITION p8 VALUES LESS THAN (8) ENGINE = InnoDB,
PARTITION p9 VALUES LESS THAN (9) ENGINE = InnoDB,
PARTITION p10 VALUES LESS THAN (10) ENGINE = InnoDB,
PARTITION p11 VALUES LESS THAN (11) ENGINE = InnoDB,
PARTITION p12 VALUES LESS THAN MAXVALUE ENGINE = InnoDB
);

This table will contain the sell transactions, I've made partitioning by seller and month of the transaction in the way to facilitate queries for selling details for a specific seller and period: (2).

Here's the second one (dedicated to queries fetching best sellers of each day (1)):

Read more…

 Nbre d'élements: 1/
1