Sunday, March 31, 2013

I'd rather not always be compacting

I'd rather not always be compacting: I read a post about the need to defragment MongoDB databases. I liked the title but prefer to avoid a DBMS that requires routine maintenance. Alas I don't know what the author means by routine. There is a big difference between having to defragment a database once a week versus once a quarter. This isn't an online operation per the manual so it can have a big impact on service availability.

PostgreSQL isn't immune to this problem. It has a long history with vacuum and each release makes things better. But I haven't run either PostgreSQL or MongoDB in production for a busy OLTP workload so I don't know whether these would be a serious problem. Context matters in this case. I can read many things on the internet that tell me MongoDB compaction and PostgrSQL vacuum are never an issue or are always an issue. The hard part is figuring out whether that is true for the workload that you care about.

I do have experience with InnoDB. InnoDB purge has some things in common with PostgreSQL vacuum. It cleans up old versions required by MVCC when the old versions are no longer visible. For InnoDB this means it physically removes rows that were delete marked. Performance can slow to a crawl when purge gets too far behind and there are a few things that could be done to control that, but purge has never been routine maintenance. It just runs in the background. The controls for it are:
Purge != defragmentation

Even though purge is online InnoDB can still require maintenance. On servers that I care about 2/3 of the data growth rate was caused by fragmentation versus 1/3 from new data. Defragmentation is useful to reclaim the space wasted by fragmentation, especially when using pure-flash storage. The frequency in this case is a few times per year.

How do you defragment servers in production? If you are fortunate to have several replicas per master and spare capacity, then you can defragment a replica and then replace the master. The other options include using online DDL that is new in MySQL 5.6 to defragment secondary indexes. To defragment a primary index you probably need an online schema change framework like pt-online-schema change, oak-online-alter-table or Facebook OSC.

I think there is a better way to defragment InnoDB that is incremental and less invasive. But this is only a theory today. One of my talented peers is trying to figure this out and we hope to have good news and code to share in a few months.

DIGITAL JUICE

No comments:

Post a Comment

Thank's!