Some Performance Tips for Magento
Monday, August 29th, 2011 | Technology
(In this post, I’m talking about Magento 1.5.1, and I had more than 500,000 products to import and maintain. Every point below follows from those two facts.)
I’ve spent the better part of two months learning and integrating with Magento eCommerce software. During this time I’ve bumped into several limitations and scaling issues because of the particular problem domain I’m working in but, at the end of it all, there are a few tips and tricks I’ve discovered which have helped. I’ll just present them here in no particular order. Hopefully they help someone else in future.
Avoid text attributes as much as reasonably possible. The text attributes you create for your products correspond to TEXT and/or BLOB data types in the MySQL database. When MySQL needs a temporary table for a query whose result set includes a TEXT or BLOB column, it can’t use an in-memory temporary table because the MEMORY engine does not support those column types, so the temporary table is created on disk. The extra disk IO slows down everything on your server. So if you have a significant number of text attributes, make sure you have fast, good-quality disks, or point MySQL’s temporary directory (the tmpdir setting) at a file system backed by memory, such as tmpfs. (Is ramdisk still the correct term? Am I getting old?)
Implement your own RPC interface to replace the SOAP or XML-RPC API. Why? Because it’s sloooow. My problem involved importing about 500,000 products into Magento, and it was decided to use the API provided. Unfortunately, it took a long, long time, but that gave me plenty of time to learn about Magento while I tried to speed it up. Technically, you have two choices here: 1) delve straight into the database and build the queries yourself; 2) implement a small wrapper around the API calls to replace the existing API entry point.
Choice #2 is the obvious one since it’s maintainable. What I did was to implement a small XML-RPC server in PHP with the same method signatures as the Magento API for the calls I needed (catalog_product.create, etc). My little stubs then imported the Mage system, instantiated an API class (Mage_Catalog_Model_Product_Api) and called the existing Magento logic for product creation. Rather oddly, this sped things up dramatically.
Keep as little data within Magento as possible. Keep enough to display your products the way you want them to be displayed but no more. The Magento database schema is a bit unexpected; it has products split into several attribute tables based on the data type of the attribute. So there’s an integer table for integer attributes, a varchar table for varchar attributes, etc. This seems reasonable but it has a performance impact when your physical hardware can’t back it up. If you don’t have sufficient memory and quality disk systems, a single product lookup can result in an awful lot of disk access. Searches through all products can become impossible when you tend towards large amounts of data. So try to keep it down to a minimum or throw lots of hardware at the problem.
Eventually, our data was kept in a separate database designed specifically for that purpose, and our internal systems queried and maintained that as the authoritative source. Magento served as the customer front-end, and relevant data was pushed to it when required. Much better for long-term possibilities and plans.
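To make the per-type attribute tables a bit more concrete, here is a small sketch using SQLite. The table and column names are simplified, hypothetical stand-ins rather than Magento’s real schema, and the query counting is only meant to show why assembling one product touches several tables.

```python
import sqlite3

# Simplified, hypothetical stand-ins for the per-data-type attribute tables.
SCHEMA = """
CREATE TABLE product_entity (entity_id INTEGER PRIMARY KEY, sku TEXT);
CREATE TABLE product_entity_int     (entity_id INTEGER, attribute TEXT, value INTEGER);
CREATE TABLE product_entity_varchar (entity_id INTEGER, attribute TEXT, value TEXT);
CREATE TABLE product_entity_text    (entity_id INTEGER, attribute TEXT, value TEXT);
"""

VALUE_TABLES = (
    "product_entity_int",
    "product_entity_varchar",
    "product_entity_text",
)


def load_product(conn, entity_id):
    """Assemble one product: one query for the entity row, one per value table."""
    row = conn.execute(
        "SELECT sku FROM product_entity WHERE entity_id = ?", (entity_id,)
    ).fetchone()
    product = {"sku": row[0]}
    queries = 1
    for table in VALUE_TABLES:
        cursor = conn.execute(
            f"SELECT attribute, value FROM {table} WHERE entity_id = ?",
            (entity_id,),
        )
        for attribute, value in cursor:
            product[attribute] = value
        queries += 1
    return product, queries


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO product_entity VALUES (1, 'SKU-1')")
conn.execute("INSERT INTO product_entity_int VALUES (1, 'status', 1)")
conn.execute("INSERT INTO product_entity_varchar VALUES (1, 'name', 'Widget')")
conn.execute("INSERT INTO product_entity_text VALUES (1, 'description', 'A long description')")

product, queries = load_product(conn, 1)
print(product, "assembled with", queries, "queries")
```

Every extra data type in use adds another table to visit per product, which is why trimming the attributes you store cuts the work per lookup.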
Allocate as much memory as you have data and then some. The storage engine used by Magento is InnoDB. InnoDB needs a lot of memory. If you don’t have a lot of memory, then disk access once again becomes an issue and your system starts becoming unusable. Make sure you have at least as much memory as you have data within your database. Allocate it properly to the InnoDB engine (the innodb_buffer_pool_size setting is the main one). Use helper scripts like mysqltuner.pl to diagnose problems.
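As a rough sketch of the “as much memory as you have data, and then some” rule: the query below is standard MySQL for totalling InnoDB data and index sizes, while the 20% headroom and the helper names are my own assumptions, not a recommendation from MySQL or Magento.

```python
# Standard MySQL query: total bytes of InnoDB data plus indexes.
INNODB_SIZE_SQL = (
    "SELECT SUM(data_length + index_length) "
    "FROM information_schema.tables WHERE engine = 'InnoDB'"
)

MIB = 1024 * 1024


def suggest_buffer_pool_mb(innodb_bytes, headroom_percent=20):
    """Data size plus headroom, rounded up to whole megabytes.

    The default headroom of 20% is an assumption for illustration only.
    """
    padded = innodb_bytes * (100 + headroom_percent) // 100
    return -(-padded // MIB)  # ceiling division using integers only


def my_cnf_line(innodb_bytes, headroom_percent=20):
    """Format the result as a my.cnf setting."""
    size_mb = suggest_buffer_pool_mb(innodb_bytes, headroom_percent)
    return f"innodb_buffer_pool_size = {size_mb}M"


if __name__ == "__main__":
    # Example: pretend the query above reported 10 GiB of InnoDB data.
    print(my_cnf_line(10 * 1024**3))
```

Treat the result as a starting point; the machine still needs memory left over for the operating system, PHP, and the web server.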
Uploaded files and save locations. Not sure why, but I guess the makers behind Magento assumed there would be a limited number of images per product, so they used a 2-level-deep directory layout for uploaded images. If your product image is called “myimage.jpg”, it is uploaded to the “m/y/myimage.jpg” location beneath your media root. Each of the products in my local Magento could have 3 images, which totaled a possible 1.5 million files at least. If there were no bias in the product image file names, you would be looking at directories with maybe 1K files each. However, if your image files are all named according to a logical scheme, as mine are, then there’s a significant bias and Magento’s 2-depth system results in several thousand files in a single directory. I manually increased this to a 3-depth system by modifying the getDispretionPath function in (ROOT)/lib/Varien/File/Uploader.php to bring the count back down to sanity. Ideally, though, the Magento makers should fix this and allocate directory trees based on a random number Magento generates (or a hash, sha1, whatever). It won’t be easy to find your files again but it will allow Magento to scale a bit better.
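Here is a minimal sketch of the difference between name-based and hash-based directory layouts. The name-based function is a simplified reading of the behaviour described above, not Magento’s actual code, and the hash-based one is just one way to do what I’m suggesting; both function names and the sample file names are hypothetical.

```python
import hashlib


def name_based_path(filename, depth=2):
    """Directory prefix built from the first `depth` characters of the name."""
    # A dot would make an awkward directory name, so swap it for "_".
    chars = ["_" if c == "." else c for c in filename[:depth]]
    return "/".join(chars)


def hash_based_path(filename, depth=2):
    """Directory prefix built from the first hex digits of a SHA-1 digest."""
    digest = hashlib.sha1(filename.encode("utf-8")).hexdigest()
    return "/".join(digest[:depth])


# Hypothetical file names sharing a coded prefix, like a logical naming scheme.
names = [f"prod{n:06d}_a.jpg" for n in range(10_000)]

name_dirs = {name_based_path(n) for n in names}
hash_dirs = {hash_based_path(n) for n in names}

print("name-based directories used:", len(name_dirs))
print("hash-based directories used:", len(hash_dirs))
```

With the name-based layout every one of these files lands in the same “p/r” directory, whereas the hashed layout spreads them over many. Since the hash is computed from the file name, the path can be recomputed later, so finding files again is less painful than it would be with a truly random number.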
Keep your database as close as possible to your Magento installation. When it became obvious that Magento needed a lot of memory and power at my required usage levels, we tried Amazon RDS, which is really awesome and worked well in itself, but it didn’t work out for us. Magento uses the database heavily, and each RPC call or operation performs several operations on the database, so the network latency between our server and Amazon suddenly became the bottleneck. So keep your data close, on a high-spec’d machine.
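A back-of-the-envelope calculation shows why the round trip matters so much. Only the product count comes from my situation; the queries-per-product figure and both latencies below are made-up numbers for illustration, not measurements.

```python
def network_wait_hours(products, queries_per_product, round_trip_ms):
    """Hours spent purely waiting on round trips, assuming sequential queries."""
    total_ms = products * queries_per_product * round_trip_ms
    return total_ms / 1000 / 3600


PRODUCTS = 500_000          # from the import described in this post
QUERIES_PER_PRODUCT = 50    # assumed, for illustration only

local = network_wait_hours(PRODUCTS, QUERIES_PER_PRODUCT, round_trip_ms=0.1)
remote = network_wait_hours(PRODUCTS, QUERIES_PER_PRODUCT, round_trip_ms=2.0)

print(f"local database:  {local:.1f} hours waiting on the network")
print(f"remote database: {remote:.1f} hours waiting on the network")
```

Under these assumptions, a couple of milliseconds per query turns well under an hour of waiting into more than half a day, before the database does any actual work.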
Is Magento scalable? If you have the money to throw at hardware, I’m sure it will scale okay-ishly. I think it’ll scale vertically to a point. I believe the caching system they have was probably a response to the slow speed of the database system they built. I believe the caching is there to hide the speed issues when you have a lot of products. (I suppose that’s what all caching is for?)
Do I recommend Magento? Yes. My problems were just due to my large number of products and limited hardware due to budget constraints. If you don’t have both of those constraints, then I’m sure you’ll be fine.