FAQ: Out of memory error when analyzing data

From Eigenvector Research Documentation Wiki

Issue:

I keep getting "out of memory" errors when analyzing my data. What can I do?

Possible Solutions:

The size of data analyzable in PLS_Toolbox or Solo is limited only by the memory (Random Access Memory, RAM) on your computer. The first step in handling out-of-memory errors is to identify how much memory your computer has and whether it is sufficient for the data you are analyzing. On a standard 32-bit system (usually limited to 2-3 GB of memory), you should be able to analyze data with 1 million elements (1000 samples x 1000 variables) without a problem. Even data up to 36 million elements (6000 samples x 6000 variables) should be analyzable, although some care must be taken (see the 32-bit hints below).
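As a rough check, each element of a standard double-precision matrix occupies 8 bytes, so you can estimate a dataset's raw footprint before loading it. A minimal sketch (the sizes are illustrative):

```matlab
% Estimate the raw memory footprint of a double-precision data matrix.
nSamples   = 6000;
nVariables = 6000;
bytesPerElement = 8;                               % MATLAB double
gbRaw = nSamples * nVariables * bytesPerElement / 2^30;
fprintf('Raw data: %.2f GB\n', gbRaw);             % ~0.27 GB for 6000 x 6000

% Analysis typically needs several working copies of the data
% (preprocessed data, cross-validation splits, scores and loadings),
% so budget several times the raw size when judging whether it will fit.
```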

The real solution for analyzing large data sets is to get as much RAM on your computer as possible, and this usually means doing the analysis on a 64-bit system with 6GB or more of memory.

Suggestion #1:
Use a 64-bit system. This is really the only viable solution for serious large-data analysis. 64-bit computers with 8 GB or more of memory can be purchased very inexpensively. Such systems can easily run analyses of hundreds of millions of elements without having to take memory-saving precautions outlined below.

Suggestion #2:
If you have only a 32-bit system, close EVERYTHING except Matlab or Solo and follow these guidelines. (Note: several of these mention images because image data is one of the biggest challenges; the image-specific suggestions can be ignored if you aren't analyzing image data.)

If you cannot increase the amount of memory (RAM) in your computer then the following suggestions might help you free up enough memory so that your analysis can complete:

Use care when using graphical user interfaces (GUIs). Whether using Solo (where you can only use GUIs) or using Matlab and PLS_Toolbox and choosing to use the GUIs, you need to be aware that the GUI gives you less control over memory requirements. Here are some general hints for GUI use:

  • Import your data DIRECTLY into the Analysis interface OR make sure you delete it from the base workspace (or Image Manager window, if you used that) after moving it into Analysis. This ensures you do not have multiple copies of the raw data in memory.
  • Preprocess your data in advance and use the "Preprocess > Save Preprocessed data" menu option to save the data to a disk file. Then, load that preprocessed data from the disk file as X and set the preprocessing to "none".
  • Use Hard-Delete Excluded (from the DataSet Editor's "edit" menu) after excluding variables you do NOT want to include in your model.
  • For image data, use the Crop tool in the Image Manager before analyzing the data to drop spatial regions you don't want to analyze (see the note above about closing the Image Manager before doing the analysis).
  • To make the above choices less cumbersome, set the Model Cache's "maxdatasize" option to inf so it will cache all your data. This will allow you to revert from hard-deleted variables or samples by simply re-loading the old data from the Model Cache.
  • The Support-Vector Machine methods (SVM-C and SVM-R) require that the "Java Heap Space" be increased to handle larger data. It is STRONGLY recommended that you use "compression" on these model types (see the SVM settings panel) to reduce the variable space. See also the Java Heap Space FAQ.
  • Cluster analysis using HCA is VERY difficult to use on large data. Use Partitional K-Means (DCA) instead.
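Before applying the hints above, it helps to see how much memory Matlab actually has to work with. A quick sketch (the `memory` function is available on Windows only; `whos` works everywhere):

```matlab
% Windows-only: report how much memory MATLAB can still use.
if ispc
    user = memory;
    fprintf('Largest array possible: %.0f MB\n', ...
        user.MaxPossibleArrayBytes / 2^20);
    fprintf('Memory available:       %.0f MB\n', ...
        user.MemAvailableAllArrays / 2^20);
end

% On any platform, list the variables currently occupying memory.
whos
```

If the largest possible array is smaller than your dataset, no amount of workspace housekeeping will help and a 64-bit system (Suggestion #1) is the answer.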

If you are using PLS_Toolbox and can use the Matlab command line, you will have more control over memory usage. The rules are very similar to the above except that the management of memory will be up to you. You will probably need to ensure that you save each modification to your data to a disk file and erase all other items from the Matlab workspace before analyzing the modified data. The "keep" function may be of use here (it is a quick way to delete everything EXCEPT specifically-listed items).
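The command-line workflow above can be sketched as follows. This is only an outline under assumed variable and file names; check the PLS_Toolbox help for the exact `preprocess` calibrate syntax before relying on it:

```matlab
% Sketch of a memory-lean command-line workflow (variable and file
% names are illustrative, not part of any PLS_Toolbox convention).

% 1. Preprocess, save the result to disk, and drop the raw data.
xp = preprocess('calibrate', preprocessing, x);   % assumed calibrate syntax
save('xp_preprocessed.mat', 'xp');
clear x                                           % free the raw copy

% 2. Before modeling, erase everything the analysis does not need.
%    PLS_Toolbox "keep" behaves like the built-in clearvars -except:
keep xp preprocessing
% clearvars -except xp preprocessing              % built-in equivalent

% 3. Build the model on the slimmed-down workspace.
model = pca(xp, 3);
```

Saving each intermediate result to a .mat file means you can always `load` it back, so clearing aggressively between steps costs nothing but disk space.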


Related MathWorks information:

Performance and Memory


Still having problems? Please contact our helpdesk at helpdesk@eigenvector.com