Description
The current NCA implementation creates a numpy array (tmp) of shape (n, n, d, d). This is a problem even with medium-sized datasets, since the memory requirement grows as n².d². My computer has 64 GB of memory, and I cannot run NCA on a dataset of 1021 instances in dimension 122.
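For reference, a quick back-of-the-envelope check of why it does not fit (assuming float64 entries; these are my own estimates, not measured values):

```python
n, d = 1021, 122
# the (n, n, d, d) array alone needs n² · d² · 8 bytes
print(n ** 2 * d ** 2 * 8 / 1e9)   # ≈ 124 GB, well above the 64 GB available
```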
A simple solution to deal with this memory problem is to compute tmp_i at each iteration:
```python
tmp_i = np.einsum('...i,...j->...ij', dX[i], dX[i])  # shape (n, d, d)
```
This way, the memory requirement is only in n.d², and it works on the dataset of shape (1021, 122). However, it takes more than 12 hours to compute!
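For context, here is a minimal sketch of what the per-iteration variant looks like inside the gradient loop (an illustration of the idea, not the actual metric-learn code; the accumulation step is simplified, the real NCA gradient weights each outer product by softmax probabilities):

```python
import numpy as np

n, d = 1021, 122
X = np.random.randn(n, d)          # stand-in data, same shape as my dataset

grad = np.zeros((d, d))
for i in range(n):
    dX_i = X[i] - X                                        # differences to sample i, shape (n, d)
    tmp_i = np.einsum('...i,...j->...ij', dX_i, dX_i)      # outer products, shape (n, d, d)
    # only n · d² · 8 bytes ≈ 120 MB live at once, instead of n² · d² · 8 bytes
    grad += tmp_i.sum(axis=0)                              # simplified accumulation
```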
This solution increases the execution time, as tmp_i is recomputed at each iteration:
| Shape of the input data | Previous execution time | New execution time |
| --- | --- | --- |
| (44, 122) | 45 s | 52 s |
| (198, 122) | 20 min | 25 min |
| (1021, 122) | MemoryError | > 12 h |
Even for small datasets, the execution time is very long. Do you have any idea how to improve the performance of NCA?
- Could we use an optimization library such as scipy.optimize (as is done in the MLKR implementation, for instance)? Do you think it would significantly reduce the execution time? (See the sketch after this list.)
- Another solution, which seems more complex, would be to re-implement the optimization in Cython.
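To make the scipy.optimize idea concrete, here is a rough sketch of how the standard NCA objective and its analytic gradient (Goldberger et al.) could be passed to scipy.optimize.minimize, the way MLKR does. The function and variable names are mine and this is only an illustration of the approach, not a proposed implementation. Note that it only ever materializes (n, n) and (n, n, d) arrays, never (n, n, d, d):

```python
import numpy as np
from scipy.optimize import minimize

def nca_loss_grad(A_flat, X, y):
    """Negative NCA objective and its gradient w.r.t. the flattened
    transformation A, in the form expected by scipy.optimize.minimize."""
    n, d = X.shape
    A = A_flat.reshape(-1, d)
    Xa = X @ A.T                                   # transformed data, shape (n, d_out)

    # softmax over negative squared distances (with p_ii = 0)
    diff = Xa[:, None, :] - Xa[None, :, :]         # shape (n, n, d_out)
    dist = np.einsum('ijk,ijk->ij', diff, diff)    # squared distances, shape (n, n)
    np.fill_diagonal(dist, np.inf)
    dist -= dist.min(axis=1, keepdims=True)        # numerical stabilization
    p = np.exp(-dist)
    p /= p.sum(axis=1, keepdims=True)              # p[i, j] = p_ij

    same = (y[:, None] == y[None, :]) & ~np.eye(n, dtype=bool)
    p_i = (p * same).sum(axis=1)                   # prob. that i is classified correctly
    loss = -p_i.sum()                              # scipy minimizes, NCA maximizes

    # gradient: 2 A * sum_ij W_ij (x_i - x_j)(x_i - x_j)^T, computed with a
    # Laplacian-style identity so the (n, n, d, d) tensor is never formed
    W = p_i[:, None] * p - p * same
    lap = np.diag(W.sum(axis=1) + W.sum(axis=0)) - W - W.T
    grad = 2 * A @ (X.T @ lap @ X)
    return loss, -grad.ravel()

# toy usage on random labels, just to show the calling convention
rng = np.random.RandomState(0)
X = rng.randn(200, 10)
y = rng.randint(0, 3, size=200)
A0 = np.eye(X.shape[1]).ravel()                    # start from the identity transform
res = minimize(nca_loss_grad, A0, args=(X, y), jac=True,
               method='L-BFGS-B', options={'maxiter': 50})
A_opt = res.x.reshape(-1, X.shape[1])
```

Whether this actually helps should depend mostly on how many gradient evaluations L-BFGS needs compared to the current gradient loop, since each evaluation still costs roughly the same.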