Wednesday, 18 September 2013

How to find duplicate image pairs among many images? How to std::hash an OpenCV Mat in memory?


My application problem is this: I receive around 500 images, and among them
there may be one or two pairs of images that are completely identical,
meaning the files have the same checksum.
However, I now have to apply a compression algorithm to these 500 images,
because uncompressed they take up too much disk space. The compression
breaks the checksum, so I cannot use the checksums of the compressed files
to find out which images are the repeated pairs.
Fortunately, my compression algorithm is lossless, so the restored
uncompressed images can still be hashed somehow. But I want to do this in
memory, without much disk write access. So my question is: how can I
efficiently find repeated images among a large number of image files, in
memory?
I use OpenCV often, but any answer is fine as long as it is efficient and
does not save any files to disk. Python or Bash code is also acceptable,
though C/C++ and OpenCV are preferred.
I thought of using OpenCV's Mat with std::hash, but std::hash does not work
on cv::Mat directly; I would have to write a std::hash<cv::Mat>
specialization myself, and I don't yet know how to do it properly.
