(When are floating point operations faster than integer ones?)
I was listening to the interview with Andrei Alexandrescu (of C++ and Dlang fame; cheers, homie! :)) on CppCast when I heard something interesting: float division is supposed to be faster than integer division on modern CPUs! So I had to dig into that.
The setup
I whipped up some JMH benchmarks (the sources are on GitHub) and ran them on three machines:
 An older Xeon E5-2665
 An i7-4600U
 An i7-4750HQ
(Other factors, like memory, CPU count, exact JVM version, etc., are not that relevant to this discussion since this is strictly a single-threaded ALU benchmark.)
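To give a flavor of the setup, here’s a minimal sketch of what such a benchmark looks like (the class and method names are illustrative, not the exact ones from the repository):

```java
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

import java.util.concurrent.ThreadLocalRandom;

@State(Scope.Thread)
public class DivisionBenchmark {
    int intA, intB;
    float floatA, floatB;

    @Setup
    public void setup() {
        // Random operands so the JIT can't constant-fold the division away
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        intA = rnd.nextInt(1_000, Integer.MAX_VALUE);
        intB = rnd.nextInt(1, 1_000);
        floatA = intA;
        floatB = intB;
    }

    @Benchmark
    public int intDiv() {
        return intA / intB;
    }

    @Benchmark
    public float floatDiv() {
        return floatA / floatB;
    }
}
```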
The answers
You can find the exact numbers in the GitHub repository and I also created a Google Docs spreadsheet to better visualize the results.
 What is the order (from a speed point of view) of integer operations?
 They seem to fall into two categories: div/mod and the rest, with the former being twice as slow as the latter.

This seems to hold regardless of data type (byte / short / long / etc.) or CPU (though the newer CPUs are ~30% faster)

Is computation on smaller integers (i.e. byte / short / char) faster / slower / the same as on larger ones (int / long)?
 In general int is the fastest, followed by long and then the other types (byte / char / short); see the note below on why the smaller types lag

For modulo (%), long is particularly slow.
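A plausible contributor to the smaller types lagging (this part is a language-level fact, not something the benchmark proves): the JVM has no arithmetic instructions for byte / short / char, so operands are widened to int for every operation and the result has to be narrowed back. The snippet below shows the symptom at the source level:

```java
byte a = 100, b = 23;
// a + b is performed as int arithmetic, so the narrowing cast is mandatory:
byte sum = (byte) (a + b);
// With ints there is no widening / narrowing overhead:
int x = 100, y = 23;
int z = x + y;
```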
The same question as above for floating point numbers: are calculations on float faster / slower / the same as on double?
 For addition / subtraction they are the same speed (note that I only tested numbers of similar magnitude, so perhaps the results are different if rescaling is involved)
 For division float was 30% to 50% faster than double, depending on the CPU model (!)

The order of operations (in decreasing order of ops/s) seems to be:
 addition / subtraction
 multiplication (about the same speed as addition / subtraction)
 division (about 30% to 50% slower)
 modulo (30% to 50% slower than division)
 multiplication / division on denormal values: this is at least 50x (yes, times, not percent!) slower; see the snippet below for how denormals arise
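If you’re wondering what denormal (subnormal) values are: they are the tiny numbers between zero and the smallest “normal” floating point value, which many CPUs handle on a slow microcoded path. A quick illustration (the constants come from java.lang.Double; the rest is my own example):

```java
public class DenormalDemo {
    public static void main(String[] args) {
        double small = 1e-300;        // a perfectly normal double
        double tiny  = small * 1e-10; // 1e-310: below Double.MIN_NORMAL (~2.2e-308)
        // true: tiny has silently underflowed into the denormal range
        System.out.println(tiny > 0 && tiny < Double.MIN_NORMAL);
        // Further multiplication / division involving tiny can hit the slow path:
        System.out.println(tiny / 3);
    }
}
```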

Back to the original question: can floating point division be faster than the integer one?
 Amazingly, yes! The benchmark shows that float division is ~30% faster than int division (which, in turn, is approximately the same speed as double division; both are 2x faster than long division)
 This does not hold for related operations like multiplication (where they are all approximately the same speed) and modulo (where int is more than twice as fast as the others)
The answers, part II
So this is it, right? Use float division? Let’s apply this optimization to two real-world problems: vector multiplication and interpolation search (sketched below).
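The exact sources are in the GitHub repository; below is my hedged reconstruction of the two workloads (method names and shapes are mine), mainly because their dependency structure matters in a moment:

```java
// 1. "Vector multiplication": apply a division to every element.
//    The operations are completely independent of each other.
static void scaleFloat(float[] v, float d) {
    for (int i = 0; i < v.length; i++) {
        v[i] /= d; // no element depends on any other
    }
}

// 2. Interpolation search: the result of each division decides where to
//    look next, so the divisions form a dependency chain.
static int interpolationSearch(int[] a, int key) {
    int lo = 0, hi = a.length - 1;
    while (lo <= hi && key >= a[lo] && key <= a[hi]) {
        if (a[hi] == a[lo]) {
            return a[lo] == key ? lo : -1;
        }
        // This division sits on the critical path of every iteration:
        int mid = lo + (int) ((long) (key - a[lo]) * (hi - lo) / (a[hi] - a[lo]));
        if (a[mid] == key) return mid;
        if (a[mid] < key) lo = mid + 1; else hi = mid - 1;
    }
    return -1;
}
```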
And the results are:
 For vector multiplication we indeed see a 2x to 5x performance gain when using float instead of int
 However, for interpolation search we see the opposite: ints are ~30% faster than floats
Why might that be?
My working theory is that individual floating point divisions are (slightly) slower than integer divisions. However, there are probably more ALUs in the CPU which can do floating point operations, and if the operations can be pipelined (i.e. they can be executed in parallel because there is no data dependency between them), the overall throughput is better.
In other cases, like interpolation search, where we need the result of each individual division before we can progress, pipelining doesn’t help (see the sketch below).
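To make the pipelining argument concrete, here’s a contrived pair of loops (my own illustration, not from the benchmark suite): the first issues independent divisions, the second chains them.

```java
// Independent divisions: no division needs the result of another,
// so several can be in flight in the divider at once.
static float independent(float[] xs, float d) {
    float sum = 0;
    for (float x : xs) {
        sum += x / d; // the divisions do not depend on each other
    }
    return sum;
}

// Dependent divisions: each iteration needs the previous result,
// so the divisions execute strictly one after another.
static float chained(float x, float d, int n) {
    for (int i = 0; i < n; i++) {
        x = x / d + 1.0f; // must wait for the previous division to finish
    }
    return x;
}
```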
Some conclusions
Do your own benchmarks! Don’t trust a page on the internet! Things can change considerably with new CPU architectures / JVMs!
 Consider investing in hardware: there can be considerable performance gain in running on the latest vs. 3-4 year old hardware, and it’s a simple upgrade
 Store your data in int / float if it fits. Anything smaller hurts your arithmetic performance (but can help considerably if you’re bottlenecked on memory bandwidth)
 Use the correct kind of data: floating point for simulations and integers for everything else, including monetary values, which should be stored as multiples of 1/1000th of a cent (a small sketch of this follows below)
 Don’t use denormalized floating point numbers for multiplication / division if you can help it

Of course, there is a lot more to floating point algorithms, like numerical stability, so tread lightly when mucking around with such algorithms.
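For completeness, here’s a tiny sketch of the “multiples of 1/1000th of a cent” idea (the representation and the tax rate are made up for illustration):

```java
public class MoneyDemo {
    public static void main(String[] args) {
        long price = 19_99_000L;            // $19.99 stored in 1/1000ths of a cent
        long tax   = price * 825 / 10_000;  // 8.25% tax, computed exactly in integers
        long total = price + tax;
        // Floating point is used only at the very end, for display:
        System.out.printf("$%.5f%n", total / 100_000.0);
    }
}
```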

Using shifts / masking where appropriate (for example when multiplying / dividing by powers of two) is still a very worthwhile optimization, but the JVM/JIT does this automatically when it can “see” the constant part, i.e. there is no need to write v >> 2 instead of v / 4.
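One more reason to let the JIT do it: for negative values a plain arithmetic shift is not equivalent to division (they round in different directions), and the JIT emits the correct fixup sequence automatically. For example:

```java
int v = -7;
System.out.println(v / 4);  // -1: integer division truncates toward zero
System.out.println(v >> 2); // -2: arithmetic shift rounds toward negative infinity
```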