SSE (Streaming SIMD Extensions) is a SIMD instruction set designed by Intel Corporation and introduced in the Pentium III as a response to AMD's 3DNow! instruction set, which had debuted a year earlier. SSE was originally codenamed KNI, for Katmai New Instructions, Katmai being the name of the first Pentium III core manufactured for public consumption. During the development of the Pentium III, Intel was looking for a way to distinguish it from its earlier product line, particularly its then-flagship CPU, the Pentium II.
Intel was generally disappointed with its first IA-32 SIMD instruction set, MMX, which had two serious drawbacks: it reused the existing floating point registers, making the CPU unable to work on both floating point and SIMD data at the same time, and it worked only with integers. SSE added eight new 128-bit registers known as XMM0 through XMM7. Each register packs together four 32-bit single-precision floating point numbers.
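The packing of four single-precision values into one XMM register can be illustrated with the compiler intrinsics declared in xmmintrin.h. The following is a minimal sketch, assuming an x86 compiler that provides that header; the helper name add4_ps is hypothetical and chosen only for this illustration.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */

/* Adds four pairs of floats with a single packed SSE addition.
   add4_ps is a hypothetical helper name used only for this illustration. */
void add4_ps(const float a[4], const float b[4], float out[4])
{
    __m128 va  = _mm_loadu_ps(a);     /* load four floats into one 128-bit value */
    __m128 vb  = _mm_loadu_ps(b);     /* unaligned loads, so no alignment is assumed */
    __m128 sum = _mm_add_ps(va, vb);  /* ADDPS: four additions in one instruction */
    _mm_storeu_ps(out, sum);          /* write the four results back to memory */
}
```

The unaligned load and store variants are used so that the sketch works on any float array; code that can guarantee 16-byte alignment would typically use _mm_load_ps and _mm_store_ps instead.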
Because these 128-bit registers are additional program state that the operating system must preserve across task switches, they are disabled by default until the operating system explicitly enables them. This means that the operating system must know how to use FXSAVE and FXRSTOR, the extended pair of instructions that save and restore the x87, MMX, 3DNow!, and SSE register state at once. This support was quickly added to all major IA-32 operating systems.
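Software can ask the processor whether it offers SSE and the FXSAVE/FXRSTOR pair before relying on either. The following is a minimal sketch, assuming a GCC or Clang toolchain that ships the cpuid.h helper header; the function and macro names are hypothetical. It reports only what the processor advertises through CPUID leaf 1, not whether the operating system has actually enabled the registers.

```c
#include <cpuid.h>  /* GCC/Clang helper header: __get_cpuid */

/* Feature bits reported in EDX by CPUID leaf 1. */
#define CPUID_EDX_FXSR (1u << 24)  /* FXSAVE/FXRSTOR supported */
#define CPUID_EDX_SSE  (1u << 25)  /* SSE supported */

/* Returns 1 if the given EDX value advertises both FXSR and SSE, else 0.
   edx_has_fxsr_and_sse is a hypothetical helper name. */
int edx_has_fxsr_and_sse(unsigned int edx)
{
    unsigned int need = CPUID_EDX_FXSR | CPUID_EDX_SSE;
    return (edx & need) == need;
}

/* Queries the running CPU. Returns 0 if CPUID leaf 1 is unavailable. */
int cpu_has_fxsr_and_sse(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return edx_has_fxsr_and_sse(edx);
}
```

Separating the bit test from the CPUID query keeps the decoding logic checkable with fixed inputs, independent of the machine the code runs on.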
Because SSE added floating point support, it saw much more use than MMX did, since the graphics cards of the day could already handle integer calculations internally. Integer SIMD operations may still be performed with the eight 64-bit MMX registers, which are "aliased" on top of the eight FPU registers.
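Integer SIMD on the MMX registers, and the bookkeeping their aliasing requires, can be sketched with the intrinsics in mmintrin.h. This is an illustration only, assuming a GCC or Clang x86 toolchain; the helper name add4_pi16 is hypothetical, and a compiler is free to lower these intrinsics to different instructions on some targets.

```c
#include <mmintrin.h>  /* MMX intrinsics: __m64, _mm_add_pi16, _mm_empty */
#include <string.h>    /* memcpy */

/* Adds four pairs of 16-bit integers with one packed addition.
   add4_pi16 is a hypothetical helper name used only for this illustration. */
void add4_pi16(const short a[4], const short b[4], short out[4])
{
    __m64 va, vb, sum;
    memcpy(&va, a, sizeof va);      /* four 16-bit lanes fill one 64-bit value */
    memcpy(&vb, b, sizeof vb);
    sum = _mm_add_pi16(va, vb);     /* PADDW semantics: four wrapping 16-bit additions */
    memcpy(out, &sum, sizeof sum);  /* copy the four results back out */
    _mm_empty();                    /* EMMS: hand the aliased registers back to the x87 FPU */
}
```

The closing _mm_empty call is the explicit switch back to floating point use that the aliasing makes necessary; the SSE registers, being separate, need no equivalent step.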
On the Pentium III, SSE is implemented using the same circuitry as the FPU, meaning that, once again, the CPU cannot issue FPU and SSE instructions at the same time in the pipeline. The separate registers do, however, allow SIMD and scalar floating point operations to be mixed without the performance hit of explicit MMX/floating point mode switching.
AMD did not originally implement SSE. Instead, it implemented only the instructions that Intel had supplied with SSE to extend the functionality of MMX, under the name Extended MMX. Full SSE support debuted in the Athlon XP line of processors.