In software, find first set (ffs) or find first one is a bit operation that, given an unsigned machine word, identifies the least significant index or position of the bit set to one in the word. A nearly equivalent operation is count trailing zeros (ctz) or number of trailing zeros (ntz), which counts the number of zero bits following the least significant one bit. The complementary operation that finds the index or position of the most significant set bit is log base 2, so called because it computes the binary logarithm .^{[1]} This is closely related to count leading zeros (clz) or number of leading zeros (nlz), which counts the number of zero bits preceding the most significant one bit. These four operations also have negated versions:
There are two common variants of find first set, the POSIX definition which starts indexing of bits at 1,^{[2]} herein labelled ffs, and the variant which starts indexing of bits at zero, which is equivalent to ctz and so will be called by that name.
Given the following 32bit word:
00000000000000001000000000001000
The count trailing zeros operation would return 3, while the count leading zeros operation returns 16. The count leading zeros operation depends on the word size: if this 32bit word were truncated to a 16bit word, count leading zeros would return zero. The find first set operation would return 4, indicating the 4th position from the right. The log base 2 is 15.
Similarly, given the following 32bit word, the bitwise negation of the above word:
11111111111111110111111111110111
The count trailing ones operation would return 3, the count leading ones operation would return 16, and the find first zero operation ffz would return 4.
If the word is zero (no bits set), count leading zeros and count trailing zeros both return the number of bits in the word, while ffs returns zero. Both log base 2 and zerobased implementations of find first set generally return an undefined result for the zero word.
Many architectures include instructions to rapidly perform find first set and/or related operations, listed below. The most common operation is count leading zeros (clz), likely because all other operations can be implemented efficiently in terms of it (see Properties and relations).
Platform  Mnemonic  Name  Operand widths  Description  Result of operating on 0 

ARM (ARMv5T architecture and later) except CortexM0/M0+/M1/M23 
clz^{[3]}  Count Leading Zeros  32  clz  32 
ARM (ARMv8A architecture)  clz  Count Leading Zeros  32, 64  clz  Operand width 
AVR32  clz^{[4]}  Count Leading Zeros  32  clz  32 
DEC Alpha  ctlz^{[5]}  Count Leading Zeros  64  clz  64 
cttz^{[5]}  Count Trailing Zeros  64  ctz  64  
Intel 80386 and later  bsf^{[6]}  Bit Scan Forward  16, 32, 64  ctz  Undefined; sets zero flag 
bsr^{[6]}  Bit Scan Reverse  16, 32, 64  Log base 2  Undefined; sets zero flag  
x86 supporting ABM  lzcnt^{[7]}  Count Leading Zeros  16, 32, 64  clz  Operand width; sets carry flag 
x86 supporting BMI1  tzcnt^{[8]}  Count Trailing Zeros  16, 32, 64  ctz  Operand width; sets carry flag 
Itanium  clz^{[9]}  Count Leading Zeros  64  clz  64 
MIPS  clz^{[10]}^{[11]}  Count Leading Zeros in Word  32, 64  clz  Operand width 
clo^{[10]}^{[11]}  Count Leading Ones in Word  32, 64  clo  Operand width  
Motorola 68020 and later  bfffo^{[12]}  Find First One in Bit Field  Arbitrary  Log base 2  Field offset + field width 
PDP10  jffo  Jump if Find First One  36  ctz  0; no operation 
POWER/PowerPC/Power ISA  cntlz/cntlzw/cntlzd^{[13]}  Count Leading Zeros  32, 64  clz  Operand width 
Power ISA 3.0 and later  cnttzw/cnttzd^{[14]}  Count Trailing Zeros  32, 64  ctz  Operand width 
RISCV ("B" Extension) (draft)  clz^{[15]}  Count Leading Zeros  TBD  clz  Operand width 
ctz^{[15]}  Count Trailing Zeros  TBD  ctz  Operand width  
SPARC Oracle Architecture 2011 and later  lzcnt (synonym: lzd) ^{[16]}  Leading Zero Count  64  clz  64 
VAX  ffs^{[17]}  Find First Set  032  ctz  Operand width; sets zero flag 
z/Architecture  vclz^{[18]}  Vector Count Leading Zeroes  8, 16, 32, 64  clz  Operand width 
vctz^{[19]}  Vector Count Trailing Zeroes  8, 16, 32, 64  ctz  Operand width 
On some Alpha platforms CTLZ and CTTZ are emulated in software.
A number of compiler and library vendors supply compiler intrinsics or library functions to perform find first set and/or related operations, which are frequently implemented in terms of the hardware instructions above:
Tool/library  Name  Type  Input type(s)  Notes  Result for zero input 

POSIX.1 compliant libc 4.3BSD libc OS X 10.3 libc^{[2]}^{[20]} 
ffs 
Library function  int  Includes glibc. POSIX does not supply the complementary log base 2 / clz. 
0 
FreeBSD 5.3 libc OS X 10.4 libc^{[21]} 
ffsl fls flsl 
Library function  int, long 
fls ("find last set") computes (log base 2) + 1.  0 
FreeBSD 7.1 libc^{[22]}  ffsll flsll 
Library function  long long  0  
GCC 3.4.0^{[23]}^{[24]} Clang 5.x ^{[25]}^{[26]} 
__builtin_ffs[l,ll,imax] __builtin_clz[l,ll,imax] __builtin_ctz[l,ll,imax] 
Builtin functions  unsigned int, unsigned long, unsigned long long, uintmax_t 
GCC documentation considers result undefined clz and ctz on 0.  0 (ffs) 
Visual Studio 2005  _BitScanForward ^{[27]}_BitScanReverse ^{[28]} 
Compiler intrinsics  unsigned long, unsigned __int64 
Separate return value to indicate zero input  0 
Visual Studio 2008  __lzcnt ^{[29]} 
Compiler intrinsic  unsigned short, unsigned int, unsigned __int64 
Relies on x64only lzcnt instruction  Input size in bits 
Intel C++ Compiler  _bit_scan_forward _bit_scan_reverse ^{[30]} 
Compiler intrinsics  int  Undefined  
NVIDIA CUDA^{[31]}  __clz

Functions  32bit, 64bit  Compiles to fewer instructions on the GeForce 400 Series  32 
__ffs 
0  
LLVM  llvm.ctlz.* llvm.cttz.* ^{[32]} 
Intrinsic  8, 16, 32, 64, 256  LLVM assembly language  Input size if arg 2 is 0, else undefined 
GHC 7.10 (base 4.8), in Data.Bits 
countLeadingZeros countTrailingZeros 
Library function  FiniteBits b => b 
Haskell programming language  Input size in bits 
If bits are labeled starting at 1 (which is the convention used in this article), then count trailing zeros and find first set operations are related by ctz(x) = ffs(x)  1 (except when the input is zero). If bits are labeled starting at 0, then count trailing zeros and find first set are exactly equivalent operations.
Given w bits per word, the log base 2 is easily computed from the clz and vice versa by lg(x) = w  1  clz(x).
As demonstrated in the example above, the find first zero, count leading ones, and count trailing ones operations can be implemented by negating the input and using find first set, count leading zeros, and count trailing zeros. The reverse is also true.
On platforms with an efficient log base 2 operation such as M68000, ctz can be computed by:
where "&" denotes bitwise AND and "x" denotes the two's complement of x. The expression x & (x) clears all but the leastsignificant 1 bit, so that the most and leastsignificant 1 bit are the same.
On platforms with an efficient count leading zeros operation such as ARM and PowerPC, ffs can be computed by:
Conversely, clz can be computed using ctz by first rounding up to the nearest power of two using shifts and bitwise ORs,^{[33]} as in this 32bit example (which depends on ctz returning 32 for the zero input):
function clz(x): for each y in {1, 2, 4, 8, 16}: x return 32  ctz(x + 1)
On platforms with an efficient Hamming weight (population count) operation such as SPARC's POPC
^{[34]}^{[35]} or Blackfin's ONES
,^{[36]}, there is:
where "^" denotes bitwise exclusiveor and "~" denotes bitwise negation.
Applying the same roundingup technique, clz can be computed by:
function clz(x): for each y in {1, 2, 4, 8, 16}: x return 32  pop(x)
The inverse problem (given i, produce an x such that ctz(x) = i) can be computed with a leftshift (1 << i).
Find first set and related operations can be extended to arbitrarily large bit arrays in a straightforward manner by starting at one end and proceeding until a word that is not allzero (for ffs/ctz/clz) or not allone (for ffz/clo/cto) is encountered. A tree data structure that recursively uses bitmaps to track which words are nonzero can accelerate this.
Find first set can also be implemented in software. A simple loop implementation:
function ffs (x) if x = 0 return 0 t while (x & t) = 0 t return r
where "<<" denotes leftshift. Similar loops can be used to implement all the related operations. On modern architectures this loop is inefficient due to a large number of conditional branches. A lookup table can eliminate most of these:
table[0..2^{n}1] = ffs(i) for i in 0..2^{n}1 function ffs_table (x) if x = 0 return 0 r loop if (x & (2^{n}1)) ? 0 return r + table[x & (2^{n}1)] xThe parameter n is fixed (typically 8) and represents a timespace tradeoff. The loop may also be fully unrolled. Alternatively, one can use the de Bruijn hash function for CTZ and add one to the result.
CTZ
Count Trailing Zeros (ctz) counts the number of zero bits succeeding the least significant one bit. For example, the ctz of 0x00000F00 is 8, and the ctz of 0x80000000 is 31.
An algorithm for 32bit ctz by Leiserson, Prokop, and Randall uses de Bruijn sequences to construct a minimal perfect hash function that eliminates all branches.^{[40]}^{[41]} This algorithm assumes that the result of the multiplication is truncated to 32 bit.
table[0..31] initialized by: for i from 0 to 31: table[ ( 0x077CB531 * ( 1 << i ) ) >> 27 ] function ctz_debruijn (x) return table[((x & (x)) * 0x077CB531) >> 27]The expression (x & (x)) again isolates the leastsignificant 1 bit. There are then only 32 possible words, which the unsigned multiplication and shift hash to the correct position in the table. (This algorithm does not handle the zero input.) A similar algorithm works for log base 2, but rather than isolate the mostsignificant bit, it rounds up to the nearest integer of the form 2^{n}−1 using shifts and bitwise ORs:^{[42]}
table[0..31] = {0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30, 8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31} function lg_debruijn (x) for each y in {1, 2, 4, 8, 16}: x return table[(x * 0x07C4ACDD) >> 27]A binary search implementation which takes a logarithmic number of operations and branches, as in these 32bit versions:^{[43]}^{[44]} This algorithm can be assisted by a table as well, replacing the bottom three "if" statements with a 256 entry lookup table using the final byte as an index.
function ctz (x) if x = 0 return 32 n if (x & 0x0000FFFF) = 0: n if (x & 0x000000FF) = 0: n if (x & 0x0000000F) = 0: n if (x & 0x00000003) = 0: n if (x & 0x00000001) = 0: n return nCLZ
Count Leading Zeros (clz) counts the number of zero bits preceding the most significant one bit. For example, a 32bit clz of 0x00000F00 is 20, and the clz of 0x00000001 is 31.
Just as count leading zeros is useful for software floating point implementations, conversely, on platforms that provide hardware conversion of integers to floating point, the exponent field can be extracted and subtracted from a constant to compute the count of leading zeros. Corrections are needed to account for rounding errors.^{[43]}^{[45]}
The nonoptimized approach examines one bit at a time until a nonzero bit is found, as shown in this C language example, and slowest with an input value of 1 because of the many loops it has to perform to find it.
#include <stdint.h> /* This header defines uint32_t, uint16_t, uint8_t for the following examples */ unsigned int clz32a( uint32_t x ) /* 32bit clz */ { unsigned int n; if (x == 0) n = 32; else for (n = 0; ((x & 0x80000000) == 0); n++, x <<= 1) ; return n; } unsigned int clz16a( uint16_t x ) /* 16bit clz */ { unsigned int n; if (x == 0) n = 16; else for (n = 0; ((x & 0x8000) == 0); n++, x <<= 1) ; return n; } unsigned int clz8a( uint8_t x ) /* 8bit clz */ { unsigned int n; if (x == 0) n = 8; else for (n = 0; ((x & 0x80) == 0); n++, x <<= 1) ; return n; }An evolution of the previous looping approach examines four bits at a time then using a lookup table for the final four bits, which is shown here with a 16 (2^{4}) entry lookup table. A faster looping approach would examine eight bits at a time with a 256 (2^{8}) entry table.
#include <stdint.h> static const uint8_t clz_table_4bit[16] = { 4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0 }; unsigned int clz32b( uint32_t x ) /* 32bit clz */ { unsigned int n; if (x == 0) { n = 32; } else { for (n = 0; ((x & 0xF0000000) == 0); n += 4, x <<= 4) ; n += (unsigned int)clz_table_4bit[x >> (324)]; } return n; }Faster than the looping method is a binary search implementation which takes a logarithmic number of operations and branches, as in these 32bit versions:^{[43]}^{[44]} The initial check for zero was removed because this method doesn't have a loop.
#include <stdint.h> unsigned int clz32c( uint32_t x ) /* 32bit clz */ { unsigned int n = 0; if ((x & 0xFFFF0000) == 0) {n = 16; x <<= 16;} if ((x & 0xFF000000) == 0) {n += 8; x <<= 8;} if ((x & 0xF0000000) == 0) {n += 4; x <<= 4;} if ((x & 0xC0000000) == 0) {n += 2; x <<= 2;} if ((x & 0x80000000) == 0) {n += 1;} return n; }The binary search algorithm can be assisted by a table as well, replacing the bottom two "if" statements with a 16 (2^{4}) entry lookup table using the final 4bits as an index, which is shown here. An alternate approach replaces the bottom three "if" statements with a 256 (2^{8}) entry table using the final 8 bits as an index.
#include <stdint.h> static const uint8_t clz_table_4bit[16] = { 4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0 }; unsigned int clz32d( uint32_t x ) /* 32bit clz */ { unsigned int n = 0; if ((x & 0xFFFF0000) == 0) {n = 16; x <<= 16;} if ((x & 0xFF000000) == 0) {n += 8; x <<= 8;} if ((x & 0xF0000000) == 0) {n += 4; x <<= 4;} n += (unsigned int)clz_table_4bit[x >> (324)]; return n; }As with many algorithms, the fastest approach to simulate clz would be a precomputed lookup table. In two previous examples,
clz_table_4bit[16]
was a lookup table for all 16 (2^{4}) values of a 4bit clz operation. For common bytewidth increments, an 8bit clz requires 256 (2^{8}) values, a 16bit clz requires 64K (2^{16}) values, a 32bit clz requires 4G (2^{32}) values. Though a 32bit clz table is possible on some computers, wasting 4GB of memory for a table is not practical. The next step down requires 64KB, which may be a reasonable amount for some computers.#include <stdint.h> static uint8_t clz_table_16bit[65536]; /* This table MUST be computed, using clz16a above, before calling the following functions */ unsigned int clz32e( uint32_t x ) /* 32bit clz */ { if ((x & 0xFFFF0000) == 0) return (unsigned int)clz_table_16bit[x] + 16; else return (unsigned int)clz_table_16bit[x >> 16]; } unsigned int clz16e( uint16_t x ) /* 16bit clz */ { return (unsigned int)clz_table_16bit[x]; } unsigned int clz8e( uint8_t x ) /* 8bit clz */ { return (unsigned int)clz_table_16bit[x]  8; }Applications
The count leading zeros (clz) operation can be used to efficiently implement normalization, which encodes an integer as m × 2^{e}, where m has its most significant bit in a known position (such as the highest position). This can in turn be used to implement NewtonRaphson division, perform integer to floating point conversion in software, and other applications.^{[43]}^{[46]}
Count leading zeros (clz) can be used to compute the 32bit predicate "x = y" (zero if true, one if false) via the identity clz(x  y) >> 5, where ">>" is unsigned right shift.^{[47]} It can be used to perform more sophisticated bit operations like finding the first string of n 1 bits.^{[48]} The expression 1 << (16  clz(x  1)/2) is an effective initial guess for computing the square root of a 32bit integer using Newton's method.^{[49]} CLZ can efficiently implement null suppression, a fast data compression technique that encodes an integer as the number of leading zero bytes together with the nonzero bytes.^{[50]} It can also efficiently generate exponentially distributed integers by taking the clz of uniformly random integers.^{[43]}
The log base 2 can be used to anticipate whether a multiplication will overflow, since .^{[51]}
Count leading zeros and count trailing zeros can be used together to implement Gosper's loopdetection algorithm,^{[52]} which can find the period of a function of finite range using limited resources.^{[44]}
The binary GCD algorithm spends many cycles removing trailing zeros; this can be replaced by a count trailing zeros (ctz) followed by a shift. A similar loop appears in computations of the hailstone sequence.
A bit array can be used to implement a priority queue. In this context, find first set (ffs) is useful in implementing the "pop" or "pull highest priority element" operation efficiently. The Linux kernel realtime scheduler internally uses
sched_find_first_bit
for this purpose.^{[53]}The count trailing zeros operation gives a simple optimal solution to the Tower of Hanoi problem: the disks are numbered from zero, and at move k, disk number ctz(k) is moved the minimum possible distance to the right (circling back around to the left as needed). It can also generate a Gray code by taking an arbitrary word and flipping bit ctz(k) at step k.^{[44]}
See also
url=
value (help). Chess Programming Wiki. Retrieved 2012.