My Stuff

Monday, October 13, 2014

Left shift by negative number used to test unique characters in a string

What is the outcome of this?

val=-32;
mask = 1<<val;

I tried to execute it on Python, it failed with error message.

But if I do it in C or C++, I get something else. Turn out, this depends o the compiler and CPU we use. As I use Pentium4 and GCC to compile this, the following snippet of the generated assembly language is interesting:

.L4:
movsbl (%edx), %ecx
movl %esi, %edi
addl $1, %edx
subl $97, %ecx
sall %cl, %edi
testl %eax, %edi
jg .L5
orl %edi, %eax

.L2:

What is "sall" above? It is Intel CPU's mnemonic for "Shift Arithmetic Left for Long Integer".

In the machine code, -32 is transformed to Second-complement of 32:

32 = 0b00100000

~32 = 0b11011111

~32+1 = 0b11100000 = -32

According Wikipedia, arithmetic shift left maintains the MSBit, so in this case MSB is always 1. Also, Intel CPUs only shifts the first 5-bits of the shifting bits (in this case, 5-bit LSB of 0b11100000 which is 0), so 1<<-32 is actuall 1<<0 = 1.

What about 1<<-33?

33 = 0b00100001

~33 = 0b11011110

~32+1 = 0b11100001 = -33

The first five bits is 1, so 1<<-33 = 1<<1 = 2

What about 1<<-31?

31 = 0b00011111

~31 = 0b11100000

~31+1 = 0b11100001 = -31

The first five bits is 1, so 1<<-31 = 1<<1 = 0x00000010

What about 1<<-7?

7 = 0b00000111

~7 = 0b11111000

~7+1 = 0b11111001 = -7

The first five bits is 1, so 1<<-7 = 1<<25 = 0x02000000

Now, how do we use this to check a uniqueness of characters in a string?

Alphabet 'A' to 'Z' has 26 characters. Alphabet 'a' to 'z' has also 26 characters.

The difference between the character i in a string is: str[i] - 'a'. Now this is the interesting part: if we declare a 32-bit variable (which is enough to cover 'A' to 'Z') where each bit corresponds to the occurence of each character in alphabet, we can determine if the character has been used before and tell the string has unique characters or not (e.g, bit 1 to tell if character 'A' has been used or not, etc.)

'A' - 'a' = -32

...

'Z' - 'a' = -7

...

'a' - 'a' = 0

...

'Z' - 'a' = 25

if we shift 1 to -32, it is the same as 1<<0. If we shift 1 to -33 it is the same as 1<<1, so on.

1<<-32 = 0x01

1<<-31 = 0x02

...

1<<-7 = 1<<25 = 0x2000000

1<<0 = 0x01

1<<1 = 0x02

...

1<<25 = 0x2000000

so:

1<<('A'-'a') == 1<<('a'-'a') = 0x01

1<<('B'-'a') == 1<<('b'-'a') = 0x02

...

1<<('Z'-'a') == 1<<('z'-'a') = 0x2000000

So the code is:

for (i=0; i<strlen(s); i++)

{

val = s[i] - 'a';

if (flags & (1 << val))

return 0; /* the string is not unique */

flags |= 1 << val;

}
return 1;

Notice this algorithm might not work on different machine with different compiler (as shifting by negative number is undefined in standard C).

The Use of c++decl

cdecl and c++decl are two nice tools to check syntax or to generate declaration of C and C++ respectively.

for example, using c++decl (the text in bold is the one we type):

$ cdecl

Type `help' or `?' for help
c++decl>

c++decl> declare foo as pointer to member of class X function (arg1, arg2) returning reference to class Y

class Y &(X::*foo)(arg1, arg2)

c++decl>

c++decl> declare foo as reference to class X

class X &foo

c++decl>

c++decl> cast foo into reference to class X

(class X &)foo

c++decl>

We can also use them to check a declaration syntax, for example:

c++decl> explain int (*a)[5]

declare a as pointer to array 5 of int

c++decl>

c++decl> explain long **foo[10]

declare foo as array 10 of pointer to pointer to long

c++decl>

c++decl> explain int*&(foo);

declare foo as reference to pointer to int

c++decl>

The function can also be used to translate an english declaration to C (or C++) declaration. For example:

$ c++decl declare "i as array 5 of pointer to function(int,int) returning reference to class X"

class X &(*i[5])(int , int )

$

$ c++decl declare "i as array 5 of pointer to function(reference to class X,constant reference to int) returning reference to class X"

class X &(*i[5])(class X &, int & const )

$

$ c++decl declare "const_ptr as const pointer to const int"

const int * const const_ptr

$ c++decl declare "const_ptr as const pointer to const reference to class X"

class X & const * const const_ptr

c++decl> declare myfunc as const pointer to function (const pointer to int, reference to class X) returning void

void (* const myfunc)(int * const , class X &)

c++decl>

c++decl> declare myfunctpr as array of pointer to function (void) returning int

int (*myfunctpr[])(void )

c++decl>

declare myptr as pointer to array 10 of struct MyStruct

struct MyStruct (*myptr)[10]

c++decl>

Saturday, October 4, 2014

Odd Numbers with Parity

#!/usr/bin/python

table=[]
for n in range(0,256):
    p=False
    while n:
        p = not p
        n &= n-1
    table.append(p)

parity_odd = []

print "Odd numbers with Parity:"



for i in range(0,256):
    if (i%2) and (table[i]):
        parity_odd.append(i)
        print i,

Result:

Odd numbers with Parity:
1 7 11 13 19 21 25 31 35 37 41 47 49 55 59 61 67 69 73 79 81 87 91 93 97 103 107 109 115 117 121 127 131 133 137 143 145 151 155 157 161 167 171 173 179 181 185 191 193 199 203 205 211 213 217 223 227 229 233 239 241 247 251 253

Thursday, October 2, 2014

Fastest Fibonacci Sequencer

This algorithm is based on Fast-doubling as explained in Fast Fibonacci algorithms.

Given F(k) and F(k+1), we can calculate these equations:

F(2k) = F(k)[2F(k+1)−F(k)]
F(2k+1) = F(k+1)2+F(k)2.

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

#ifdef DEBUG
#define DBGMSG(x)       printf x 
#else
#define DBGMSG(x)
#endif

typedef unsigned long long num_type;

void _fib(int n, num_type *fn, num_type *fn_1);

num_type fibonacci(uint n, num_type *s)
{
 int i;
    num_type fn=0, fn_1=1;

 if (!s)
  return 0;

 if (n < 2)
    {
  return 1;
    }
    for(i=1; i<n; i++)
    {
        _fib(i, &fn, &fn_1);
        DBGMSG(("i=%u, n=%u, fn=%llu, fn+1=%llu\n", i, n, fn, fn_1));
        s[i] = fn;
    }
    return s[n];
}

/*
 * This is base on matrix exponentiation algorithm:
 * where:
 *  f(2k) = f(k)*{ 2*f(k+1) - f(k)}         ---> c
 *  f(2k+1) = f(k)^2 + f(k+1)^2             ---> s
 *
 *  if we plugin k = n/2, we get f(n) and f(n+1)
 */
void _fib(int n, num_type *fn, num_type *fn_1)
{
    num_type c,d;
    num_type a, b;

    DBGMSG( ("%s: Initially n=%u, fn=%llu, fn_1=%llu\n", __FUNCTION__, n, *fn, *fn_1) );
    if (n==0)
    {
        *fn = 0;        // fn
        *fn_1 = 1;      // fn+1
        return; 
    }
    else {
        a = *fn;
        b = *fn_1;
    }
    _fib(n>>1, &a, &b);
    DBGMSG (("%s: local fn=%llu, fn_1=%llu\n", __FUNCTION__, a, b));
    c = a * (2*b - a);
    d = b*b + a*a;
    DBGMSG( ("%s: c=%llu, d=%llu\n", __FUNCTION__, c, d) );
    if (n % 2 == 0)
    {
        // n is even
        *fn = c;
        *fn_1 = d;
    }
    else {
        *fn = d;
        *fn_1 = c+d;
    }
}

int main(int argc, char *argv[])
{
 uint n;
 uint f;
 int i;
 num_type *result;
    char *sbuf;
    int status = 0;

    if (argc > 1)
        n = atoi(argv[1]);
    else {
     printf("Enter a number: ");
        sbuf = (char *)malloc(50);
     status = (getline(&sbuf, &n, stdin) > 0);
        if (status == 0) {
      n = atoi(sbuf);
            free(sbuf);
        }
       else {
            printf("Error in input\n");
            return (1);
        }
    }
    if (n <= 0)
    {
        printf("What have yo done???\n");
        return(0);
    }
    if (status == 0)
 {
     result = (num_type *)malloc(n*sizeof(unsigned long long));
     f = fibonacci(n, result);
  printf("Fibonacci sequence: ");
  for(i=0; i<n; i++)
   printf("%llu ", result[i]);
  printf("\n");
        if (result)
      free(result);
    }
}

Friday, September 26, 2014

Fastest bit counting

The best bit counting algorithm as far as I know is the one invented by folks at Stanford University, which is always O(1).

int bitcount(int n)
{
int cnt = 0;

n = n - ((n >> 1) & 0x55555555);
n = (n & 0x33333333) + ((n >> 2) & 0x33333333);

return (((n + (n >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
}

For example, after I compile it to x86 Assembly:

.cfi_startproc

movl 4(%esp), %eax

movl %eax, %edx

sarl %edx

andl $1431655765, %edx

subl %edx, %eax

movl %eax, %edx

sarl $2, %eax

andl $858993459, %edx

andl $858993459, %eax

addl %edx, %eax

movl %eax, %edx

sarl $4, %edx

addl %edx, %eax

andl $252645135, %eax

imull $16843009, %eax, %eax

sarl $24, %eax

ret

.cfi_endproc

But, how does actually the algorithm work?

Ok, let try a simple one. Assume n is a 4-bit number, and the bits can be represented as a set such that n= {a,b,c,d}., where a,b,c.. can only have either 0 or 1. The decimal value of the n is: 8*a + 4*b + 2*c + d. Total number of bit one is: a + b + c +d.

For example, for n=0b1101, a=1, b=1, c=0, d=1 (which in decimal is 8*1 + 4*1 + 2*0 + 1 = 13), and total number of bit one is a+b+c+d = 1 + 1 + 0 + 1 = 3. So far so good?

Now, we know that to count the 1-bits is as simple as: a+b+c+d. But, wait.... n itself is not a+b+c+d, but 8*a + 4*b + 2*c + d. Ok, let's conquer this step-by-step.

If we shift n one bit to the right the LSbit is gone and other numbers just divided by two, so n becomes 4*a + 2*b + c, right? Now substract this to the original n.

n = 8*a + 4*b + 2*c + d

n>>1 = 0 + 4*a + 2*b + c

----------------------------- -

= 0 + 4*a + 2*b + c + d

That's a good direction! Now if (n>>1) is written in the 4-bit nibble it is actually 0 + 4a + 2b + c. If I just want to subtract 4a and c, we have to mask out 2*b. so we mask it with binary 0101 (0x5), so we get only 4a + c:

n = 8*a + 4*b + 2*c + d

(n>>1)&0x5 = 4*a + 0 + c

---------------------------------- -

n -(n>>1)&0x05 = 4*a + 4*b + c + d

Now store this result back to n, so from now on n is 4*a + 4*b + c + d

To get c + d only, we mask n with 0b11 or n & 0x03

if we shift n above once, we get 0 + 2a + 2b + 0, but if shift it again it becomes a + b!

To make sure we only get the lowest two bits (a + b), we mask it to 0x03 or:

n & 0x03 = c + d

(n >> 2)&0x03) = a + b

------------------------ +

Nice! now we have this expression: a + b + c +d .

so, for 4-bit bit counting, we can use the expression:

n = n - (n>>1) & 0x05

bits = (n & 0x3) + (n>>2) & 0x3

Proof: as example above, n=13 (0b1101). so a=1,b=1, c=0, d=1

n = 0b1101 - (0b0110) & 0b0101 = 0b1101 - 0b100 = 13 - 4 = 9 = 0b1001

then the next step:

bits =(0b1001 & 0b0011) + (0b1001>>2) & 0b0011, or bits = 0b0001 + (0b0010) & 0b0011

or bits = 1 + 2 = 3 !!

For 32-bit, it is based on the same idea, except we have to do more.

Say n has set of coefficients {a[0], a[1], ...., a[31]} to represent the number, so n = a[0]*2^31 + a[1]*2^30 + .... + a[15]*2^0

The mask should be 32-bit, so instead of 0x5, we use 0x55555555 = 0b0101010101...0101

n = 2^31*a[0] + ... + 2^1*a[30] + 2^0*a[31]

(n>>1)&0b0101..0101 = 2^30*a[0] + ... + 2^2*a[28] + 2^0*a[30]

-------------------------------------------------------------- -

n -(n>>1)&0x055555555 = 2^31*a[0] - 2^28*a[0] + (2^30*a[1] - 2^26*a[1]) + ... + 4*a[29] - 2*a[30] + a[31]

Let's review binary arithmetics.

2³ - 2² = 8 - 4 = 4

2¹⁶ - 2¹⁵ = 65536 - 32768 = 32768
or: 2^(a+1) - 2^a = 2^a

2^(a+2) - 2^a = 2^a * 2² - 2^a = 2^a (4 - 1) = 3*2^a

So the result is:

a[0] * (2^31 - 2^28) + a[1] * (2^30 - 2^26) + ..... + a[30] (4 -2) + a[31]
= 3*2^28*a[0] + .3*2^26*a[1].. + 2*a[30] + a[31]

n - n(>>1) & 0x055555555 = 3*2^28*a[0] + .3*2^26*a[1].. + 2*a[30] + a[31]

stored this as a new n.

n>>2 = 2^24*a[0] + 2^22*a[1] + ...2*a[28] + a[29]

The rest is actually manipulation to count this a[0] + a[1] + ... + a[31]

A variant, but this one is invented by folks at MIT:

int bitcount(unsigned int n)
{
int cnt = 0;
register unsigned int tmp;

tmp = n - ((n >> 1) & 033333333333) - ((n >> 2) & 011111111111);

return ((tmp + (tmp >>3)) & 0030707070707) % 63;
}

It uses the similar method. but with different approach (notice the number 01..., 0333... and 00307... are in octal. We could use Hex version but then it is harder to remember)

Monday, June 2, 2014

Audio on Laptop HP dv7-6c80us not working

Problem:

HP-PAVILION-DV7:~$ aplay -l
aplay: device_list:268: no soundcards found...
HP-PAVILION-DV7:~$

HP-PAVILION-DV7:~$ lsmod | grep snd

snd_hda_codec_hdmi 41154 0

snd_hda_codec_idt 50341 1

snd_hda_intel 52267 0

snd_hda_codec 188738 3 snd_hda_codec_hdmi,snd_hda_codec_idt,snd_hda_intel

snd_hwdep 13602 1 snd_hda_codec

snd_pcm 102033 3 snd_hda_codec_hdmi,snd_hda_codec,snd_hda_intel

snd_page_alloc 18710 2 snd_pcm,snd_hda_intel

snd_seq_midi 13324 0

snd_seq_midi_event 14899 1 snd_seq_midi

snd_rawmidi 30095 1 snd_seq_midi

snd_seq 61560 2 snd_seq_midi_event,snd_seq_midi

snd_seq_device 14497 3 snd_seq,snd_rawmidi,snd_seq_midi

snd_timer 29433 2 snd_pcm,snd_seq

snd 69141 11 snd_hwdep,snd_timer,snd_hda_codec_hdmi,snd_hda_codec_idt,snd_pcm,snd_seq,snd_rawmidi,snd_hda_codec,snd_hda_intel,snd_seq_device,snd_seq_midi

soundcore 12680 1 snd

HP-PAVILION-DV7:~$ cat /proc/asound/pcm
00-00: 92HD81B1X5 Analog : 92HD81B1X5 Analog : playback 1 : capture 1

HP-PAVILION-DV7:~$ lspci -v
...
...
00:1b.0 Audio device: Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller (rev 05)
Subsystem: Hewlett-Packard Company Device 165b
Flags: bus master, fast devsel, latency 0, IRQ 54
Memory at c6500000 (64-bit, non-prefetchable) [size=16K]
Capabilities: [50] Power Management version 2
Capabilities: [60] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
Capabilities: [100] Virtual Channel
Capabilities: [130] Root Complex Link
Kernel driver in use: snd_hda_intel