VTV Lab

Quick sort experiments

Today I spent some time running a few experiments around quicksort, just for fun.

I implemented one myself, recursively; found another in-place one here, which I call InternetQuickSort because a Google search returns almost this same implementation (I am not certain about the intellectual property ownership, though); and the third one uses the dual-pivot quicksort in Java's Arrays utility class. My own implementation is a naive one: I have no intention of competing with the others, which many smart people spent a huge amount of time optimizing. It serves only as an example of how bad a naive version is, and as a baseline to see how good the other two are.

In pseudocode, my implementation looks like this:

MySort

# Step 1: Pick a pivot value by taking the mean of the three elements at startingPos, endingPos and the mid point

# Step 2: Loop from startingPos to endingPos and form two arrays, one with elements greater than pivotVal and one with elements smaller than pivotVal

# Step 3: Copy the two arrays back into the original array

# Step 4: Recursively call MySort on the two sub-arrays by changing startingPos and endingPos
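
For reference, here is a minimal runnable Java sketch of that pseudocode. The class and method names are mine, not the original code, and I added a bucket for elements equal to the pivot value, which the pseudocode does not mention, so the recursion is guaranteed to terminate when the mean pivot hits the minimum or maximum of the range:

public class MySort {

    // Sorts arr[startingPos..endingPos], inclusive.
    public static void mySort(int[] arr, int startingPos, int endingPos) {
        if (startingPos >= endingPos) return;

        // Step 1: pivot value = mean of the elements at startingPos,
        // endingPos and the mid point (a mean, not a median-of-three).
        int mid = startingPos + (endingPos - startingPos) / 2;
        long pivotVal = ((long) arr[startingPos] + arr[mid] + arr[endingPos]) / 3;

        // Step 2: partition into auxiliary arrays; elements equal to the
        // pivot value get their own bucket so every recursive call shrinks.
        int n = endingPos - startingPos + 1;
        int[] smaller = new int[n], equal = new int[n], greater = new int[n];
        int ns = 0, ne = 0, ng = 0;
        for (int i = startingPos; i <= endingPos; i++) {
            if (arr[i] < pivotVal) smaller[ns++] = arr[i];
            else if (arr[i] > pivotVal) greater[ng++] = arr[i];
            else equal[ne++] = arr[i];
        }

        // Step 3: copy the buckets back into the original array.
        System.arraycopy(smaller, 0, arr, startingPos, ns);
        System.arraycopy(equal, 0, arr, startingPos + ns, ne);
        System.arraycopy(greater, 0, arr, startingPos + ns + ne, ng);

        // Step 4: recurse on the strictly-smaller and strictly-greater parts.
        mySort(arr, startingPos, startingPos + ns - 1);
        mySort(arr, startingPos + ns + ne, endingPos);
    }
}

The two auxiliary allocations per call are exactly what makes this version a useful "how bad can it get" baseline against the in-place variants.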

Below is the experiment setup.

System: Intel Core i5, 3 GHz x 4, 64-bit, Java 7

Input: arrays starting at 250,000 elements and growing in 250,000-element increments, up to 6,000,000 elements
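
A minimal harness for this kind of measurement could look like the sketch below. The class name, the fixed random seed and the use of System.nanoTime are my assumptions; the post does not show the original harness:

import java.util.Arrays;
import java.util.Random;

public class QuickSortBenchmark {
    public static void main(String[] args) {
        Random rnd = new Random(42); // fixed seed for repeatable inputs
        for (int n = 250_000; n <= 6_000_000; n += 250_000) {
            int[] data = new int[n];
            for (int i = 0; i < n; i++) data[i] = rnd.nextInt();

            // Time the library dual-pivot sort on a private copy so each
            // algorithm sees the same unsorted input.
            int[] copy = Arrays.copyOf(data, n);
            long start = System.nanoTime();
            Arrays.sort(copy);
            double ms = (System.nanoTime() - start) / 1_000_000.0;
            System.out.println(n + "\t" + ms);

            // MySort and the InternetQuickSort variant are timed the same
            // way on fresh copies of data.
        }
    }
}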

Outcome (times in milliseconds):

Array length   DualPivotQS   MyQS          InternetQS
250000         56.286573     369.059243    30.143644
500000         147.527238    417.64615     237.833958
750000         213.578431    463.814259    108.854696
1000000        293.005232    551.036705    214.40796
1250000        341.151265    1342.6494     260.471331
1500000        424.597431    755.479901    357.659643
1750000        506.3088      849.871125    399.342371
2000000        627.375886    2165.276304   419.702276
2250000        783.32427     1318.824876   542.290206
2500000        838.320689    1355.4506     627.090102
2750000        903.471969    3194.246002   758.326834
3000000        1098.818221   1533.688473   734.657916
3250000        1033.541113   1962.902445   931.064769
3500000        1178.640748   1948.658469   1192.178009
3750000        4717.532571   2049.95972    1107.368695
4000000        1340.011815   5101.839197   1174.539527
4250000        1612.278331   2317.015331   1211.459723
4500000        1361.536276   2795.669291   1289.295309
4750000        1499.520603   2789.735822   1578.865415
5000000        1560.867094   3030.472058   1729.399364
5250000        1621.528772   3191.477809   1639.488514
5500000        2022.796537   3879.536725   1841.024578
5750000        1907.504151   3566.779162   1939.123883
6000000        1878.408871   3691.383492   2343.128945

I have made a simple chart of the results: QuickSortBenchmark.

Interestingly, in most cases the InternetQuickSort version outperforms DualPivotQuickSort, though the two are quite close to each other. My implementation doesn't show a clear trend and fluctuates over quite a large range. However, both mine and the dual-pivot quicksort have a spike in running time when the number of elements is around 4 million, while the InternetQuickSort version doesn't encounter that at all.

I will post some more updates about this.

Swapping values: experiments with different approaches

Folks, we all know the typical interview question about swapping the values of two variables. And maybe all of us remember by heart that there are multiple ways to do it.

The most straightforward way, which everyone learns in school, is to use a temp variable.

Such as:

int a = 10, b = 5;
int tmp = a;
a = b;
b = tmp;

The second way is to use arithmetic operations, either addition/subtraction or multiplication/division.

The code looks like:

int a = 10, b = 5;
a = a + b;
b = a - b;
a = a - b;

The good thing is that we don't need a temp variable here, which means less memory (or one register fewer) used.

The same logic goes for multiplication and division, with the caveat that that variant breaks when one value is zero and loses information when the product overflows. However, another interesting way, which is often asked about, is using XOR:

a = a ^ b; b = a ^ b; a = a ^ b;

Three assignments with the same expression; nothing could be easier to remember. (It works because XOR is associative and x ^ x = 0: with original values a0 and b0, the second line yields b = (a0 ^ b0) ^ b0 = a0, and the third yields a = (a0 ^ b0) ^ a0 = b0.)

But it seems few people ask which one actually performs better, beyond the memory needed for the extra variable. I ran a simple benchmark today and would like to share it with you. Working in blocks of 100,000 swap operations, I increased the workload one block at a time and measured the time taken by each of the three approaches above.

Setup:

System: Intel Core i5, 3 GHz x 4, 64-bit, Java 7

Inputs are all primitive integers, and time is measured in nanoseconds. Before running the measurements, I did some warm-up so the JVM was ready.
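
A minimal sketch of how one of the three variants can be timed (the class and method names are mine; the post does not include the original harness):

public class SwapBenchmark {
    static final int BLOCK = 100_000;

    // Times `blocks` blocks of XOR swaps; the temp-variable and arithmetic
    // variants are measured the same way with the loop body swapped out.
    static long timeXorSwaps(int blocks) {
        int a = 10, b = 5;
        long start = System.nanoTime();
        for (int i = 0; i < blocks * BLOCK; i++) {
            a = a ^ b; b = a ^ b; a = a ^ b;
        }
        long elapsed = System.nanoTime() - start;
        // Use a and b afterwards so the JIT cannot prove the loop is dead.
        if (a + b == 0) System.out.println("never printed for 10 and 5");
        return elapsed;
    }

    public static void main(String[] args) {
        // Warm-up so the JIT has compiled the hot loop before measuring.
        for (int i = 0; i < 20; i++) timeXorSwaps(1);
        for (int blocks = 1; blocks <= 50; blocks++) {
            System.out.println(blocks + " blocks: " + timeXorSwaps(blocks) + " ns");
        }
    }
}

One caveat with microbenchmarks like this: if the JIT can prove the swapped values are never used, it may remove the loop entirely, and a near-constant time regardless of the block count (as reported below for the first two methods) is usually a sign of exactly that.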

What I found is that there is no significant difference between using a temp variable and using arithmetic operations. The time taken fluctuates around 300 ns regardless of the number of blocks (be it 100k swaps or 50 times more).

With XOR, a clearly linear trend in the time taken can be observed: approximately 230,000 ns is needed per block of 100,000 swaps. That is roughly 1,000 times slower than the two traditional methods.

Conclusion: the XOR approach is a crappy trick which is only valid in interview/textbook contexts!

Attaching a graph of the results: swap_experiment.

Any comments or questions are welcome.

Smart Squirrel is on its way to release

I would like to reblog this here!

Tata Game Labs

Hey Folks,

We are happy and proud to make this announcement: another game for kids, named Smart Squirrel, will be released shortly.

The game is our effort to diversify our character base, utilising what we already have in Icy Math Bootstrap.

Stay with us and we will keep you posted accordingly.

P.S.: Our Smart Squirrel was released a week ago; here we are, busy … playing it!

Cheers

Tata Game Labs


Connect to MySQL server from HeidiSQL with SSH tunnel

I spent a few hours figuring out how to connect to my MySQL server running on Amazon EC2 with HeidiSQL. The reason I use HeidiSQL is its great friendliness coupled with many great features: copying a whole database from a remote server (and vice versa), or working with multiple servers at the same time. HeidiSQL usually works like a charm with plain TCP connections. However, it gave me a hard time understanding how to use it with the SSH option enabled.

It is actually very simple; you just need a rough picture of how the whole thing fits together. The following two questions should help when you fill in the configuration for the connection.

Amazon EC2-specific question: do I need to open port 3306 for the MySQL server to accept outside connections if I use an SSH connection?

Answer: No, you don't. You are making an SSH connection from your local computer (where HeidiSQL is) to the SSH server on your Amazon EC2 instance, not a direct connection to the MySQL server. More precisely, you only have to make sure the SSH server is running (as far as I know, that is the default configuration on Amazon) and that its default port, 22, is open to outside connections. That's it.

HeidiSQL-specific question: How do I enter the parameters in HeidiSQL?

Answer:

In the ‘Settings’ tab:

1. In the ‘Network type’ dropdown list, select SSH tunnel.

2. Hostname/IP: localhost (even though you are connecting to a remote server; through the tunnel, the connection reaches MySQL as a local one).

3. Username & Password: your MySQL user and password.

Next, in the ‘SSH tunnel’ tab:

1. plink.exe location: point it at plink.exe (download it first if you don't have it).

2. Host + port: the public IP of your SSH server (which should also be the MySQL server), and port 22 if you haven't changed anything.

3. Username & password: your SSH username and password (not the MySQL ones).

4. You can skip the password in the step above if you have a .ppk private key file.
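
For reference, the tunnel HeidiSQL builds through plink is roughly equivalent to running the following by hand (the user, host, key file name and local port 3307 are placeholders, not values from this post):

plink.exe -ssh -P 22 -i mykey.ppk -L 3307:localhost:3306 sshUser@your-ec2-public-ip

This forwards local port 3307 over SSH to port 3306 on the EC2 host, which is why the MySQL hostname in the ‘Settings’ tab stays localhost.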

After that, you can test the connection. Hope it works!

A bit on market research: who provides historical tick-level market data?

Alright, what do we have on the table now?

After a little bit of research, I found some quite interesting things.

1. For those who are looking at daily data, I think Yahoo Finance would be a good place to start. You can download as much historical data as you want. Of course, as you might know, it is FREE!

2. However, for those who want tick-level historical data, Yahoo Finance is definitely not good enough. I found the following data providers you might consider:

TickData: provides a full range of historical financial data for all markets in the world, from stocks and equities to options, etc. However, the price is not that nice. E.g., for a complete database of one year of Japanese equities you have to pay 7,000 USD, and for the full database of Japanese equities it would be 61,300 USD. They provide desktop software to manage and process their historical data and store it in your favorite format without any hassle.

CQG Data Factory: seems like a better option for those with lighter pockets, as they offer buyers the option to choose individual instruments rather than the whole market. Also, I couldn't find them offering options data (?). About their price, here is an example for your reference: Mini NKY 225 on the OSE exchange, for five months from January 2011 to June 2011, costs about 90 USD!

XigNite: ?

What else? Maybe I need to dig some more then.
