# Chatterjee Correlation

This is my first attempt on implementing a statistical method. This problem was given to me by @lejmer (who also helped me later on building more efficient code to achieve this) when we were debating on the need for higher resource allocation to run scripts so it can run longer and faster. The major problem faced by those who want to implement statistics based methods is that they run out of processing time or need to limit the data samples. My point was that such things need be implemented with an algorithm which suits pine instead of trying to port a python code directly. And yes, I am able to demonstrate that by using this implementation of Chatterjee Correlation.

🎲 What is Chatterjee Correlation?
The Chatterjee rank Correlation Coefficient (CCC) is a method developed by Sourav Chatterjee which can be used to study non linear correlation between two series.
Full documentation on the method can be found here:
arxiv.org/pdf/1909.10140.pdf

In short, the formula which we are implementing here is:

Algorithm can be simplified as follows:
1. Get the ranks of X
2. Get the ranks of Y
3. Sort ranks of Y in the order of X (Lets call this SortedYIndices)
4. Calculate the sum of adjacent Y ranks in SortedYIndices (Lets call it as SumOfAdjacentSortedIndices)
5. And finally the correlation coefficient can be calculated by using simple formula
CCC = 1 - (3*SumOfAdjacentSortedIndices)/(n^2 - 1)

🎲 Looks simple? What is the catch?
Mistake many people do here is that they think in Python/Java/C etc while coding in Pine. This makes code less efficient if it involves arrays and loops. And the simple code may look something like this.

```var xArray = array.new<float>()
var yArray = array.new<float>()
array.push(xArray, x)
array.push(yArray, y)
sortX = array.sort_indices(xArray)
sortY = array.sort_indices(yArray)

index = array.get(xSortIndices, 0)
for i=1 to n > 1? n -1 : na
indexNext = array.get(sortX, i)
index := indexNext

But, problem here is the number of loops run. Remember pine executes the code on every bar. There are loops run in array.sort_indices and another loop we are running to calculate SumOfAdjacentSortedIndices. Due to this, chances of program throwing runtime errors due to script running for too long are pretty high. This limits greatly the number of samples against which we can run the study. The options to overcome are

• Limit the sample size and calculate only between certain bars - this is not ideal as smaller sets are more likely to yield false or inconsistent results.
• Start thinking in pine instead of python and code in such a way that it is optimised for pine. - This is exactly what we have done in the published code.

🎲 How to think in Pine?
In order to think in pine, you should try to eliminate the loops as much as possible. Specially on the data which is continuously growing.

My first thought was that sorting takes lots of time and need to find a better way to sort series - specially when it is a growing data set. Hence, I came up with this library which implements Binary Insertion Sort.

Replacing array.sort_indices with binary insertion sort will greatly reduce the number of loops run on each bar. In binary insertion sort, the array will remain sorted and any item we add, it will keep adding it in the existing sort order so that there is no need to run separate sort. This allows us to work with bigger data sets and can utilise full 20,000 bars for calculation instead of few 100s.

However, last loop where we calculate SumOfAdjacentSortedIndices is not replaceable easily. Hence, we only limit these iterations to certain bars (Even though we use complete sample size). Plots are made for only those bars where the results need to be printed.

🎲 Implementation
Current implementation is limited to few combinations of x and fixed y. But, will be converting this into library soon - which means, programmers can plug any x and y and get the correlation.

Our X here can be
• Average volume
• ATR

And our Y is distance of price from moving average - which identifies trend.

Thus, the indicator here helps to understand the correlation coefficient between volume and trend OR volatility and trend for given ticker and timeframe. Value closer to 1 means highly correlated and value closer to 0 means least correlated. Please note that this method will not tell how these values are correlated. That is, we will not be able to know if higher volume leads to higher trend or lower trend. But, we can say whether volume impacts trend or not.

Please note that values can differ by great extent for different timeframes. For example, if you look at 1D timeframe, you may get higher value of correlation coefficient whereas lower value for 1m timeframe. This means, volume to trend correlation is higher in 1D timeframe and lower in lower timeframes.

Skrip open-source

Dalam semangat TradingView, penulis dari skrip ini telah mempublikasikannya ke sumber-terbuka, maka trader dapat mengerti dan memverifikasinya. Semangat untuk penulis! Anda dapat menggunakannya secara gratis, namun penggunaan kembali kode ini dalam publikasi diatur oleh Tata Tertib. Anda dapat memfavoritkannya untuk digunakan pada chart

Pernyataan Penyangkalan

Informasi dan publikasi tidak dimaksudkan untuk menjadi, dan bukan merupakan saran keuangan, investasi, perdagangan, atau rekomendasi lainnya yang diberikan atau didukung oleh TradingView. Baca selengkapnya di Persyaratan Penggunaan.

Inggin menggunakan skrip ini pada chart?