Maybe easiest to read if you start from the bottom. Screen output in blue, program code in red. Comments can be directed to bobmoore "@" pobox.com.
11/14/2014
Progress slowed down because of difficulty porting the stats module I use in the Python 2 version to Python 3. Very frustrating. One of the measures I use to evaluate the accuracy of algorithms is the Pearson Correlation. In the 2.7 version, I used the code from the stats module, but it turned out that didn't work in the Python 3 version. Instead, I had to use a bit of code I found on the Internet for the Pearson coefficient:
It was quite a lot of work to get this integrated into the Python 3 version -- plus, my floundering about broke the link to the stats module in the Python 2 version. I had to run some tests to see whether this function works as expected, comparing it with the PEARSON function in Excel.
#
import math

list1=[1, 2, 3, 4, 5]
list2=[5, 4, 3, 2, 1]

def average(x):
    assert len(x) > 0
    return float(sum(x)) / len(x)

def pearsonr2(x, y):
    assert len(x) == len(y)
    n = len(x)
    assert n > 0
    avg_x = average(x)
    avg_y = average(y)
    diffprod = 0
    xdiff2 = 0
    ydiff2 = 0
    for idx in range(n):
        xdiff = x[idx] - avg_x
        ydiff = y[idx] - avg_y
        diffprod += xdiff * ydiff
        xdiff2 += xdiff * xdiff
        ydiff2 += ydiff * ydiff
    return diffprod / math.sqrt(xdiff2 * ydiff2)

out=pearsonr2(list1, list2)
print("Correlation of {0} with {1} is {2}".format(list1, list2, out))
Here are some tests:
Correlation of [1, 2, 3, 4, 5] with [5, 4, 3, 2, 1] is -1.0
Correlation of [1, 2, 3, 4, 5] with [1, 2, 3, 4, 5] is 1.0
Correlation of [1, 2, 3, 4, 5] with [5, 4, 5, 2, 1] is -0.8703882797784892
Correlation of [1, 2, 3, 4, 5] with [5, 4, 5, 2, 3] is -0.7276068751089989
Correlation of [1, 2, 3, 4, 5] with [5000, 4, 5, 2, 3] is -0.7073189732785681
Happily, the test results agree with the Excel PEARSON function.
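One case the tests above do not exercise: if either list is constant, the denominator is zero and pearsonr2 raises ZeroDivisionError. Here is a minimal sketch of a guarded variant. The name safe_pearson and the choice to return None are assumptions for illustration, not part of the function I found.

```python
import math

def safe_pearson(x, y):
    """Pearson correlation, or None when it is undefined (hypothetical helper)."""
    assert len(x) == len(y) and len(x) > 0
    avg_x = sum(x) / len(x)
    avg_y = sum(y) / len(y)
    # Same three running sums as pearsonr2, written as generator expressions
    diffprod = sum((a - avg_x) * (b - avg_y) for a, b in zip(x, y))
    xdiff2 = sum((a - avg_x) ** 2 for a in x)
    ydiff2 = sum((b - avg_y) ** 2 for b in y)
    denom = math.sqrt(xdiff2 * ydiff2)
    if denom == 0:
        # One of the lists never varies, so no correlation can be computed
        return None
    return diffprod / denom

print(safe_pearson([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))
print(safe_pearson([1, 2, 3, 4, 5], [7, 7, 7, 7, 7]))
```

Returning None rather than 0.0 keeps "no correlation could be computed" distinct from "the correlation is zero"; whether that distinction matters depends on how the caller ranks algorithms.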
Another thing I did was to make an HTML readout of Predicto forecasts on my www.philly-bob.net Free-for-All web page. I inserted a tiny bit of code into each IMG tag: USEMAP="test">. The USEMAP attribute connects the image to a user-defined map of hotspots, defined here:
<MAP NAME="test"> <area shape="circle" coords="0,0,20" href="pythonfiles/RR7X.HTM"> </MAP>
This creates a 20-pixel round hotspot at the top left corner of the image, which connects to the daily forecast file RR7.HTM. Red guesses indicate downward market, green guesses indicate upward market.
Predicto Forecasts 2014-11-13
Overall Weighted Guess= 0.637
Overall Average Guess= 0.508
+hicor= 0.502 hicorn= 45 Ges= 0.478
+hiacc= 62.26 pct hiaccn= 29 hiaccn Ges= 0.521
+hiallcor= 0.346 hiallcorn= 139 Ges= 0.421
-locor= -0.491 locorn= 113 Ges= 0.460
-loacc= 28.30 pct loaccn= 79 loaccn Ges= 0.485
-loallcor= -0.370 loallcorn= 33 Ges= 0.460
Short Termers
+hishcor= 0.518 hishcorn= 92 Ges= 0.458
+hishacc= 75.00 pct hishaccn= 70 hishaccn Ges= 0.599
+hiallshcor= 0.485 hiallshcorn= 92 Ges= 0.458
-loshcor= -0.497 loshcorn= 211 Ges= 0.531
-loshacc= 25.00 pct loshaccn= 28 loshaccn Ges= 0.447
-loallshcor= -0.519 loallshcorn= 83 Ges= 0.550
This is the document I check out each day to decide whether to make minor adjustments in protecting my tiny nest egg, which is mainly invested in various stocks.
If I get curious about one particular algorithm I can use a utility program called onealgrec3 to get details. For instance, here I look up details of Algorithm #29, which just moved into first place on the "HiAcc" measure:
Which alg?29
Alg 29 ['mydiv', 'F', 'High', 'mytimes', 'AAPL', 'Open', 'IBM', 'High']
('Next day ges=', '0.521')
AvgCor= 0.19817
gesS= ['0.468', '0.513', '0.538', '0.559', '0.471', '0.456', '0.503', '0.496', '0.457', '0.498', '0.468', '0.527', '0.492', '0.478', '0.498', '0.499', '0.505', '0.484', '0.541', '0.575', '0.514', '0.382', '0.484', '0.514', '0.492', '0.490', '0.505', '0.483', '0.477', '0.494', '0.494', '0.543', '0.525', '0.568', '0.484', '0.553', '0.483', '0.512', '0.513', '0.489', '0.469', '0.510', '0.484', '0.478', '0.496', '0.509', '0.470', '0.515', '0.546', '0.496', '0.465', '0.547', '0.519'] len= 53
futs= ['-1.56', '-3.07', '10.06', '-3.07', '-6.17', '-13.10', '7.25', '1.76', '-11.91', '-1.41', '14.85', '2.59', '9.79', '-0.96', '-16.11', '-11.52', '15.53', '-32.31', '16.86', '-5.05', '-5.51', '-26.13', '0.01', '21.73', '-3.08', '-29.72', '33.79', '-40.68', '-22.08', '-31.39', '2.96', '-15.21', '0.27', '24.00', '17.25', '37.27', '-14.17', '23.71', '13.76', '-2.95', '23.42', '-2.75', '12.35', '23.40', '-0.24', '-5.71', '11.47', '7.64', '0.71', '6.34', '1.42', '-1.43', '1.08'] len= 53
AllCor= 0.30942
- - C - C C C - C - X - X - C C C C C X X C - C - C C C C C - X - C X C C C C - X - X X - X X C - X - - -
('Signifigant Accuracy=', 23, '/', 35) SigAcc= 65.71
C X C X C C C X C C X C X C C C C C C X X C X C C C C C C C X X C C X C C C C C X X X X C X X C C X X X C
('All Accuracy=', 33, '/', 53) AllAcc= 62.26
----
('Accuracy over last', 20, 'days')
AvgCor= 0.13086 AllCor= 0.17722
C X C C C C - X - X X - X X C - X - - -
('Signifigant Accuracy=', 6, '/', 13) SigAcc= 46.15
C X C C C C C X X X X C X X C C X X X C
('All Accuracy=', 10, '/', 20) AllAcc= 50.00
This shows that Algorithm #29 has been right on 23 of the last 35 significant market days, and wrong on 12, for an accuracy of 65.7%.
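The accuracy figures can be recomputed from the rows of marks in the readout. A rough sketch, assuming that C means a correct call, X a wrong one, and "-" a day that did not count as significant (that reading is inferred from the totals, and the function name accuracy is made up for illustration):

```python
def accuracy(marks):
    """Return (correct, counted, percent) for a string of C / X / - marks."""
    tokens = marks.split()
    correct = tokens.count("C")
    wrong = tokens.count("X")
    counted = correct + wrong  # "-" days are skipped (assumed non-significant)
    pct = 100.0 * correct / counted if counted else 0.0
    return correct, counted, round(pct, 2)

# The two 20-day rows from the Algorithm #29 readout
print(accuracy("C X C C C C - X - X X - X X C - X - - -"))
print(accuracy("C X C C C C C X X X X C X X C C X X X C"))
```

The two rows give 6 of 13 and 10 of 20, matching the SigAcc= 46.15 and AllAcc= 50.00 lines in the readout.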
11/10/2014
It's time to show the Python3 code that evaluates an algorithm. Here 'tis, a program called testeval.py:
#
import random

datadirloc="c:\\Python34\\predata\\"
datasuf=".csv"
rawvalues=['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']
values=rawvalues[1:5]
lenvalues=len(values)-1
tickers=["SP500","AAPL","CVX","F","GE","IBM","T","WMT","XOM"]
lentickers=len(tickers)-1
ops=["myplus","myminus","mydiv","mytimes"]
lenops=len(ops)-1
testalg=['mytimes', 'SP500', 'Close', 'mydiv', 'GE', 'Close', 'GE', 'Close']

# Stack helpers: operands are pushed as strings and popped by the operators
def stackpush(item):
    mystack.append(item)
    return

def stackpop():
    get=mystack.pop()
    return get

# Operators: each pops two operands; the most recently pushed one is a
def myplus():
    a=float(stackpop())
    b=float(stackpop())
    return a+b

def myminus():
    a=float(stackpop())
    b=float(stackpop())
    return a-b

def mydiv():
    a=float(stackpop())
    b=float(stackpop())
    if b==0.0:
        return 0.0
    else:
        return a/b

def mytimes():
    a=float(stackpop())
    b=float(stackpop())
    return a*b

# Look up one value for one ticker on one day (a line number in the CSV file)
def valtickday(value,ticker,day):
    global algdate
    fn=datadirloc+ticker+datasuf
    numval=rawvalues.index(value)
    f=open(fn,'r')
    daylines=f.readlines()
    f.close()
    l=daylines[day].split(',')
    ans=l[numval]
    stackpush(ans)
    algdate=l[0]
    return ans

# Evaluate the current alg for each day, working from the end of the list
def evalalgdays(startday, stopday):
    evals=[]
    global mystack
    global algdays
    global algdates
    algdays=[]
    algdates=[]
    for thisday in range(startday, stopday):
        algdays.append(thisday)
        mystack=[]
        valtickday(alg[-1], alg[-2], thisday)
        valtickday(alg[-3], alg[-4], thisday)
        z=eval(alg[-5]+'()')
        stackpush(z)
        valtickday(alg[-6], alg[-7], thisday)
        final=eval(alg[-8]+'()')
        evals.append(final)
        algdates.append(algdate)
    return evals

def randop():
    return(ops[random.randint(0,lenops)])

def randticker():
    return(tickers[random.randint(0,lentickers)])

def randval():
    return(values[random.randint(0,lenvalues)])

mystack=[]
startday=1
stopday=15
alg=testalg
evals=evalalgdays(startday, stopday)
print("Alg= {0}".format(alg))
print("eval= {0}".format(evals))
print("algdays= {0}".format(algdays))
print("algdates= {0}".format(algdates))
print()
alg=[randop(), randticker(), randval(), \
     randop(), randticker(), randval(), \
     randticker(), randval()]
evals=evalalgdays(startday, stopday)
print("Alg= {0}".format(alg))
print("eval= {0}".format(evals))
print("algdays= {0}".format(algdays))
print("algdates= {0}".format(algdates))
Here's the outcome of running this program, first on my standard testalg, which always evaluates to the S&P500 Close, and next on a random alg.
Alg= ['mytimes', 'SP500', 'Close', 'mydiv', 'GE', 'Close', 'GE', 'Close']
eval= [2017.81, 2018.05, 1994.65, 1982.3, 1985.05, 1961.63, 1964.58, 1950.82, 1927.11, 1941.28, 1904.01, 1886.76, 1862.76, 1862.49]
algdays= [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
algdates= ['2014-11-03', '2014-10-31', '2014-10-30', '2014-10-29', '2014-10-28', '2014-10-27', '2014-10-24', '2014-10-23', '2014-10-22', '2014-10-21', '2014-10-20', '2014-10-17', '2014-10-16', '2014-10-15']

Alg= ['mydiv', 'AAPL', 'Close', 'mytimes', 'CVX', 'Close', 'IBM', 'Low']
eval= [0.005733897951417679, 0.005502842906216424, 0.005599304593376804, 0.005630003827185758, 0.00563223202811958, 0.005647618015685473, 0.005622627887064053, 0.005585174616171489, 0.0056078277630762195, 0.005506844910953022, 0.005367981104405183, 0.004847484162171896, 0.004850515976282439, 0.0049938530399773196]
algdays= [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
algdates= ['2014-11-03', '2014-10-31', '2014-10-30', '2014-10-29', '2014-10-28', '2014-10-27', '2014-10-24', '2014-10-23', '2014-10-22', '2014-10-21', '2014-10-20', '2014-10-17', '2014-10-16', '2014-10-15']
Two things to note about this code. First, it assumes algs are eight items long. This could be extended; easy as pie to make an algorithm of length eleven items. Second, it only allows a subset of values, ignoring "Volume" and "Adjusted Close." These can be changed.
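To illustrate the first point, here is a sketch of an evaluator that is not tied to eight items. It assumes a longer alg simply nests one more (operator, ticker, value) triple on the left, giving lengths 8, 11, 14 and so on. The names eval_alg, OPS and lookup are hypothetical, and lookup stands in for reading a price from the CSV files.

```python
OPS = {
    "myplus": lambda a, b: a + b,
    "myminus": lambda a, b: a - b,
    "mytimes": lambda a, b: a * b,
    "mydiv": lambda a, b: 0.0 if b == 0.0 else a / b,  # same zero guard as mydiv()
}

def eval_alg(alg, lookup):
    """Evaluate a right-nested alg; lookup(ticker, value) returns one price."""
    assert len(alg) >= 5 and (len(alg) - 2) % 3 == 0
    # The last two items are the innermost operand
    acc = float(lookup(alg[-2], alg[-1]))
    # Walk leftwards over (operator, ticker, value) triples
    for i in range(len(alg) - 5, -1, -3):
        op, ticker, value = alg[i:i + 3]
        acc = OPS[op](float(lookup(ticker, value)), acc)
    return acc

# Sample prices for the demo only; letters A-D are made-up tickers
prices = {("SP500", "Close"): 2017.81, ("GE", "Close"): 25.66,
          ("A", "x"): 1.0, ("B", "x"): 2.0, ("C", "x"): 5.0, ("D", "x"): 3.0}

def lookup(ticker, value):
    return prices[(ticker, value)]

testalg = ['mytimes', 'SP500', 'Close', 'mydiv', 'GE', 'Close', 'GE', 'Close']
print(eval_alg(testalg, lookup))
eleven = ['myplus', 'A', 'x', 'mytimes', 'B', 'x', 'myminus', 'C', 'x', 'D', 'x']
print(eval_alg(eleven, lookup))
```

A dictionary of functions also avoids calling eval() on strings, at the cost of departing from the stack-based design used in testeval.py.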
Next, we'll examine how to evaluate the accuracy of a Predicto algorithm at forecasting the stock market. What this means is making a list of evaluations of the algorithm such as eval= [0.005733897951417679, 0.005502842906216424, 0.005599304593376804, 0.005630003827185758, 0.00563223202811958, 0.005647618015685473, 0.005622627887064053, 0.005585174616171489, 0.0056078277630762195, 0.005506844910953022, 0.005367981104405183, 0.004847484162171896, 0.004850515976282439, 0.0049938530399773196] and then comparing that with a list of next-day changes in the S&P500.
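A minimal sketch of that comparison, assuming the data is newest-first as in the CSV files, and assuming each evaluation is paired with the S&P500 move on the following market day. The exact pairing Predicto uses is not shown here, so treat the alignment as an assumption; next_day_changes and pearson are illustrative names.

```python
import math

def next_day_changes(closes):
    """closes is newest-first; entry k is the move on the day after closes[k+1]."""
    return [closes[i - 1] - closes[i] for i in range(1, len(closes))]

def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    avg_x = sum(x) / len(x)
    avg_y = sum(y) / len(y)
    diffprod = sum((a - avg_x) * (b - avg_y) for a, b in zip(x, y))
    xdiff2 = sum((a - avg_x) ** 2 for a in x)
    ydiff2 = sum((b - avg_y) ** 2 for b in y)
    return diffprod / math.sqrt(xdiff2 * ydiff2)

# First five S&P500 closes and first five random-alg evaluations from the run above
closes = [2017.81, 2018.05, 1994.65, 1982.3, 1985.05]
evals = [0.005733897951417679, 0.005502842906216424, 0.005599304593376804,
         0.005630003827185758, 0.00563223202811958]

changes = next_day_changes(closes)
# The newest evaluation has no known next day yet, so drop it before pairing
paired = evals[1:]
print(changes)
print(pearson(paired, changes))
```

Five days is far too short to mean anything; the point is only how the two lists line up.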
11/08/2014
With a free weekend, set out to do the hard work of explaining how Predicto algorithms are solved. But first, I face a problem I faced when I did the program in Python 2:
#
import msvcrt
while not msvcrt.kbhit():
    print('.', end='')
print("kbhit!")
This program runs in some Python setups but not in others. Here's a question I asked on the genius site stackoverflow.com:
This simple program works when entered from a command line, but not in IDLE.
#
import msvcrt
while not msvcrt.kbhit():
    print('.', end='')
print("kbhit!")
I had hoped this problem would disappear in Python 3. Some questions: Is there a way to get around this in IDLE? Is there some other Python code editor that does not have this limitation? I am writing a blog on my transition to Python3 and I would like to explain WHY I have to transfer my work from IDLE to a command line. What is the simple, boxtop explanation of this apparent inconsistency, suitable for passing on to readers who don't know Python? Thanks in advance.
Generally, the response was that there was no way around the mismatch between IDLE and Python: "You may want to know, that IDLE is a layered application, that uses Tcl/Tk based Tkinter Controller part, that also scans and evaluates keyboard-related events in the .mainloop() and that simply gets into conflict with your intention to detect .kbhit()." Tell your readers that it is due to an "un-avoidable collision of two concurrent Controllers ( a .mainloop() + .kbdhit() ) [and] IDLE's .mainloop() won in your demonstrated example :o)" Will look into an alternative to IDLE, called
11/06/2014
Made some improvements in utility program ONEALGREC1.PY, which shows some details of individual algorithms. Here's a run on three high and three low algorithms.
>>> ================================ RESTART ================================
>>>
Which alg?179
Alg 179 ['myplus', 'F', 'Low', 'myminus', 'CVX', 'High', 'SP500', 'Close']
('Next day ges=', '0.485')
AvgCor= 0.51204 AllCor= 0.14882
- - C - C X C - C - C - X - X X C C C C X X - X - X C C X X - C - C X X C C X - C - C X - X C X
('Signifigant Accuracy=', 18, '/', 34) SigAcc= 52.94
C X C X C X C X C X C X X C X X C C C C X X C X C X C C X X C C C C X X C C X C C C C X C X C X
('All Accuracy=', 27, '/', 48) AllAcc= 56.25
>>> ================================ RESTART ================================
>>>
Which alg?59
Alg 59 ['myminus', 'XOM', 'Close', 'mytimes', 'SP500', 'Close', 'GE', 'High']
('Next day ges=', '0.448')
AvgCor= 0.37327 AllCor= 0.19036
- - C - C C C - C - C - X - C X C C C C X C - C - C C C X X - C - X X X C C X - X - C C - X C X
('Signifigant Accuracy=', 22, '/', 34) SigAcc= 64.71
C X C C C C C X C X C X X C C X C C C C X C C C C C C C X X C C C X X X C C X C X C C C C X C X
('All Accuracy=', 32, '/', 48) AllAcc= 66.67
>>> ================================ RESTART ================================
>>>
Which alg?51
Alg 51 ['mydiv', 'CVX', 'Low', 'myminus', 'IBM', 'Low', 'T', 'Low']
('Next day ges=', '0.562')
AvgCor= 0.00827 AllCor= 0.41746
- - X - C X X - C - X - X - X C C C C C C C - X - X C C C C - C - C C C X C X - X - C X - X X C
('Signifigant Accuracy=', 20, '/', 34) SigAcc= 58.82
C C X X C X X X C C X C X C X C C C C C C C C X C X C C C C X C C C C C X C X C X X C X X X X C
('All Accuracy=', 29, '/', 48) AllAcc= 60.42
>>> ================================ RESTART ================================
>>>
Which alg?113
Alg 113 ['myplus', 'F', 'Close', 'myplus', 'IBM', 'Close', 'SP500', 'Close']
('Next day ges=', '0.515')
AvgCor= -0.49108 AllCor= -0.16979
- - X - X C X - X - X - C - C C X X X X C C - X - C X X C C - X - X C C X X C - X - X C - C X C
('Signifigant Accuracy=', 15, '/', 34) SigAcc= 44.12
X C X C X C X C X C X C C X C C X X X X C C X X X C X X C C X X X X C C X X C X X X X C X C X C
('All Accuracy=', 20, '/', 48) AllAcc= 41.67
>>> ================================ RESTART ================================
>>>
Which alg?79
Alg 79 ['mydiv', 'AAPL', 'Open', 'myminus', 'SP500', 'Close', 'SP500', 'Low']
('Next day ges=', '0.484')
AvgCor= -0.06165 AllCor= -0.01654
- - C - C X C - X - X - C - X X X X X X X X - X - X C C X X - C - X X X X C X - C - C X - X X C
('Signifigant Accuracy=', 11, '/', 34) SigAcc= 32.35
C X C C C X C X X X X X C X X X X X X X X X X X X X C C X X X C X X X X X C X X C C C X X X X C
('All Accuracy=', 14, '/', 48) AllAcc= 29.17
>>> ================================ RESTART ================================
>>>
Which alg?111
Alg 111 ['mytimes', 'IBM', 'High', 'mydiv', 'WMT', 'Low', 'XOM', 'Low']
('Next day ges=', '0.476')
AvgCor= -0.12480 AllCor= -0.41939
- - X - C X X - C - C - X - X X X X C C C X - X - C X X X X - X - X X X C X X - C - C C - C C X
('Signifigant Accuracy=', 13, '/', 34) SigAcc= 38.24
X X X X C X X C C C C X X X X X X X C C C X C X X C X X X X X X X X X X C X X X C C C C C C C X
('All Accuracy=', 18, '/', 48) AllAcc= 37.50
>>>
11/05/2014
Now, to skip ahead a bit: I described how update.py created a data matrix of securities prices, downloaded from YAHOO Finance. What is interesting about this situation is that in later stages, I create "algorithms" or "formulae" based on that data.
The Python 2.7.4 version of Predicto has a pool of 277 algorithms. Each algorithm comes up with a daily forecast of whether the stock market (S&P500) is going to go up or down. We keep track of these algorithms in a file called RR6.REC. Measures called hicor, hiacc, and hiallcor keep track of the algorithms that have been doing the best job of forecasting the market. The symmetrical measures beginning with lo- keep track of the algorithms that have been doing the worst job of forecasting the market.
+hicor= 0.517 hicorn= 179 Ges= 0.504
+hiacc= 67.39 pct hiaccn= 59 hiaccn Ges= 0.515
+hiallcor= 0.423 hiallcorn= 51 Ges= 0.481
-locor= -0.493 locorn= 113 Ges= 0.485
-loacc= 28.26 pct loaccn= 79 loaccn Ges= 0.444
-loallcor= -0.447 loallcorn= 111 Ges= 0.537
For instance, I am struck by the behavior of Algorithm 79, (AAPL Open) / (SP500 Close - SP500 Low), which out of 46 tries has been wrong 33 times, or 72% of the time. This kind of inaccuracy can be useful: just reverse the forecast for the day.
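The formula quoted for Algorithm 79 can be derived mechanically from its eight-item list. A small sketch, assuming the operand order that testeval.py uses (the first ticker/value pair sits to the left of the outer operator, the nested pair of operands to the right); alg_to_infix and SYMBOLS are names made up for illustration:

```python
SYMBOLS = {"myplus": "+", "myminus": "-", "mydiv": "/", "mytimes": "*"}

def alg_to_infix(alg):
    """Render an eight-item alg list as a readable infix formula."""
    op1, t1, v1, op2, t2, v2, t3, v3 = alg
    # Inner expression: second operand pair combined by the second operator
    inner = "({0} {1} {2} {3} {4})".format(t2, v2, SYMBOLS[op2], t3, v3)
    # Outer expression: first operand combined with the inner result
    return "({0} {1}) {2} {3}".format(t1, v1, SYMBOLS[op1], inner)

alg79 = ['mydiv', 'AAPL', 'Open', 'myminus', 'SP500', 'Close', 'SP500', 'Low']
print(alg_to_infix(alg79))
```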
I have a Python2 utility called onealgrec, which gives information about one algorithm:
AvgCor= -0.04932 AllCor= 0.00291
13 / 46 Acc= 28.26
('Next day ges=', '0.444')
Alg 79 ['mydiv', 'AAPL', 'Open', 'myminus', 'SP500', 'Close', 'SP500', 'Low']
Did a second part of the Python2 to Python 3 transfer: a utility program called getdatum.py, which looks up the data for one security on one date. It is less polished than the earlier update.py. Its output is as follows:
 0 SP500
 1 AAPL
 2 CVX
 3 F
 4 GE
 5 IBM
 6 T
 7 WMT
 8 XOM
Enter input number of security you want to study: 4
You chose GE
13303 available dates 2014-11-03 to 1962-01-02
Enter date in form YYYY-MM-DD: 2014-10-29
wantdatestr= 2014-10-29
Found Wednesday 2014-10-29 in line 4
 2014-10-29,25.88,25.90,25.39,25.66,28776900,25.66
GE
 Date      = 2014-10-29
 Open      = 25.88
 High      = 25.90
 Low       = 25.39
 Close     = 25.66
 Volume    = 28776900
 Adj Close = 25.66
Here is the program:
#
import datetime

datadirloc="c:\\Python34\\predata\\"
datasuf=".csv"
tickers=["SP500","AAPL","CVX","F","GE","IBM","T","WMT","XOM"]
week=['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

# Show the menu of securities and read the user's choice
for i,ticker in enumerate(tickers):
    print("{0:2d} {1:s}".format(i, ticker))
tickernum=input("Enter input number of security you want to study: ")
ticker=tickers[int(tickernum)]
print("You chose {0}".format(ticker))

# Read the whole CSV file; line 0 is the header, line 1 the newest day
fn=datadirloc+ticker+datasuf
f=open(fn, 'rt')
l=f.readlines()
f.close()
dl=l[0]
dsl=dl.split(',')
topdatestr=l[1].rstrip().split(',')[0]
botdatestr=l[-1].rstrip().split(',')[0]
print("{0:d} available dates {1:s} to {2:s}".format(len(l), topdatestr, botdatestr))

wantdatestr=input("Enter date in form YYYY-MM-DD: ")
print("wantdatestr=", wantdatestr)
wantdt=datetime.datetime.strptime(wantdatestr, "%Y-%m-%d")
wdw=wantdt.weekday()

# Scan for the requested date
found=False
fsl=()
for index, line in enumerate(l):
    if wantdatestr in line:
        print("Found {0} {1} in line {2}\n {3}".format(week[wdw], wantdatestr, index, line))
        fsl=line.split(',')
        found=True
# Report a miss once, after the whole file has been scanned
if not found:
    print("Not Found")
if found:
    print(ticker)
    for i in range(len(fsl)):
        print(" {0:9s} = {1}".format(dsl[i].rstrip(), fsl[i]))
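The same lookup can also be written with the standard csv module, which matches on the Date column alone rather than on any substring of the line. This is only a sketch: find_day is a hypothetical name, and an in-memory sample containing the one GE row shown above stands in for the real file.

```python
import csv
import datetime
import io

# Header plus the single GE row from the output above (stand-in for GE.csv)
SAMPLE = """Date,Open,High,Low,Close,Volume,Adj Close
2014-10-29,25.88,25.90,25.39,25.66,28776900,25.66
"""

def find_day(csvfile, wantdatestr):
    """Return the row dict for wantdatestr, or None if the date is absent."""
    for row in csv.DictReader(csvfile):
        if row["Date"] == wantdatestr:
            return row
    return None

row = find_day(io.StringIO(SAMPLE), "2014-10-29")
weekday = datetime.datetime.strptime("2014-10-29", "%Y-%m-%d").weekday()
print(weekday, row["Close"])
print(find_day(io.StringIO(SAMPLE), "2014-10-26"))
```

DictReader keys each field by its header name, so the separate header split (dsl) is no longer needed.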
11/03/2014
As a first move, transferred the old UPDATE.PY program from Python27 to Python34. Seems to work, producing CSV (comma-separated-value) text files. Had to change the import list and all the print statements, which are functions in Python 3. Especially noteworthy is the use of the ".format" string method, as in:
print("{0:5s} MktDay= {1:s} Len= {2:5d} Ch={3:5.2f}".format(ticker, sl1[0], len(l), ch))
The items in curly brackets are numbered placeholders with format specifiers; the values to fill them are supplied between the parentheses of ".format()". This was an awkward syntax to me -- especially since the distinction between curly brackets and parentheses is slight on my screen. But maybe I have got it through my thick head now....
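To see what each specifier does, here is the same format string applied to fixed values (the AAPL figures are taken from the Checkupdate output; the variable names are just for this illustration):

```python
ticker, day, n, ch = "AAPL", "2014-11-03", 8549, 1.40

# {0:5s}   string, padded on the right to 5 characters
# {1:s}    string, no padding
# {2:5d}   integer, right-aligned in 5 characters
# {3:5.2f} float, 2 decimal places, right-aligned in 5 characters
line = "{0:5s} MktDay= {1:s} Len= {2:5d} Ch={3:5.2f}".format(ticker, day, n, ch)
print(line)
# prints: AAPL  MktDay= 2014-11-03 Len=  8549 Ch= 1.40
```

The fixed widths are what keep the columns lined up when the tickers and numbers have different lengths.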
import urllib.request

tickers=["SP500","AAPL","CVX","F","GE","IBM","T","WMT","XOM"]
datadirloc="c:\\Python34\\predata\\"
datasuf=".csv"
base_url = "http://ichart.finance.yahoo.com/table.csv?s="

print("1. Update")
for ticker in tickers:
    print("{0:s}".format(ticker))
    if ticker=="SP500":
        nuticker="^GSPC"
    else:
        nuticker=ticker
    ytarg=base_url+nuticker
    outfile=datadirloc+ticker+datasuf
    out=urllib.request.urlretrieve(ytarg,outfile)

print("2. Checkupdate")
for ticker in tickers:
    fn=datadirloc+ticker+datasuf
    f=open(fn, 'rt')
    l=f.readlines()
    sl1=l[1].rstrip().split(',')
    sl0=l[2].rstrip().split(',')
    ch=float(sl1[4])-float(sl0[4])
    print("{0:5s} MktDay= {1:s} Len= {2:5d} Ch={3:5.2f}".format(ticker, sl1[0], len(l), ch))
    f.close()
The first part of the program creates nine files in the predata directory, one for each of S&P500, Apple, Chevron, Ford, GE, IBM, AT&T, Walmart, and Exxon. These files are in the following form (using the example of AAPL):
Date,Open,High,Low,Close,Volume,Adj Close
2014-11-03,108.22,110.30,108.01,109.40,52198000,109.40
2014-10-31,108.01,108.04,107.21,108.00,44571200,108.00
2014-10-30,106.96,107.35,105.90,106.98,40589700,106.98
2014-10-29,106.65,107.37,106.36,107.34,52586100,107.34
2014-10-28,105.40,106.74,105.35,106.74,47939900,106.74
2014-10-27,104.85,105.48,104.70,105.11,34132600,105.11
2014-10-24,105.18,105.49,104.53,105.22,46981700,105.22
2014-10-23,104.08,105.05,103.63,104.83,71002900,104.83
2014-10-22,102.84,104.11,102.60,102.99,68159000,102.99
...
1980-12-16,25.37,25.37,25.25,25.25,26432000,0.39
1980-12-15,27.38,27.38,27.25,27.25,43971200,0.42
1980-12-12,28.75,28.87,28.75,28.75,117258400,0.45

The second part of the program checks the update, as follows:
2. Checkupdate
SP500 MktDay= 2014-11-03 Len= 16316 Ch=-0.24
AAPL  MktDay= 2014-11-03 Len=  8549 Ch= 1.40
CVX   MktDay= 2014-11-03 Len= 11316 Ch=-3.17
F     MktDay= 2014-11-03 Len= 10704 Ch=-0.10
GE    MktDay= 2014-11-03 Len= 13303 Ch=-0.11
IBM   MktDay= 2014-11-03 Len= 13303 Ch=-0.04
T     MktDay= 2014-11-03 Len=  7640 Ch= 0.00
WMT   MktDay= 2014-11-03 Len= 10644 Ch= 0.01
XOM   MktDay= 2014-11-03 Len= 11316 Ch=-1.45
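The Ch column is just the newest Close minus the previous day's Close. A small sketch of that one step using the csv module on the first AAPL rows shown above; latest_change is a made-up name, and the in-memory sample stands in for the downloaded file:

```python
import csv
import io

# Header and the two newest AAPL rows from the listing above
SAMPLE = """Date,Open,High,Low,Close,Volume,Adj Close
2014-11-03,108.22,110.30,108.01,109.40,52198000,109.40
2014-10-31,108.01,108.04,107.21,108.00,44571200,108.00
"""

def latest_change(csvfile):
    """Newest Close minus the previous Close (rows are newest-first)."""
    reader = csv.reader(csvfile)
    next(reader)              # skip the header line
    newest = next(reader)
    previous = next(reader)
    return float(newest[4]) - float(previous[4])   # column 4 is Close

ch = latest_change(io.StringIO(SAMPLE))
print("Ch={0:5.2f}".format(ch))
```

This agrees with the AAPL line of the Checkupdate output, Ch= 1.40.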
11/01/2014
Have actually written a partial Predicto implementation, in Python. Finished it 9/2/2014, but not sure how to proceed. On a lazy Saturday devoted to binge-TV-watching, came up with a plan to cover Predicto by transferring it from its current Python 2.7.4 version to the latest Python 3.4.0. See links below.
Python 3.0 Transfer
Some sources on new Python 3.4.0: