Like this tutorial? Sign up for the mailing list to be notified of site updates and an upcoming course on Python For Traders.
Retrieving daily historical prices for a given stock symbol is very straightforward using Python’s yfinance package. Simply install the package with pip:
pip install yfinance
Then it is only a few lines of code to download the daily price history for Apple (AAPL) from January 1st, 2020 through October 2, 2020. The code below will download AAPL price history, store the result in a Pandas DataFrame, and save the data in CSV format to a file named AAPL.csv.
import yfinance df = yfinance.download('AAPL', start='2020-01-01', end='2020-10-02') df.to_csv('AAPL.csv')
SQL Database vs. CSV File
CSV files are useful for simple cases, but are not often used as a storage solution for a complex application for several reasons.
Ease of Management
What if we want to download the price history for thousands of stocks? Do we want to save thousands of files to our hard drive? Possibly, but this can get difficult to manage. What if we want to update these files each day with the latest prices? Would we download the entire price history again? Or just download a single day and append the price to each of our files? What if one of the stock symbols changes or is delisted? Do we rename the old symbol’s file or keep it? Do we delete the file and create a new one?
What if we want to build a technical or fundamental screener? Would we want to open thousands of files and read through them line-by-line each time our user clicks a button on the web page? Possibly. But it might be easier to store all of this data in a SQL database. When using SQLite, we can store all of our data and relationships between that data in a single file.
What if later we want to store other key statistics for a particular stock, such as quarterly earnings or fundamental data? We would need to create yet another file to store this data and store redundant copies of data such as the stock symbol name again. Or we would need to go back to all of our existing CSV files, add new columns, and backfill this data. With SQL, it is much easier to add a new column to an existing table or create a new table that references our existing stock table and avoid any redundancy.
What if we want to store news articles or tweets that mention a given stock symbol? What if we want to build a web or mobile application that allows users to follow specific stocks, save them to a watchlist, or receive notifications on conditions related to these stocks? What if a stock that everyone is following gets delisted?
With a relational database, we can manage our stock symbols in a single central table that is referenced by other tables. So we can store our price records, our watchlists, and our trades in separate tables that reference the central stock symbol table. Then, when a stock symbol is changed, we only need to change it in one place. When a stock symbol is deleted, we have the option of automatically deleting all records in other tables that reference that stock without writing any additional code. A relational database eliminates redundancy and makes our data consistent without writing additional code.
Indexing and Aggregation
What if we only want to select stock prices for a specific week? Or we want to view our historical price data aggregated in 15 minute candles? Or we want to analyze data for stocks that are holdings in a particular ETF? Or what if we want seasonality data and want to see if a stock performs better in certain months? Sure, we could read all of our CSV files into pandas, filter this data and create data structures to perform these tasks. But SQL has powerful expressions that let us query for very specific pieces of data with a WHERE clause, JOIN that data with other tables, and aggregate this data using GROUP BY, ORDER BY, COUNT, AVG, and SUM. This allows us to return only the exact pieces of data we are interested in.
For the reasons mentioned above, we might want to define a more complex structure that can handle thousands of stocks and their relationships to other entities. In this tutorial, we will design a simple database for storing stock symbols and their associated price histories.
In this tutorial, we will use SQLite as our database. We will use this database for a few reasons:
- Simplicity. The SQLite library is already built into the Python language. There are no dependencies to install. There is no server to configure.
- Speed. SQLite is very fast for SELECTing data and our system will not have many users inserting data concurrently.
Creating a Database
Let’s get started by downloading the SQLite command line executable. Once we have the executable, we can create a new database from the command line running the sqlite3 command followed by the name of the database file. If the database file does not exist, a new database will be created. Otherwise SQLite will open the existing database file.
Creating a Stock Symbol Table
Once we have created our database, we can create some tables to store data. Here is the SQL to create a table. This table will hold our stock records, including the stock symbol and the name of the company.
CREATE TABLE IF NOT EXISTS stock ( id INTEGER PRIMARY KEY, symbol TEXT NOT NULL UNIQUE, company TEXT NOT NULL );
In SQL, the CREATE TABLE statement does exactly what it sounds like. It creates a new table. So far, this table is just a container for data. It has no rows of data yet. We have only defined the structure by telling SQLite the names of the columns and their data types.
To start, each stock record will have a symbol (eg. ‘AAPL’) and company (eg. ‘Apple’). We can add additional columns over time as our application gets more complex. Since the values we will put in these columns are strings of text, we use the TEXT data type. Since these values are required, we specify that these values are NOT NULL.
Primary and Foreign Keys
Each table in a relational database should have a primary key. A primary key allows us to uniquely identify a record in a table. In the table above, we create a column named id and specify that it is the PRIMARY KEY in this table. SQLite will automatically generate a new unique number that identifies each new row in our database. When we create new tables that are related to our stocks, we can define a FOREIGN KEY column that references the id column our stock table.
Creating a Stock Price Table
Now that we have a table for storing our stock symbols and company names, how would we go about storing the price data? Our first thought might be to add a price column to our stock table. However, we are interested in storing and analyzing historical prices. For our purposes, there are actually many prices for each stock. We are interested in storing the Open, High, Low, and Closing price for each day (or hour, or minute). So let’s create a table that stores many rows of price data. Each of these rows can reference a single stock record.
CREATE TABLE IF NOT EXISTS stock_price ( id INTEGER PRIMARY KEY, stock_id INTEGER, date NOT NULL, open NOT NULL, high NOT NULL, low NOT NULL, close NOT NULL, adjusted_close NOT NULL, volume NOT NULL, FOREIGN KEY (stock_id) REFERENCES stock (id) );
Notice that we define a column called stock_id. At the bottom of our SQL statement, we specify FOREIGN KEY (stock_id) REFERENCES stock (id). This means that the value of the stock_id column in the stock_price table will reference the unique number that identifies a row in the stock table. Let’s draw a diagram of this relationship and some example records to demonstrate:
We have discussed the benefits of storing our data in a relational database. We have installed SQLite and designed a simple database. We have created the tables using the SQLite command line. However, our database tables are only containers for records at this point. The tables are still empty. In the next part of the series, we will learn how to INSERT, SELECT, UPDATE, and DELETE records using SQL.