Tangos is a system for building and querying databases summarising the results of numerical galaxy simulations.
Learn more in the following categories:
Tangos is built on sqlalchemy, which means that it is in principle possible to use any database system supported by sqlalchemy. However, different database systems have different features and limitations of which it is worth being aware.
The tangos tests are run with SQLite, mySQL and postgresql. Other databases, while supported by sqlalchemy, have not been directly tested. The following contain some notes on using these different systems.
SQLite is the default database. It is simple in the sense that it keeps your entire database within a single file which can easily be transferred to different systems. Additionally, the SQLite driver is included with Python and so it’s quick to get started.
There are two major, related drawbacks to SQLite. The first is that to access it one must copy over
the file, and there is no automated way to keep files synchronised between hosts. (Probably the best
thing to do is to write to the database only on one cluster, and then rsync
it to the relevant
analysis machines.) The second is that it is not really designed for parallel writes, so when tangos
is writing to the database it must manually try to synchronise writes between different workers.
Tangos does a pretty good job of this, but some network file systems can be slow at releasing file
locks that SQLite uses extensively. If you run into errors about ‘database is locked’, you have reached
the limit of how many tangos processes can safely write to SQLite simultaneously.
PostgreSQL and MySQL are both server-based systems, and as such take a little more effort to set up and maintain. If one exposes PostgreSQL to the outside world, there are potential security implications. One can of course run it on a firewalled computer and manage access appropriately, but this takes some expertise of its own (that will not be covered here). The major advantage is that you can host your data in a single location and allow multiple users to connect.
MySQL is a server-based system, and as such takes a little more effort to set up. The advantage is that you can host your data in a single location and allow multiple users to connect. Additionally, it is able to cope much better with complex parallel writes than SQLite.
For most users, MySQL and PostgreSQL are
To try this out, if you have docker, you can run a test MySQL server very easily:
docker pull mysql
docker run -d --name=mysql-server -p3306:3306 -e MYSQL_ROOT_PASSWORD=my_secret_password mysql
echo "create database database_name;" | docker exec -i mysql-server mysql -pmy_secret_password
Or, just as easily, you can get going with PostgreSQL:
docker pull postgres
docker run --name tangos-postgres -e POSTGRES_USER=tangos -e POSTGRES_PASSWORD=my_secret_password -e POSTGRES_DB=database_name -p 5432:5432 -d postgres
To be sure that python can connect to MySQL or PostgreSQL, install the appropriate modules:
pip install PyMySQL # for MySQL
pip install psycopg2-binary # for PostgreSQL
Tangos can now connect to your test MySQL server using the connection:
export TANGOS_DB_CONNECTION=mysql+pymysql://root:my_secret_password@localhost:3306/database_name
or for PostgreSQL:
export TANGOS_DB_CONNECTION=postgresql+psycopg2://tangos:my_secret_password@localhost/database_name
You can now create new users that can access your mysql server with their own username and password.
echo "create user 'my_new_user'@'%' identified by 'new_password';" | docker exec -i mysql-server mysql -pmy_secret_password
Note that in MySQL the %
acts as a wild card, so this command creates a new user
logging in from any host.
The new user would then connect to the database:
export TANGOS_DB_CONNECTION=mysql+pymysql://my_new_user:new_password@localhost:3306/database_name
The database can be accessed remotely if allowed by any applicable firewalls, by replacing localhost
with the actual host like fancy_computer.astro.fancy_school.edu
. Note, however, that
running a database server open to the world has security implications and may be disallowed by
relevant institutions. The simplest approach, rather than opening up firewalls, is to tunnel in.
For example, the server can be accessed as though it’s running on localhost
if the user
first ssh tunnels into fancy_computer.astro.fancy_school.edu
:
ssh -N -f -L localhost:3306:localhost:3306 my_username@fancy_computer.astro.fancy_school.edu
Note that new users will by default only be able to view a database. Granting additional permissions should be done on a case-by-case basis. Only the root user can do this by defualt. To give a user complete permission to edit an existing database:
echo "grant all on database_name.* to 'new_user'@'%';" | docker exec -i mysql-server mysql -pmy_secret_password
echo "flush privileges;" | docker exec -i mysql-server mysql -pmy_secret_password
You (and whatever users you choose) can now use all the tangos tools as normal, and they will populate the MySQL/PostgreSQL database instead of a SQLite file.