Read from and Write to Amazon S3 in Polars
Image by Sixteen Miles Out on Unsplash

How do you work with Amazon S3 in Polars? Amazon S3 is one of the most common object stores for data projects, but Polars is fairly new, and there are only a few resources explaining how to use it with S3.

In this article, I’ll walk you through reading from and writing to an S3 bucket in Polars, specifically CSV and parquet files.

You can see my full code in my GitHub repo.

Read From S3 in Polars

Let’s say you have a file containing information like this in an S3 bucket (I got this example data from a book called “Data Pipelines Pocket Reference”):

order_id | status      | datetime
1        | backordered | (date and time)
1        | shipped     | (date and time)
2        | shipped     | (date and time)
1        | shipped     | (date and time)
3        | shipped     | (date and time)

There are two ways I’ve found to read from S3 in Polars. One is the approach introduced in the Polars documentation. The other is to read from the S3 file system just as you would from your local file system, using code like “with open()…”.

You need two other libraries for the first approach: s3fs and pyarrow. You read a file in S3 through s3fs as a pyarrow dataset, then convert it to a Polars dataframe. (Make sure you have the necessary configuration for s3fs to work, such as setting up and specifying an IAM profile for AWS.)
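
For reference, the profile name passed to s3fs in the code below corresponds to a named profile in your AWS credentials file (typically ~/.aws/credentials); a minimal sketch with placeholder values looks like this:

[s3_full_access]
aws_access_key_id = <your-access-key-id>
aws_secret_access_key = <your-secret-access-key>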

You can use the piece of code from Polars’ documentation, which utilizes .from_arrow(), but I modified it a little so that it gets the data as a LazyFrame by using .scan_pyarrow_dataset(). I also used pyarrow.dataset.dataset() instead of pyarrow.parquet.ParquetDataset() to be able to specify the file format.

Parquet

import polars as pl
import pyarrow.dataset as ds
import s3fs
from config import BUCKET_NAME

# set up
fs = s3fs.S3FileSystem(profile='s3_full_access')

# read parquet
dataset = ds.dataset(f"s3://{BUCKET_NAME}/order_extract.parquet", filesystem=fs, format='parquet')
df_parquet = pl.scan_pyarrow_dataset(dataset)
print(df_parquet.collect().head())

To read a CSV file, you just change format='parquet' to format='csv'.
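
For example, here’s a minimal sketch of the CSV version (it assumes a hypothetical order_extract.csv with the same columns sitting in the same bucket):

import polars as pl
import pyarrow.dataset as ds
import s3fs
from config import BUCKET_NAME

# set up
fs = s3fs.S3FileSystem(profile='s3_full_access')

# read csv: same pattern as the parquet example, only the format changes
dataset = ds.dataset(f"s3://{BUCKET_NAME}/order_extract.csv", filesystem=fs, format='csv')
df_csv = pl.scan_pyarrow_dataset(dataset)
print(df_csv.collect().head())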

The other way is rather simpler: you just read the file in binary from the filesystem and pass the file object to Polars.

import polars as pl
import s3fs
from config import BUCKET_NAME

# set up
fs = s3fs.S3FileSystem(profile='s3_full_access')

# read parquet 2
with fs.open(f'{BUCKET_NAME}/order_extract.parquet', mode='rb') as f:
    print(pl.read_parquet(f).head())
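
The same pattern works for CSV; here’s a quick sketch, again assuming a hypothetical order_extract.csv in the bucket:

# read csv 2
with fs.open(f'{BUCKET_NAME}/order_extract.csv', mode='rb') as f:
    print(pl.read_csv(f).head())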

Write to S3 in Polars

To write to S3, you’ll want to take the second approach explained above, so the only dependency you need is the s3fs library.

import polars as pl
import s3fs
from config import BUCKET_NAME

# prep df
df = pl.DataFrame({
    'ID': [1, 2, 3, 4],
    'Direction': ['up', 'down', 'right', 'left']
})

# set up
fs = s3fs.S3FileSystem(profile='s3_full_access')

# write parquet
with fs.open(f'{BUCKET_NAME}/direction.parquet', mode='wb') as f:
    df.write_parquet(f)
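
Writing a CSV works the same way. Here’s a sketch that assumes .write_csv() accepts the binary file handle just like .write_parquet() does:

# write csv
with fs.open(f'{BUCKET_NAME}/direction.csv', mode='wb') as f:
    df.write_csv(f)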

Summary

I hope this article gives you an idea of how to work with files in an S3 bucket from Polars. Please reach out if you know better or more efficient ways to read from and write to S3 in Polars!

Here’s the link to the GitHub repo.

References

  • https://pola-rs.github.io/polars-book/user-guide/io/aws/
  • https://medium.com/@louis_10840/how-to-process-data-stored-in-amazon-s3-using-polars-2305bf064c52
  • https://stackoverflow.com/questions/75115246/with-python-is-there-a-way-to-load-a-polars-dataframe-directly-into-an-s3-bucke

Originally published at https://stuffbyyuki.com on June 18, 2023.


Read from and Write to Amazon S3 in Polars was originally published in Better Programming on Medium.
