Welcome to xlsx_streaming’s documentation!¶
xlsx_streaming
is a library to stream an xlsx_document from a collection of rows of data
(typically a queryset containing database data).
Contents:
First steps¶
The xlsx_streaming
library is primarily meant to be used with streaming HTTP response objects.
It provides an API to stream data extracted from a database directly into an xlsx document.
This library does not depend on any web framework or any python excel library. However it was primarily designed for django and has not been tested with any other web framework yet.
Django example¶
from django.http import StreamingHttpResponse
import xlsx_streaming
def my_view(request):
qs = MyModel.objects.all().values_list('field1', 'field2', 'field3')
with open('template.xlsx', 'rb') as template:
stream = xlsx_streaming.stream_queryset_as_xlsx(
qs,
template,
batch_size=50
)
response = StreamingHttpResponse(
stream,
content_type='application/vnd.xlsxformats-officedocument.spreadsheetml.sheet',
)
response['Content-Disposition'] = 'attachment; filename=test.xlsx'
return response
With this kind of code, your database will not be hit too hard by the file export, even when the export’s size is several dozens of megabytes.
To include type information (for dates, numbers, …) in the generated Excel
file, xlsx_streaming
uses a user-provided template file containing example
data whose column types match those of the exported data. Like in the example
above, you can use a static template, but you can also generate the template
dynamically using the python excel library of your choice (openpyxl, …). The
template must be a valid Excel file with at least one row. If the template
contains only one row, this row cell’s datatypes are used as datatypes for all
generated rows. If the template contains more than one row, the first row is
kept as is (header row) and the second row cell’s datatypes are used as
datatypes for all generated rows. The template must be open as binary.
Built-in serialization¶
The xlsx_streaming
library provides builtin support for numerical, date/datetime/timedelta, and boolean conversion to excel equivalent.
All other data types are converted to text by default.
Using custom serializers¶
If you want to preprocess the data from the database before inserting it in the xlsx document, you can pass a serializer argument to stream_queryset_as_xlsx
.
This function is applied to each queryset slice before writing the data to excel.
For example, with a Django Rest Framework serializer:
class MySerializer(serializers.Serializer):
…
serializer = lambda x: [d.values() for d in MySerializer(x, many=True).data]
xlsx_streaming.stream_queryset_as_xlsx(qs, template, serializer=serializer)
Specifying the timezone of the export¶
The datetime objects might be stored as ‘UTC’ inside your database, but you may want to see datetime objects exported in a specific timezone in the excel document.
Excel does not offer support for timezone. However, the xlsx_streaming
library lets you specify the timezone to be used for exports (by default the timezone is kept unchanged).
Before writing a datetime, it will first be localized in the specified timezone before being converted to a naive datetime.
This functionality requires the pytz library.
For example, if you set:
from pytz import timezone
xlsx_streaming.set_export_timezone(timezone('US/Eastern'))
all the datetimes written with the xlsx_streaming
library will first be localized in the ‘US/Eastern’ timezone.
Reference¶
-
xlsx_streaming.
stream_queryset_as_xlsx
(qs, xlsx_template=None, serializer=None, batch_size=1000, encoding='utf-8')¶ Iterate over qs by batch (typically a Django queryset) and stream the bytes of the xlsx document generated from the data.
This function can typically be used to stream data extracted from a database by batch, making many small database requests instead of one big request which could timeout.
Parameters: - qs (Iterable) – an iterable containing the rows (typically a Django queryset)
- xlsx_template (Optional[BytesIO]) – an in memory xlsx file template containing the header (optional) and the first row used to infer data types for each column. If not provided, all cells will be formatted as text.
- serializer (Optional[Callable]) – a function applied to each batch of rows to transform them before saving them to the xlsx document (defaults to identity).
- batch_size (Optional[int]) – the size of each batch of rows
- encoding (Optional[str]) – the file encoding
Returns: A streamable xlsx file
Return type: Iterable
Note
If the xlsx template contains more than one row, the first row is kept as is in the final xlsx file (header row), and the second one is used as a template for all the generated rows.
Changelog¶
1.2.0 (unreleased)¶
- Nothing changed yet.
1.1.0 (2021-06-23)¶
render_worksheet()
now allows rendering worksheet with iterators.
1.0.0 (2021-03-01)¶
stream_queryset_as_xlsx()
now accepts an iterator.- backward incompatibility Drop support of Python 2, 3.4 and 3.5. xlxs_streaming now supports only Python 3.6 and onward.
0.4.1 (2019-11-07)¶
- Fix bug in
stream_queryset_as_xlsx()
that was introduced in version 0.4.0. When called with a Django QuerySet object (or something similar), the function made a single SQL query, instead of multiple SQL queries depending on thebatch_size
argument. This bug may cause performance issues when fetching many rows from the database.
0.4.0 (2019-11-06)¶
stream_queryset_as_xlsx()
now accepts a generator. [Hugo Lecuyer].
0.3.1 (2017-02-22)¶
- First sheet of a workbook is now more reliably detected
0.3.0 (2017-02-22)¶
- It is now possible to stream data to an Excel document without providing a template. In this case all cells are formatted as text (no date or number formatting). If there is an error with the template provided by the user, fall back to this behaviour.
- Improve error handling (try to recover and log instead of raise)
- Change license
0.2.0 (2015-12-16)¶
- Add python 2.6 compatibility
0.1.0 (2015-12-16)¶
- First public version.