First steps¶
The xlsx_streaming
library is primarily meant to be used with streaming HTTP response objects.
It provides an API to stream data extracted from a database directly into an xlsx document.
This library does not depend on any web framework or any python excel library. However it was primarily designed for django and has not been tested with any other web framework yet.
Django example¶
from django.http import StreamingHttpResponse
import xlsx_streaming
def my_view(request):
qs = MyModel.objects.all().values_list('field1', 'field2', 'field3')
with open('template.xlsx', 'rb') as template:
stream = xlsx_streaming.stream_queryset_as_xlsx(
qs,
template,
batch_size=50
)
response = StreamingHttpResponse(
stream,
content_type='application/vnd.xlsxformats-officedocument.spreadsheetml.sheet',
)
response['Content-Disposition'] = 'attachment; filename=test.xlsx'
return response
With this kind of code, your database will not be hit too hard by the file export, even when the export’s size is several dozens of megabytes.
To include type information (for dates, numbers, …) in the generated Excel
file, xlsx_streaming
uses a user-provided template file containing example
data whose column types match those of the exported data. Like in the example
above, you can use a static template, but you can also generate the template
dynamically using the python excel library of your choice (openpyxl, …). The
template must be a valid Excel file with at least one row. If the template
contains only one row, this row cell’s datatypes are used as datatypes for all
generated rows. If the template contains more than one row, the first row is
kept as is (header row) and the second row cell’s datatypes are used as
datatypes for all generated rows. The template must be open as binary.
Built-in serialization¶
The xlsx_streaming
library provides builtin support for numerical, date/datetime/timedelta, and boolean conversion to excel equivalent.
All other data types are converted to text by default.
Using custom serializers¶
If you want to preprocess the data from the database before inserting it in the xlsx document, you can pass a serializer argument to stream_queryset_as_xlsx
.
This function is applied to each queryset slice before writing the data to excel.
For example, with a Django Rest Framework serializer:
class MySerializer(serializers.Serializer):
…
serializer = lambda x: [d.values() for d in MySerializer(x, many=True).data]
xlsx_streaming.stream_queryset_as_xlsx(qs, template, serializer=serializer)
Specifying the timezone of the export¶
The datetime objects might be stored as ‘UTC’ inside your database, but you may want to see datetime objects exported in a specific timezone in the excel document.
Excel does not offer support for timezone. However, the xlsx_streaming
library lets you specify the timezone to be used for exports (by default the timezone is kept unchanged).
Before writing a datetime, it will first be localized in the specified timezone before being converted to a naive datetime.
This functionality requires the pytz library.
For example, if you set:
from pytz import timezone
xlsx_streaming.set_export_timezone(timezone('US/Eastern'))
all the datetimes written with the xlsx_streaming
library will first be localized in the ‘US/Eastern’ timezone.