Pagination and Data Format
1. Pagination
Our API uses cursor-based pagination for all endpoints. To retrieve the next page of results for a given endpoint, use the cursor value provided in the response to the previous request.
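The loop this implies can be sketched as follows. This is only an illustration: the response field names "results" and "cursor", the convention that a missing or empty cursor marks the last page, and the fetch_page callable (which stands in for whatever HTTP client you use) are all assumptions, not part of the documented API.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_all_results(
    fetch_page: Callable[[Optional[str]], Dict[str, Any]],
) -> Iterator[Any]:
    """Yield every record from a cursor-paginated endpoint.

    fetch_page is a hypothetical callable: it receives the cursor from the
    previous response (None for the first request) and returns the parsed
    response body as a dict.
    """
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        # "results" is an assumed field name for the page's records.
        yield from page.get("results", [])
        # Assumed convention: no cursor in the response means no more pages.
        cursor = page.get("cursor")
        if not cursor:
            break
```

Keeping the HTTP call behind fetch_page means the pagination logic can be exercised with an in-memory stub before it is pointed at a real endpoint.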
2. Data Format
All data returned by the API is provided in compressed JSON Lines format, typically with .gz (gzip) compression. You do not need to extract these files beforehand, as libraries such as pandas can read compressed files directly.
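To make the format concrete, here is a minimal sketch that reads such a file with only the Python standard library. It assumes the file is gzip-compressed, UTF-8 encoded, and holds one JSON value per line; the function name iter_records is hypothetical.

```python
import gzip
import json
from typing import Any, Iterator


def iter_records(path: str) -> Iterator[Any]:
    """Yield one parsed JSON value per non-empty line of a .gz JSON Lines file."""
    # gzip.open in text mode decompresses on the fly, so the file never
    # has to be extracted to disk first.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

Because the function is a generator, only one line is held in memory at a time.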
3. Processing Large Files
Due to the large volume of data, it is recommended to read the data in chunks to optimize performance. Below is an example using Python and the pandas library to process a compressed JSON Lines file:
read_jsonl_file.py
- Compression Handling: The pandas library can read compressed files directly, so files do not need to be manually extracted before processing.
- Reading in Chunks: To handle large files efficiently, the data is read in manageable chunks using the chunksize parameter. The chunk size can be adjusted based on the client’s memory and performance requirements.
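The two points above can be sketched as follows. This is a minimal illustration rather than the original listing: the function name count_rows_in_chunks, the default chunk size of 10,000, and the row count used as a stand-in for real processing are all assumptions.

```python
import pandas as pd


def count_rows_in_chunks(path: str, chunksize: int = 10_000) -> int:
    """Read a compressed JSON Lines file chunk by chunk and count its rows."""
    total = 0
    # lines=True tells pandas the file is JSON Lines (required with chunksize).
    # compression="infer" selects gzip from the ".gz" file extension.
    reader = pd.read_json(
        path, lines=True, chunksize=chunksize, compression="infer"
    )
    for chunk in reader:
        # Each chunk is a DataFrame with at most `chunksize` rows;
        # replace this line with the actual per-chunk processing.
        total += len(chunk)
    return total
```

For example, count_rows_in_chunks("data.jsonl.gz", chunksize=50_000) would process a hypothetical file named data.jsonl.gz in chunks of 50,000 rows. A smaller chunk size lowers peak memory use at the cost of more iterations.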