tnetstring-0.2.1/0000755000175000017500000000000011763275626013320 5ustar rfkrfk00000000000000tnetstring-0.2.1/README.rst0000644000175000017500000000362211762770316015005 0ustar rfkrfk00000000000000 tnetstring: data serialization using typed netstrings ====================================================== This is a data serialization library. It's a lot like JSON but it uses a new syntax called "typed netstrings" that Zed has proposed for use in the Mongrel2 webserver. It's designed to be simpler and easier to implement than JSON, with a happy consequence of also being faster in many cases. An ordinary netstring is a blob of data prefixed with its length and postfixed with a sanity-checking comma. The string "hello world" encodes like this:: 11:hello world, Typed netstrings add other datatypes by replacing the comma with a type tag. Here's the integer 12345 encoded as a tnetstring:: 5:12345# And here's the list [12345,True,0] which mixes integers and bools:: 19:5:12345#4:true!1:0#] Simple enough? This module gives you the following functions: :dump: dump an object as a tnetstring to a file :dumps: dump an object as a tnetstring to a string :load: load a tnetstring-encoded object from a file :loads: load a tnetstring-encoded object from a string :pop: pop a tnetstring-encoded object from the front of a string Note that since parsing a tnetstring requires reading all the data into memory at once, there's no efficiency gain from using the file-based versions of these functions. They're only here so you can use load() to read precisely one item from a file or socket without consuming any extra data. The tnetstrings specification explicitly states that strings are binary blobs and forbids the use of unicode at the protocol level. As a convenience to python programmers, this library lets you specify an application-level encoding to translate python's unicode strings to and from binary blobs: >>> print repr(tnetstring.loads("2:\xce\xb1,")) '\xce\xb1' >>> >>> print repr(tnetstring.loads("2:\xce\xb1,", "utf8")) u'\u03b1' tnetstring-0.2.1/LICENSE.txt0000644000175000017500000000203611550306062015123 0ustar rfkrfk00000000000000Copyright (c) 2011 Ryan Kelly Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
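As a quick illustration of the API described in the README above, here is a doctest-style round trip (Python 2 syntax; assumes the ``tnetstring`` package is installed and importable)::

    >>> import tnetstring
    >>> tnetstring.dumps([12345, True, None])
    '18:5:12345#4:true!0:~]'
    >>> tnetstring.loads('18:5:12345#4:true!0:~]') == [12345, True, None]
    True
    >>> tnetstring.pop('5:hello,5:12345#')
    ('hello', '5:12345#')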
tnetstring-0.2.1/PKG-INFO0000644000175000017500000000533311763275626014421 0ustar rfkrfk00000000000000Metadata-Version: 1.0 Name: tnetstring Version: 0.2.1 Summary: data serialization using typed netstrings Home-page: http://github.com/rfk/tnetstring Author: Ryan Kelly Author-email: ryan@rfk.id.au License: MIT Description: tnetstring: data serialization using typed netstrings ====================================================== This is a data serialization library. It's a lot like JSON but it uses a new syntax called "typed netstrings" that Zed has proposed for use in the Mongrel2 webserver. It's designed to be simpler and easier to implement than JSON, with a happy consequence of also being faster in many cases. An ordinary netstring is a blob of data prefixed with its length and postfixed with a sanity-checking comma. The string "hello world" encodes like this:: 11:hello world, Typed netstrings add other datatypes by replacing the comma with a type tag. Here's the integer 12345 encoded as a tnetstring:: 5:12345# And here's the list [12345,True,0] which mixes integers and bools:: 19:5:12345#4:true!1:0#] Simple enough? This module gives you the following functions: :dump: dump an object as a tnetstring to a file :dumps: dump an object as a tnetstring to a string :load: load a tnetstring-encoded object from a file :loads: load a tnetstring-encoded object from a string :pop: pop a tnetstring-encoded object from the front of a string Note that since parsing a tnetstring requires reading all the data into memory at once, there's no efficiency gain from using the file-based versions of these functions. They're only here so you can use load() to read precisely one item from a file or socket without consuming any extra data. The tnetstrings specification explicitly states that strings are binary blobs and forbids the use of unicode at the protocol level. As a convenience to python programmers, this library lets you specify an application-level encoding to translate python's unicode strings to and from binary blobs: >>> print repr(tnetstring.loads("2:\xce\xb1,")) '\xce\xb1' >>> >>> print repr(tnetstring.loads("2:\xce\xb1,", "utf8")) u'\u03b1' Keywords: netstring serialize Platform: UNKNOWN Classifier: Programming Language :: Python Classifier: Programming Language :: Python :: 2 Classifier: Development Status :: 4 - Beta Classifier: License :: OSI Approved :: MIT License tnetstring-0.2.1/ChangeLog.txt0000644000175000017500000000106111763273623015701 0ustar rfkrfk00000000000000 v0.2.1: * Fix memory leak in tnetstring.pop(); thanks tarvip. * Fix bug in handling of large integers; thanks gdamjan. v0.2.0: * Easy loading of unicode strings. If you pass an optional "encoding" argument to load/loads/pop then it will return unicode string objects rather than byte strings. * Easy dumping of unicode strings. If you pass an optional "encoding" argument to dump/dumps then it will write unicode strings in that encoding. v0.1.0: * Initial version; you might say *everything* has changed. tnetstring-0.2.1/tnetstring/0000755000175000017500000000000011763275626015521 5ustar rfkrfk00000000000000tnetstring-0.2.1/tnetstring/tns_core.c0000644000175000017500000003040111762770316017472 0ustar rfkrfk00000000000000// // tns_core.c: core code for a tnetstring parser in C // // This is code for parsing and rendering data in the provisional // typed-netstring format proposed for inclusion in Mongrel2. You can // think of it like a JSON library that uses a simpler wire format. 
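//
// As a hand-worked illustration of the wire format handled here, the dict
// {"hello": 12345} renders as:
//
//     16:5:hello,5:12345#}
//
// i.e. a 16-byte payload made of the key and value as nested tnetstrings,
// followed by the '}' type tag in place of a plain netstring's trailing comma.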
// #include "dbg.h" #include "tns_core.h" #ifndef TNS_MAX_LENGTH #define TNS_MAX_LENGTH 999999999 #endif // Current outbuf implementation writes data starting at the back of // the allocated buffer. When finished we simply memmove it to the front. // Here *buffer points to the allocated buffer, while *head points to the // last characer written to the buffer (and thus decreases as we write). struct tns_outbuf_s { char *buffer; char *head; size_t alloc_size; }; // Helper function for parsing a dict; basically parses items in a loop. static int tns_parse_dict(const tns_ops *ops, void *dict, const char *data, size_t len); // Helper function for parsing a list; basically parses items in a loop. static int tns_parse_list(const tns_ops *ops, void *list, const char *data, size_t len); // Helper function for writing the length prefix onto a rendered value. static int tns_outbuf_clamp(tns_outbuf *outbuf, size_t orig_size); // Finalize an outbuf, turning the allocated buffer into a standard // char* array. Can't use the outbuf once it has been finalized. static char* tns_outbuf_finalize(tns_outbuf *outbuf, size_t *len); // Free the memory allocated in an outbuf. // Can't use the outbuf once it has been freed. static void tns_outbuf_free(tns_outbuf *outbuf); // Helper function to read a base-ten integer off a string. // Due to additional constraints, we can do it faster than strtoi. static size_t tns_strtosz(const char *data, size_t len, size_t *sz, char **end); void* tns_parse(const tns_ops *ops, const char *data, size_t len, char **remain) { char *valstr = NULL; tns_type_tag type = tns_tag_null; size_t vallen = 0; // Read the length of the value, and verify that it ends in a colon. check(tns_strtosz(data, len, &vallen, &valstr) != -1, "Not a tnetstring: invalid length prefix."); check(*valstr == ':', "Not a tnetstring: invalid length prefix."); valstr++; check((valstr+vallen) < (data+len), "Not a tnetstring: invalid length prefix."); // Grab the type tag from the end of the value. type = valstr[vallen]; // Output the remainder of the string if necessary. if(remain != NULL) { *remain = valstr + vallen + 1; } // Now dispatch type parsing based on the type tag. return tns_parse_payload(ops, type, valstr, vallen); error: return NULL; } // This appears to be faster than using strncmp to compare // against a small string constant. Ugly but fast. #define STR_EQ_TRUE(s) (s[0]=='t' && s[1]=='r' && s[2]=='u' && s[3]=='e') #define STR_EQ_FALSE(s) (s[0]=='f' && s[1]=='a' && s[2]=='l' \ && s[3]=='s' && s[4] == 'e') void* tns_parse_payload(const tns_ops *ops,tns_type_tag type, const char *data, size_t len) { void *val = NULL; assert(ops != NULL && "ops struct cannot be NULL"); switch(type) { // Primitive type: a string blob. case tns_tag_string: val = ops->parse_string(ops, data, len); check(val != NULL, "Not a tnetstring: invalid string literal."); break; // Primitive type: an integer. case tns_tag_integer: val = ops->parse_integer(ops, data, len); check(val != NULL, "Not a tnetstring: invalid integer literal."); break; // Primitive type: a float. case tns_tag_float: val = ops->parse_float(ops, data, len); check(val != NULL, "Not a tnetstring: invalid float literal."); break; // Primitive type: a boolean. // The only acceptable values are "true" and "false". 
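        // For example, "4:true!" parses to the true value and "5:false!" to
        // the false value; anything else carrying the '!' tag (e.g. "1:1!")
        // is rejected as invalid.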
case tns_tag_bool: if(len == 4 && STR_EQ_TRUE(data)) { val = ops->get_true(ops); } else if(len == 5 && STR_EQ_FALSE(data)) { val = ops->get_false(ops); } else { sentinel("Not a tnetstring: invalid boolean literal."); val = NULL; } break; // Primitive type: a null. // This must be a zero-length string. case tns_tag_null: check(len == 0, "Not a tnetstring: invalid null literal."); val = ops->get_null(ops); break; // Compound type: a dict. // The data is written case tns_tag_dict: val = ops->new_dict(ops); check(val != NULL, "Could not create dict."); check(tns_parse_dict(ops, val, data, len) != -1, "Not a tnetstring: broken dict items."); break; // Compound type: a list. // The data is written case tns_tag_list: val = ops->new_list(ops); check(val != NULL, "Could not create list."); check(tns_parse_list(ops, val, data, len) != -1, "Not a tnetstring: broken list items."); break; // Whoops, that ain't a tnetstring. default: sentinel("Not a tnetstring: invalid type tag."); } return val; error: if(val != NULL) { ops->free_value(ops, val); } return NULL; } #undef STR_EQ_TRUE #undef STR_EQ_FALSE char* tns_render(const tns_ops *ops, void *val, size_t *len) { tns_outbuf outbuf; check(tns_outbuf_init(&outbuf) != -1, "Failed to initialize outbuf."); check(tns_render_value(ops, val, &outbuf) != -1, "Failed to render value."); return tns_outbuf_finalize(&outbuf, len); error: tns_outbuf_free(&outbuf); return NULL; } int tns_render_value(const tns_ops *ops, void *val, tns_outbuf *outbuf) { tns_type_tag type = tns_tag_null; int res = -1; size_t orig_size = 0; assert(ops != NULL && "ops struct cannot be NULL"); // Find out the type tag for the given value. type = ops->get_type(ops, val); check(type != 0, "type not serializable."); tns_outbuf_putc(outbuf, type); orig_size = tns_outbuf_size(outbuf); // Render it into the output buffer using callbacks. 
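    // Because the outbuf grows from back to front, the payload written below
    // ends up in front of the type tag we just emitted; tns_outbuf_clamp()
    // then prepends the ':' separator and the decimal length digits to give
    // the final <length>:<payload><tag> layout.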
switch(type) { case tns_tag_string: res = ops->render_string(ops, val, outbuf); break; case tns_tag_integer: res = ops->render_integer(ops, val, outbuf); break; case tns_tag_float: res = ops->render_float(ops, val, outbuf); break; case tns_tag_bool: res = ops->render_bool(ops, val, outbuf); break; case tns_tag_null: res = 0; break; case tns_tag_dict: res = ops->render_dict(ops, val, outbuf); break; case tns_tag_list: res = ops->render_list(ops, val, outbuf); break; default: sentinel("unknown type tag: '%c'.", type); } check(res == 0, "Failed to render value of type '%c'.", type); return tns_outbuf_clamp(outbuf, orig_size); error: return -1; } static int tns_parse_list(const tns_ops *ops, void *val, const char *data, size_t len) { void *item = NULL; char *remain = NULL; assert(val != NULL && "value cannot be NULL"); assert(data != NULL && "data cannot be NULL"); while(len > 0) { item = tns_parse(ops, data, len, &remain); check(item != NULL, "Failed to parse list."); len = len - (remain - data); data = remain; check(ops->add_to_list(ops, val, item) != -1, "Failed to add item to list."); item = NULL; } return 0; error: if(item) { ops->free_value(ops, item); } return -1; } static int tns_parse_dict(const tns_ops *ops, void *val, const char *data, size_t len) { void *key = NULL; void *item = NULL; char *remain = NULL; assert(val != NULL && "value cannot be NULL"); assert(data != NULL && "data cannot be NULL"); while(len > 0) { key = tns_parse(ops, data, len, &remain); check(key != NULL, "Failed to parse dict key from tnetstring."); len = len - (remain - data); data = remain; item = tns_parse(ops, data, len, &remain); check(item != NULL, "Failed to parse dict item from tnetstring."); len = len - (remain - data); data = remain; check(ops->add_to_dict(ops, val, key, item) != -1, "Failed to add element to dict."); key = NULL; item = NULL; } return 0; error: if(key) { ops->free_value(ops, key); } if(item) { ops->free_value(ops, item); } return -1; } static inline size_t tns_strtosz(const char *data, size_t len, size_t *sz, char **end) { char c; const char *pos, *eod; size_t value = 0; pos = data; eod = data + len; // The first character must be a digit. // The netstring spec explicitly forbits padding zeros. // So if it's a zero, it must be the only char in the string. c = *pos++; switch(c) { case '0': *sz = 0; *end = (char*) pos; return 0; case '1': case '2': case '3': case '4': case '5': case '6': case '7': case '8': case '9': value = c - '0'; break; default: return -1; } // Consume the remaining digits, up to maximum value length. while(pos < eod) { c = *pos; if(c < '0' || c > '9') { *sz = value; *end = (char*) pos; return 0; } value = (value * 10) + (c - '0'); check(value <= TNS_MAX_LENGTH, "Not a tnetstring: absurdly large length prefix"); pos++; } // If we consume the entire string, that's an error. 
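    // (A well-formed tnetstring always has at least a ':' and a type tag
    // after the length digits, so running out of data inside the prefix
    // means the input is truncated or not a tnetstring at all.)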
error: return -1; } size_t tns_outbuf_size(tns_outbuf *outbuf) { return outbuf->alloc_size - (outbuf->head - outbuf->buffer); } static inline int tns_outbuf_itoa(tns_outbuf *outbuf, size_t n) { do { check(tns_outbuf_putc(outbuf, n%10+'0') != -1, "Failed to write int to tnetstring buffer."); n = n / 10; } while(n > 0); return 0; error: return -1; } int tns_outbuf_init(tns_outbuf *outbuf) { outbuf->buffer = malloc(64); check_mem(outbuf->buffer); outbuf->head = outbuf->buffer + 64; outbuf->alloc_size = 64; return 0; error: outbuf->head = NULL; outbuf->alloc_size = 0; return -1; } static inline void tns_outbuf_free(tns_outbuf *outbuf) { if(outbuf) { free(outbuf->buffer); outbuf->buffer = NULL; outbuf->head = 0; outbuf->alloc_size = 0; } } static inline int tns_outbuf_extend(tns_outbuf *outbuf, size_t free_size) { char *new_buf = NULL; char *new_head = NULL; size_t new_size = outbuf->alloc_size * 2; size_t used_size; used_size = tns_outbuf_size(outbuf); while(new_size < free_size + used_size) { new_size = new_size * 2; } new_buf = malloc(new_size); check_mem(new_buf); new_head = new_buf + new_size - used_size; memmove(new_head, outbuf->head, used_size); free(outbuf->buffer); outbuf->buffer = new_buf; outbuf->head = new_head; outbuf->alloc_size = new_size; return 0; error: return -1; } int tns_outbuf_putc(tns_outbuf *outbuf, char c) { if(outbuf->buffer == outbuf->head) { check(tns_outbuf_extend(outbuf, 1) != -1, "Failed to extend buffer"); } *(--outbuf->head) = c; return 0; error: return -1; } int tns_outbuf_puts(tns_outbuf *outbuf, const char *data, size_t len) { if(outbuf->head - outbuf->buffer < len) { check(tns_outbuf_extend(outbuf, len) != -1, "Failed to extend buffer"); } outbuf->head -= len; memmove(outbuf->head, data, len); return 0; error: return -1; } static char* tns_outbuf_finalize(tns_outbuf *outbuf, size_t *len) { char *new_buf = NULL; size_t used_size; used_size = tns_outbuf_size(outbuf); memmove(outbuf->buffer, outbuf->head, used_size); if(len != NULL) { *len = used_size; } else { if(outbuf->head == outbuf->buffer) { new_buf = realloc(outbuf->buffer, outbuf->alloc_size*2); check_mem(new_buf); outbuf->buffer = new_buf; outbuf->alloc_size = outbuf->alloc_size * 2; } outbuf->buffer[used_size] = '\0'; } return outbuf->buffer; error: free(outbuf->buffer); outbuf->buffer = NULL; outbuf->alloc_size = 0; return NULL; } static inline int tns_outbuf_clamp(tns_outbuf *outbuf, size_t orig_size) { size_t datalen = tns_outbuf_size(outbuf) - orig_size; check(tns_outbuf_putc(outbuf, ':') != -1, "Failed to clamp outbuf"); check(tns_outbuf_itoa(outbuf, datalen) != -1, "Failed to clamp outbuf"); return 0; error: return -1; } void tns_outbuf_memmove(tns_outbuf *outbuf, char *dest) { memmove(dest, outbuf->head, tns_outbuf_size(outbuf)); } tnetstring-0.2.1/tnetstring/_tnetstring.c0000644000175000017500000005032011763274273020222 0ustar rfkrfk00000000000000// // _tnetstring.c: python module for fast encode/decode of typed-netstrings // // You get the following functions: // // dumps: dump a python object to a tnetstring // loads: parse tnetstring into a python object // load: parse tnetstring from a file-like object // pop: parse tnetstring into a python object, // return it along with unparsed data. #include #define TNS_MAX_LENGTH 999999999 #include "tns_core.c" // We have one static tns_ops struct for parsing bytestrings. static tns_ops _tnetstring_ops_bytes; // Unicode parsing ops are created on demand. 
// We allocate a struct containing all the function pointers along with // the encoding string, as a primitive kind of closure. // Eventually we should cache these. struct tns_ops_with_encoding_s { tns_ops ops; char *encoding; }; typedef struct tns_ops_with_encoding_s tns_ops_with_encoding; static tns_ops *_tnetstring_get_unicode_ops(PyObject *encoding); // _tnetstring_loads: parse tnetstring-format value from a string. // static PyObject* _tnetstring_loads(PyObject* self, PyObject *args) { PyObject *string = NULL; PyObject *encoding = Py_None; PyObject *val = NULL; tns_ops *ops = &_tnetstring_ops_bytes; char *data; size_t len; if(!PyArg_UnpackTuple(args, "loads", 1, 2, &string, &encoding)) { return NULL; } if(!PyString_Check(string)) { PyErr_SetString(PyExc_TypeError, "arg must be a string"); return NULL; } Py_INCREF(string); if(encoding == Py_None) { data = PyString_AS_STRING(string); len = PyString_GET_SIZE(string); val = tns_parse(ops, data, len, NULL); } else { if(!PyString_Check(encoding)) { PyErr_SetString(PyExc_TypeError, "encoding must be a string"); goto error; } Py_INCREF(encoding); ops = _tnetstring_get_unicode_ops(encoding); if(ops == NULL) { Py_DECREF(encoding); goto error; } data = PyString_AS_STRING(string); len = PyString_GET_SIZE(string); val = tns_parse(ops, data, len, NULL); free(ops); Py_DECREF(encoding); } Py_DECREF(string); return val; error: Py_DECREF(string); return NULL; } // _tnetstring_load: parse tnetstring-format value from a file. // // This takes care to read no more data than is required to get the // full tnetstring-encoded value. It might read arbitrarily-much // data if the file doesn't begin with a valid tnetstring. // static PyObject* _tnetstring_load(PyObject* self, PyObject *args) { PyObject *val = NULL; PyObject *file = NULL; PyObject *encoding = Py_None; PyObject *methnm = NULL; PyObject *metharg = NULL; PyObject *res = NULL; tns_ops *ops = &_tnetstring_ops_bytes; char c, *data; size_t datalen = 0; if(!PyArg_UnpackTuple(args, "load", 1, 2, &file, &encoding)) { goto error; } Py_INCREF(file); if(encoding != Py_None) { if(!PyString_Check(encoding)) { PyErr_SetString(PyExc_TypeError, "encoding must be a string"); goto error; } Py_INCREF(encoding); ops = _tnetstring_get_unicode_ops(encoding); if(ops == NULL) { goto error; } } // We're going to read one char at a time if((methnm = PyString_FromString("read")) == NULL) { goto error; } if((metharg = PyInt_FromLong(1)) == NULL) { goto error; } // Read the length prefix one char at a time res = PyObject_CallMethodObjArgs(file, methnm, metharg, NULL); if(res == NULL) { goto error; } Py_INCREF(res); if(!PyString_Check(res) || !PyString_GET_SIZE(res)) { PyErr_SetString(PyExc_ValueError, "Not a tnetstring: invalid or missing length prefix"); goto error; } c = PyString_AS_STRING(res)[0]; Py_DECREF(res); res = NULL; // Note that the netstring spec explicitly forbids padding zeroes. // If the first char is zero, it must be the only char. 
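    // For example, a stream beginning "05:hello," is rejected below (the '5'
    // after the leading zero is not the ':' we require next), while "0:,"
    // still parses correctly as the empty string.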
if(c < '0' || c > '9') { PyErr_SetString(PyExc_ValueError, "Not a tnetstring: invalid or missing length prefix"); goto error; } else if (c == '0') { res = PyObject_CallMethodObjArgs(file, methnm, metharg, NULL); if(res == NULL) { goto error; } Py_INCREF(res); if(!PyString_Check(res) || !PyString_GET_SIZE(res)) { PyErr_SetString(PyExc_ValueError, "Not a tnetstring: invalid or missing length prefix"); goto error; } c = PyString_AS_STRING(res)[0]; Py_DECREF(res); res = NULL; } else { do { datalen = (10 * datalen) + (c - '0'); check(datalen <= TNS_MAX_LENGTH, "Not a tnetstring: absurdly large length prefix"); res = PyObject_CallMethodObjArgs(file, methnm, metharg, NULL); if(res == NULL) { goto error; } Py_INCREF(res); if(!PyString_Check(res) || !PyString_GET_SIZE(res)) { PyErr_SetString(PyExc_ValueError, "Not a tnetstring: invalid or missing length prefix"); goto error; } c = PyString_AS_STRING(res)[0]; Py_DECREF(res); res = NULL; } while(c >= '0' && c <= '9'); } // Validate end-of-length-prefix marker. if(c != ':') { PyErr_SetString(PyExc_ValueError, "Not a tnetstring: missing length prefix"); goto error; } // Read the data plus terminating type tag. Py_DECREF(metharg); if((metharg = PyInt_FromSize_t(datalen + 1)) == NULL) { goto error; } res = PyObject_CallMethodObjArgs(file, methnm, metharg, NULL); if(res == NULL) { goto error; } Py_INCREF(res); Py_DECREF(file); file = NULL; Py_DECREF(methnm); methnm = NULL; Py_DECREF(metharg); metharg = NULL; if(!PyString_Check(res) || PyString_GET_SIZE(res) != datalen + 1) { PyErr_SetString(PyExc_ValueError, "Not a tnetstring: invalid length prefix"); goto error; } // Parse out the payload object data = PyString_AS_STRING(res); val = tns_parse_payload(ops, data[datalen], data, datalen); Py_DECREF(res); res = NULL; if(ops != &_tnetstring_ops_bytes) { free(ops); Py_DECREF(encoding); } return val; error: if(file != NULL) { Py_DECREF(file); } if(ops != &_tnetstring_ops_bytes) { free(ops); Py_DECREF(encoding); } if(methnm != NULL) { Py_DECREF(methnm); } if(metharg != NULL) { Py_DECREF(metharg); } if(res != NULL) { Py_DECREF(res); } if(val != NULL) { Py_DECREF(val); } return NULL; } static PyObject* _tnetstring_pop(PyObject* self, PyObject *args) { PyObject *string = NULL; PyObject *val = NULL; PyObject *rest = NULL; PyObject *result = NULL; PyObject *encoding = Py_None; tns_ops *ops = &_tnetstring_ops_bytes; char *data, *remain; size_t len; if(!PyArg_UnpackTuple(args, "pop", 1, 2, &string, &encoding)) { return NULL; } if(!PyString_Check(string)) { PyErr_SetString(PyExc_TypeError, "arg must be a string"); return NULL; } if(encoding != Py_None) { if(!PyString_Check(encoding)) { PyErr_SetString(PyExc_TypeError, "encoding must be a string"); return NULL; } Py_INCREF(encoding); ops = _tnetstring_get_unicode_ops(encoding); if(ops == NULL) { Py_DECREF(encoding); return NULL; } } Py_INCREF(string); data = PyString_AS_STRING(string); len = PyString_GET_SIZE(string); val = tns_parse(ops, data, len, &remain); Py_DECREF(string); if(ops != &_tnetstring_ops_bytes) { free(ops); Py_DECREF(encoding); } if(val == NULL) { return NULL; } rest = PyString_FromStringAndSize(remain, len-(remain-data)); if(rest == NULL) { result = NULL; } else { result = PyTuple_Pack(2, val, rest); Py_DECREF(rest); } Py_DECREF(val); return result; } static PyObject* _tnetstring_dumps(PyObject* self, PyObject *args) { PyObject *object = NULL; PyObject *string = NULL; PyObject *encoding = Py_None; tns_ops *ops = &_tnetstring_ops_bytes; tns_outbuf outbuf; if(!PyArg_UnpackTuple(args, "dumps", 1, 2, 
&object, &encoding)) { return NULL; } if(encoding != Py_None) { if(!PyString_Check(encoding)) { PyErr_SetString(PyExc_TypeError, "encoding must be a string"); return NULL; } Py_INCREF(encoding); ops = _tnetstring_get_unicode_ops(encoding); if(ops == NULL) { Py_DECREF(encoding); return NULL; } } Py_INCREF(object); if(tns_outbuf_init(&outbuf) == -1) { goto error; } if(tns_render_value(ops, object, &outbuf) == -1) { goto error; } Py_DECREF(object); string = PyString_FromStringAndSize(NULL,tns_outbuf_size(&outbuf)); if(string == NULL) { goto error; } tns_outbuf_memmove(&outbuf, PyString_AS_STRING(string)); free(outbuf.buffer); if(ops != &_tnetstring_ops_bytes) { free(ops); Py_DECREF(encoding); } return string; error: if(ops != &_tnetstring_ops_bytes) { free(ops); Py_DECREF(encoding); } Py_DECREF(object); return NULL; } static PyMethodDef _tnetstring_methods[] = { {"load", (PyCFunction)_tnetstring_load, METH_VARARGS, PyDoc_STR("load(file,encoding=None) -> object\n" "This function reads a tnetstring from a file and parses it\n" " into a python object.")}, {"loads", (PyCFunction)_tnetstring_loads, METH_VARARGS, PyDoc_STR("loads(string,encoding=None) -> object\n" "This function parses a tnetstring into a python object.")}, {"pop", (PyCFunction)_tnetstring_pop, METH_VARARGS, PyDoc_STR("pop(string,encoding=None) -> (object, remain)\n" "This function parses a tnetstring into a python object.\n" "It returns a tuple giving the parsed object and a string\n" "containing any unparsed data.")}, {"dumps", (PyCFunction)_tnetstring_dumps, METH_VARARGS, PyDoc_STR("dumps(object,encoding=None) -> string\n" "This function dumps a python object as a tnetstring.")}, {NULL, NULL} }; // Functions to hook the parser core up to python. static void* tns_parse_string(const tns_ops *ops, const char *data, size_t len) { return PyString_FromStringAndSize(data, len); } static void* tns_parse_unicode(const tns_ops *ops, const char *data, size_t len) { char* encoding = ((tns_ops_with_encoding*)ops)->encoding; return PyUnicode_Decode(data, len, encoding, NULL); } static void* tns_parse_integer(const tns_ops *ops, const char *data, size_t len) { long l = 0; long long ll = 0; int sign = 1; char c; char *dataend; const char *pos, *eod; PyObject *v = NULL; // Anything with less than 10 digits, we can fit into a long. // Hand-parsing, as we need tighter error-checking than strtol. if (len < 10) { pos = data; eod = data + len; c = *pos++; switch(c) { case '0': case '1': case '2': case '3': case '4': case '5': case '6': case '7': case '8': case '9': l = c - '0'; break; case '+': break; case '-': sign = -1; break; default: sentinel("invalid integer literal"); } while(pos < eod) { c = *pos++; check(c >= '0' && c <= '9', "invalid integer literal"); l = (l * 10) + (c - '0'); } return PyLong_FromLong(l * sign); } // Anything with less than 19 digits fits in a long long. // Hand-parsing, as we need tighter error-checking than strtoll. else if(len < 19) { pos = data; eod = data + len; c = *pos++; switch(c) { case '0': case '1': case '2': case '3': case '4': case '5': case '6': case '7': case '8': case '9': ll = c - '0'; break; case '+': break; case '-': sign = -1; break; default: sentinel("invalid integer literal"); } while(pos < eod) { c = *pos++; check(c >= '0' && c <= '9', "invalid integer literal"); ll = (ll * 10) + (c - '0'); } return PyLong_FromLongLong(ll * sign); } // Really big numbers are passed to python's native parser. 
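    // PyLong_FromString gives us arbitrary-precision parsing, but it is more
    // permissive than the tnetstring grammar, so the extra checks below
    // reject leading whitespace and a bare sign with no following digit.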
else { // PyLong_FromString allows leading whitespace, so we have to check // that there is none present in the string. c = *data; switch(c) { case '0': case '1': case '2': case '3': case '4': case '5': case '6': case '7': case '8': case '9': break; case '+': case '-': c = *(data+1); check(c >= '0' && c <= '9', "invalid integer literal"); break; default: sentinel("invalid integer literal"); } // PyLong_FromString insists that the string end in a NULL byte. // I am *not* copying all that data. Instead we lie a little bit // about the const-ness of data, write a NULL over the format terminator // and restore the original character when we're done. c = data[len]; ((char*)data)[len] = '\0'; v = PyLong_FromString((char *)data, &dataend, 10); ((char*)data)[len] = c; check(dataend == data + len, "invalid integer literal"); return v; } sentinel("invalid code branch, check your compiler..."); error: return NULL; } static void* tns_parse_float(const tns_ops *ops, const char *data, size_t len) { double d = 0; char *dataend; // Technically this allows whitespace around the float, which // isn't valid in a tnetstring. But I don't want to waste the // time checking and I am *not* reimplementing strtod. d = strtod(data, &dataend); if(dataend != data + len) { return NULL; } return PyFloat_FromDouble(d); } static void* tns_get_null(const tns_ops *ops) { Py_INCREF(Py_None); return Py_None; } static void* tns_get_true(const tns_ops *ops) { Py_INCREF(Py_True); return Py_True; } static void* tns_get_false(const tns_ops *ops) { Py_INCREF(Py_False); return Py_False; } static void* tns_new_dict(const tns_ops *ops) { return PyDict_New(); } static void* tns_new_list(const tns_ops *ops) { return PyList_New(0); } static void tns_free_value(const tns_ops *ops, void *value) { Py_XDECREF(value); } static int tns_add_to_dict(const tns_ops *ops, void *dict, void *key, void *item) { int res; res = PyDict_SetItem(dict, key, item); Py_DECREF(key); Py_DECREF(item); if(res == -1) { return -1; } return 0; } static int tns_add_to_list(const tns_ops *ops, void *list, void *item) { int res; res = PyList_Append(list, item); Py_DECREF(item); if(res == -1) { return -1; } return 0; } static int tns_render_string(const tns_ops *ops, void *val, tns_outbuf *outbuf) { return tns_outbuf_puts(outbuf, PyString_AS_STRING(val), PyString_GET_SIZE(val)); } static int tns_render_unicode(const tns_ops *ops, void *val, tns_outbuf *outbuf) { PyObject *bytes; char* encoding = ((tns_ops_with_encoding*)ops)->encoding; if(PyUnicode_Check(val)) { bytes = PyUnicode_Encode(PyUnicode_AS_UNICODE(val), PyUnicode_GET_SIZE(val), encoding, NULL); if(bytes == NULL) { return -1; } if(tns_render_string(ops, bytes, outbuf) == -1) { return -1; } Py_DECREF(bytes); return 0; } if(PyString_Check(val)) { return tns_render_string(ops, val, outbuf); } return -1; } static int tns_render_integer(const tns_ops *ops, void *val, tns_outbuf *outbuf) { PyObject *string = NULL; int res = 0; string = PyObject_Str(val); if(string == NULL) { return -1; } res = tns_render_string(ops, string, outbuf); Py_DECREF(string); return res; } static int tns_render_float(const tns_ops *ops, void *val, tns_outbuf *outbuf) { PyObject *string; int res = 0; string = PyObject_Repr(val); if(string == NULL) { return -1; } res = tns_render_string(ops, string, outbuf); Py_DECREF(string); return res; } static int tns_render_bool(const tns_ops *ops, void *val, tns_outbuf *outbuf) { if(val == Py_True) { return tns_outbuf_puts(outbuf, "true", 4); } else { return tns_outbuf_puts(outbuf, "false", 5); } } 
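// Note that dict entries are emitted value-first below: since the outbuf is
// written back to front, rendering each value before its key makes the final
// output read key-then-value, as the format requires.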
static int tns_render_dict(const tns_ops *ops, void *val, tns_outbuf *outbuf) { PyObject *key, *item; Py_ssize_t pos = 0; while(PyDict_Next(val, &pos, &key, &item)) { if(tns_render_value(ops, item, outbuf) == -1) { return -1; } if(tns_render_value(ops, key, outbuf) == -1) { return -1; } } return 0; } static int tns_render_list(const tns_ops *ops, void *val, tns_outbuf *outbuf) { PyObject *item; Py_ssize_t idx; // Remember, all output is in reverse. // So we must write the last element first. idx = PyList_GET_SIZE(val) - 1; while(idx >= 0) { item = PyList_GET_ITEM(val, idx); if(tns_render_value(ops, item, outbuf) == -1) { return -1; } idx--; } return 0; } static tns_type_tag tns_get_type(const tns_ops *ops, void *val) { if(val == Py_True || val == Py_False) { return tns_tag_bool; } if(val == Py_None) { return tns_tag_null; } if(PyInt_Check((PyObject*)val) || PyLong_Check((PyObject*)val)) { return tns_tag_integer; } if(PyFloat_Check((PyObject*)val)) { return tns_tag_float; } if(PyString_Check((PyObject*)val)) { return tns_tag_string; } if(PyList_Check((PyObject*)val)) { return tns_tag_list; } if(PyDict_Check((PyObject*)val)) { return tns_tag_dict; } return 0; } static tns_type_tag tns_get_type_unicode(const tns_ops *ops, void *val) { tns_type_tag type = 0; type = tns_get_type(ops, val); if(type == 0) { if(PyUnicode_Check(val)) { type = tns_tag_string; } } return type; } static tns_ops *_tnetstring_get_unicode_ops(PyObject *encoding) { tns_ops_with_encoding *opswe = NULL; tns_ops *ops = NULL; opswe = malloc(sizeof(tns_ops_with_encoding)); if(opswe == NULL) { PyErr_SetString(PyExc_MemoryError, "could not allocate ops struct"); return NULL; } ops = (tns_ops*)opswe; opswe->encoding = PyString_AS_STRING(encoding); ops->get_type = &tns_get_type_unicode; ops->free_value = &tns_free_value; ops->parse_string = tns_parse_unicode; ops->parse_integer = tns_parse_integer; ops->parse_float = tns_parse_float; ops->get_null = tns_get_null; ops->get_true = tns_get_true; ops->get_false = tns_get_false; ops->render_string = tns_render_unicode; ops->render_integer = tns_render_integer; ops->render_float = tns_render_float; ops->render_bool = tns_render_bool; ops->new_dict = tns_new_dict; ops->add_to_dict = tns_add_to_dict; ops->render_dict = tns_render_dict; ops->new_list = tns_new_list; ops->add_to_list = tns_add_to_list; ops->render_list = tns_render_list; return ops; } PyDoc_STRVAR(module_doc, "Fast encoding/decoding of typed-netstrings." ); PyMODINIT_FUNC init_tnetstring(void) { Py_InitModule3("_tnetstring", _tnetstring_methods, module_doc); // Initialize function pointers for parsing bytes. 
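    // These are the default callbacks used when no encoding argument is
    // given; the unicode-aware variants are built per call by
    // _tnetstring_get_unicode_ops().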
_tnetstring_ops_bytes.get_type = &tns_get_type; _tnetstring_ops_bytes.free_value = &tns_free_value; _tnetstring_ops_bytes.parse_string = tns_parse_string; _tnetstring_ops_bytes.parse_integer = tns_parse_integer; _tnetstring_ops_bytes.parse_float = tns_parse_float; _tnetstring_ops_bytes.get_null = tns_get_null; _tnetstring_ops_bytes.get_true = tns_get_true; _tnetstring_ops_bytes.get_false = tns_get_false; _tnetstring_ops_bytes.render_string = tns_render_string; _tnetstring_ops_bytes.render_integer = tns_render_integer; _tnetstring_ops_bytes.render_float = tns_render_float; _tnetstring_ops_bytes.render_bool = tns_render_bool; _tnetstring_ops_bytes.new_dict = tns_new_dict; _tnetstring_ops_bytes.add_to_dict = tns_add_to_dict; _tnetstring_ops_bytes.render_dict = tns_render_dict; _tnetstring_ops_bytes.new_list = tns_new_list; _tnetstring_ops_bytes.add_to_list = tns_add_to_list; _tnetstring_ops_bytes.render_list = tns_render_list; } tnetstring-0.2.1/tnetstring/dbg.h0000644000175000017500000000116611550315673016421 0ustar rfkrfk00000000000000// // dbg.h: minimal checking and debugging functions // // This is a small compatability shim for the Mongrel2 "dbg.h" interface, // to make it easier to port code back and forth between the tnetstring // implementation in Mongrel2 and this module. // #ifndef __dbg_h__ #define __dbg_h__ #define check(A, M, ...) if(!(A)) { if(PyErr_Occurred() == NULL) { PyErr_Format(PyExc_ValueError, M, ##__VA_ARGS__); }; goto error; } #define sentinel(M, ...) check(0, M, ##__VA_ARGS__) #define check_mem(A) if(A==NULL) { if(PyErr_Occurred() == NULL) { PyErr_SetString(PyExc_MemoryError, "Out of memory."); }; goto error; } #endif tnetstring-0.2.1/tnetstring/tests/0000755000175000017500000000000011763275626016663 5ustar rfkrfk00000000000000tnetstring-0.2.1/tnetstring/tests/test_misc.py0000644000175000017500000000156511542240466021223 0ustar rfkrfk00000000000000 import os import os.path import difflib import unittest import doctest import tnetstring class Test_Misc(unittest.TestCase): def test_readme_matches_docstring(self): """Ensure that the README is in sync with the docstring. This test should always pass; if the README is out of sync it just updates it with the contents of tnetstring.__doc__. 
""" dirname = os.path.dirname readme = os.path.join(dirname(dirname(dirname(__file__))),"README.rst") if not os.path.isfile(readme): f = open(readme,"wb") f.write(tnetstring.__doc__.encode()) f.close() else: f = open(readme,"rb") if f.read() != tnetstring.__doc__: f.close() f = open(readme,"wb") f.write(tnetstring.__doc__.encode()) f.close() tnetstring-0.2.1/tnetstring/tests/test_format.py0000644000175000017500000001222111763274254021556 0ustar rfkrfk00000000000000 import sys import unittest import random import math import StringIO import tnetstring FORMAT_EXAMPLES = { '0:}': {}, '0:]': [], '51:5:hello,39:11:12345678901#4:this,4:true!0:~4:\x00\x00\x00\x00,]}': {'hello': [12345678901, 'this', True, None, '\x00\x00\x00\x00']}, '5:12345#': 12345, '12:this is cool,': "this is cool", '0:,': "", '0:~': None, '4:true!': True, '5:false!': False, '10:\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,': "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00", '24:5:12345#5:67890#5:xxxxx,]': [12345, 67890, 'xxxxx'], '18:3:0.1^3:0.2^3:0.3^]': [0.1, 0.2, 0.3], '243:238:233:228:223:218:213:208:203:198:193:188:183:178:173:168:163:158:153:148:143:138:133:128:123:118:113:108:103:99:95:91:87:83:79:75:71:67:63:59:55:51:47:43:39:35:31:27:23:19:15:11:hello-there,]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]': [[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[["hello-there"]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]] } def get_random_object(random=random,depth=0,unicode=False): """Generate a random serializable object.""" # The probability of generating a scalar value increases as the depth increase. # This ensures that we bottom out eventually. if random.randint(depth,10) <= 4: what = random.randint(0,1) if what == 0: n = random.randint(0,10) l = [] for _ in xrange(n): l.append(get_random_object(random,depth+1,unicode)) return l if what == 1: n = random.randint(0,10) d = {} for _ in xrange(n): n = random.randint(0,100) k = "".join(chr(random.randint(32,126)) for _ in xrange(n)) if unicode: k = k.decode("ascii") d[k] = get_random_object(random,depth+1,unicode) return d else: what = random.randint(0,4) if what == 0: return None if what == 1: return True if what == 2: return False if what == 3: if random.randint(0,1) == 0: return random.randint(0,sys.maxint) else: return -1 * random.randint(0,sys.maxint) n = random.randint(0,100) if unicode: return u"".join(chr(random.randint(32,126)) for _ in xrange(n)) class Test_Format(unittest.TestCase): def test_roundtrip_format_examples(self): for data, expect in FORMAT_EXAMPLES.items(): self.assertEqual(expect,tnetstring.loads(data)) self.assertEqual(expect,tnetstring.loads(tnetstring.dumps(expect))) self.assertEqual((expect,""),tnetstring.pop(data)) def test_roundtrip_format_random(self): for _ in xrange(500): v = get_random_object() self.assertEqual(v,tnetstring.loads(tnetstring.dumps(v))) self.assertEqual((v,""),tnetstring.pop(tnetstring.dumps(v))) def test_unicode_handling(self): self.assertRaises(ValueError,tnetstring.dumps,u"hello") self.assertEquals(tnetstring.dumps(u"hello","utf8"),"5:hello,") self.assertEquals(type(tnetstring.loads("5:hello,")),str) self.assertEquals(type(tnetstring.loads("5:hello,","utf8")),unicode) ALPHA = u"\N{GREEK CAPITAL LETTER ALPHA}lpha" self.assertEquals(tnetstring.dumps(ALPHA,"utf8"),"6:"+ALPHA.encode("utf8")+",") self.assertEquals(tnetstring.dumps(ALPHA,"utf16"),"12:"+ALPHA.encode("utf16")+",") self.assertEquals(tnetstring.loads("12:\xff\xfe\x91\x03l\x00p\x00h\x00a\x00,","utf16"),ALPHA) def test_roundtrip_format_unicode(self): for _ in 
xrange(500): v = get_random_object(unicode=True) self.assertEqual(v,tnetstring.loads(tnetstring.dumps(v,"utf8"),"utf8")) self.assertEqual((v,""),tnetstring.pop(tnetstring.dumps(v,"utf16"),"utf16")) def test_roundtrip_big_integer(self): i1 = math.factorial(30000) s = tnetstring.dumps(i1) i2 = tnetstring.loads(s) self.assertEquals(i1, i2) class Test_FileLoading(unittest.TestCase): def test_roundtrip_file_examples(self): for data, expect in FORMAT_EXAMPLES.items(): s = StringIO.StringIO() s.write(data) s.write("OK") s.seek(0) self.assertEqual(expect,tnetstring.load(s)) self.assertEqual("OK",s.read()) s = StringIO.StringIO() tnetstring.dump(expect,s) s.write("OK") s.seek(0) self.assertEqual(expect,tnetstring.load(s)) self.assertEqual("OK",s.read()) def test_roundtrip_file_random(self): for _ in xrange(500): v = get_random_object() s = StringIO.StringIO() tnetstring.dump(v,s) s.write("OK") s.seek(0) self.assertEqual(v,tnetstring.load(s)) self.assertEqual("OK",s.read()) def test_error_on_absurd_lengths(self): s = StringIO.StringIO() s.write("1000000000:pwned!,") s.seek(0) self.assertRaises(ValueError,tnetstring.load,s) self.assertEquals(s.read(1),":") tnetstring-0.2.1/tnetstring/tests/__init__.py0000644000175000017500000000000011542235742020750 0ustar rfkrfk00000000000000tnetstring-0.2.1/tnetstring/__init__.py0000644000175000017500000003073411763274120017625 0ustar rfkrfk00000000000000""" tnetstring: data serialization using typed netstrings ====================================================== This is a data serialization library. It's a lot like JSON but it uses a new syntax called "typed netstrings" that Zed has proposed for use in the Mongrel2 webserver. It's designed to be simpler and easier to implement than JSON, with a happy consequence of also being faster in many cases. An ordinary netstring is a blob of data prefixed with its length and postfixed with a sanity-checking comma. The string "hello world" encodes like this:: 11:hello world, Typed netstrings add other datatypes by replacing the comma with a type tag. Here's the integer 12345 encoded as a tnetstring:: 5:12345# And here's the list [12345,True,0] which mixes integers and bools:: 19:5:12345#4:true!1:0#] Simple enough? This module gives you the following functions: :dump: dump an object as a tnetstring to a file :dumps: dump an object as a tnetstring to a string :load: load a tnetstring-encoded object from a file :loads: load a tnetstring-encoded object from a string :pop: pop a tnetstring-encoded object from the front of a string Note that since parsing a tnetstring requires reading all the data into memory at once, there's no efficiency gain from using the file-based versions of these functions. They're only here so you can use load() to read precisely one item from a file or socket without consuming any extra data. The tnetstrings specification explicitly states that strings are binary blobs and forbids the use of unicode at the protocol level. 
As a convenience to python programmers, this library lets you specify an application-level encoding to translate python's unicode strings to and from binary blobs: >>> print repr(tnetstring.loads("2:\\xce\\xb1,")) '\\xce\\xb1' >>> >>> print repr(tnetstring.loads("2:\\xce\\xb1,", "utf8")) u'\\u03b1' """ __ver_major__ = 0 __ver_minor__ = 2 __ver_patch__ = 1 __ver_sub__ = "" __version__ = "%d.%d.%d%s" % (__ver_major__,__ver_minor__,__ver_patch__,__ver_sub__) from collections import deque def dumps(value,encoding=None): """dumps(object,encoding=None) -> string This function dumps a python object as a tnetstring. """ # This uses a deque to collect output fragments in reverse order, # then joins them together at the end. It's measurably faster # than creating all the intermediate strings. # If you're reading this to get a handle on the tnetstring format, # consider the _gdumps() function instead; it's a standard top-down # generator that's simpler to understand but much less efficient. q = deque() _rdumpq(q,0,value,encoding) return "".join(q) def dump(value,file,encoding=None): """dump(object,file,encoding=None) This function dumps a python object as a tnetstring and writes it to the given file. """ file.write(dumps(value,encoding)) def _rdumpq(q,size,value,encoding=None): """Dump value as a tnetstring, to a deque instance, last chunks first. This function generates the tnetstring representation of the given value, pushing chunks of the output onto the given deque instance. It pushes the last chunk first, then recursively generates more chunks. When passed in the current size of the string in the queue, it will return the new size of the string in the queue. Operating last-chunk-first makes it easy to calculate the size written for recursive structures without having to build their representation as a string. This is measurably faster than generating the intermediate strings, especially on deeply nested structures. """ write = q.appendleft if value is None: write("0:~") return size + 3 if value is True: write("4:true!") return size + 7 if value is False: write("5:false!") return size + 8 if isinstance(value,(int,long)): data = str(value) ldata = len(data) span = str(ldata) write("#") write(data) write(":") write(span) return size + 2 + len(span) + ldata if isinstance(value,(float,)): # Use repr() for float rather than str(). # It round-trips more accurately. # Probably unnecessary in later python versions that # use David Gay's ftoa routines. 
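        # (For example, under Python 2 str(1.0/3) keeps only 12 significant
        # digits, whereas repr() always produces a string that converts back
        # to the exact same float.)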
data = repr(value) ldata = len(data) span = str(ldata) write("^") write(data) write(":") write(span) return size + 2 + len(span) + ldata if isinstance(value,str): lvalue = len(value) span = str(lvalue) write(",") write(value) write(":") write(span) return size + 2 + len(span) + lvalue if isinstance(value,(list,tuple,)): write("]") init_size = size = size + 1 for item in reversed(value): size = _rdumpq(q,size,item,encoding) span = str(size - init_size) write(":") write(span) return size + 1 + len(span) if isinstance(value,dict): write("}") init_size = size = size + 1 for (k,v) in value.iteritems(): size = _rdumpq(q,size,v,encoding) size = _rdumpq(q,size,k,encoding) span = str(size - init_size) write(":") write(span) return size + 1 + len(span) if isinstance(value,unicode): if encoding is None: raise ValueError("must specify encoding to dump unicode strings") value = value.encode(encoding) lvalue = len(value) span = str(lvalue) write(",") write(value) write(":") write(span) return size + 2 + len(span) + lvalue raise ValueError("unserializable object") def _gdumps(value,encoding): """Generate fragments of value dumped as a tnetstring. This is the naive dumping algorithm, implemented as a generator so that it's easy to pass to "".join() without building a new list. This is mainly here for comparison purposes; the _rdumpq version is measurably faster as it doesn't have to build intermediate strins. """ if value is None: yield "0:~" elif value is True: yield "4:true!" elif value is False: yield "5:false!" elif isinstance(value,(int,long)): data = str(value) yield str(len(data)) yield ":" yield data yield "#" elif isinstance(value,(float,)): data = repr(value) yield str(len(data)) yield ":" yield data yield "^" elif isinstance(value,(str,)): yield str(len(value)) yield ":" yield value yield "," elif isinstance(value,(list,tuple,)): sub = [] for item in value: sub.extend(_gdumps(item)) sub = "".join(sub) yield str(len(sub)) yield ":" yield sub yield "]" elif isinstance(value,(dict,)): sub = [] for (k,v) in value.iteritems(): sub.extend(_gdumps(k)) sub.extend(_gdumps(v)) sub = "".join(sub) yield str(len(sub)) yield ":" yield sub yield "}" elif isinstance(value,(unicode,)): if encoding is None: raise ValueError("must specify encoding to dump unicode strings") value = value.encode(encoding) yield str(len(value)) yield ":" yield value yield "," else: raise ValueError("unserializable object") def loads(string,encoding=None): """loads(string,encoding=None) -> object This function parses a tnetstring into a python object. """ # No point duplicating effort here. In the C-extension version, # loads() is measurably faster then pop() since it can avoid # the overhead of building a second string. return pop(string,encoding)[0] def load(file,encoding=None): """load(file,encoding=None) -> object This function reads a tnetstring from a file and parses it into a python object. The file must support the read() method, and this function promises not to read more data than necessary. """ # Read the length prefix one char at a time. # Note that the netstring spec explicitly forbids padding zeros. 
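    # A leading zero must be the entire prefix, so when the first digit is
    # '0' we skip the accumulation loop and let the ':' check below reject
    # padded forms like "05:".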
c = file.read(1) if not c.isdigit(): raise ValueError("not a tnetstring: missing or invalid length prefix") datalen = ord(c) - ord("0") c = file.read(1) if datalen != 0: while c.isdigit(): datalen = (10 * datalen) + (ord(c) - ord("0")) if datalen > 999999999: errmsg = "not a tnetstring: absurdly large length prefix" raise ValueError(errmsg) c = file.read(1) if c != ":": raise ValueError("not a tnetstring: missing or invalid length prefix") # Now we can read and parse the payload. # This repeats the dispatch logic of pop() so we can avoid # re-constructing the outermost tnetstring. data = file.read(datalen) if len(data) != datalen: raise ValueError("not a tnetstring: length prefix too big") type = file.read(1) if type == ",": if encoding is not None: return data.decode(encoding) return data if type == "#": try: return int(data) except ValueError: raise ValueError("not a tnetstring: invalid integer literal") if type == "^": try: return float(data) except ValueError: raise ValueError("not a tnetstring: invalid float literal") if type == "!": if data == "true": return True elif data == "false": return False else: raise ValueError("not a tnetstring: invalid boolean literal") if type == "~": if data: raise ValueError("not a tnetstring: invalid null literal") return None if type == "]": l = [] while data: (item,data) = pop(data,encoding) l.append(item) return l if type == "}": d = {} while data: (key,data) = pop(data,encoding) (val,data) = pop(data,encoding) d[key] = val return d raise ValueError("unknown type tag") def pop(string,encoding=None): """pop(string,encoding=None) -> (object, remain) This function parses a tnetstring into a python object. It returns a tuple giving the parsed object and a string containing any unparsed data from the end of the string. """ # Parse out data length, type and remaining string. try: (dlen,rest) = string.split(":",1) dlen = int(dlen) except ValueError: raise ValueError("not a tnetstring: missing or invalid length prefix") try: (data,type,remain) = (rest[:dlen],rest[dlen],rest[dlen+1:]) except IndexError: # This fires if len(rest) < dlen, meaning we don't need # to further validate that data is the right length. raise ValueError("not a tnetstring: invalid length prefix") # Parse the data based on the type tag. 
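    # At this point e.g. pop("5:hello,5:12345#") has already split into
    # data="hello", type="," and remain="5:12345#", and will return
    # ("hello", "5:12345#").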
if type == ",": if encoding is not None: return (data.decode(encoding),remain) return (data,remain) if type == "#": try: return (int(data),remain) except ValueError: raise ValueError("not a tnetstring: invalid integer literal") if type == "^": try: return (float(data),remain) except ValueError: raise ValueError("not a tnetstring: invalid float literal") if type == "!": if data == "true": return (True,remain) elif data == "false": return (False,remain) else: raise ValueError("not a tnetstring: invalid boolean literal") if type == "~": if data: raise ValueError("not a tnetstring: invalid null literal") return (None,remain) if type == "]": l = [] while data: (item,data) = pop(data,encoding) l.append(item) return (l,remain) if type == "}": d = {} while data: (key,data) = pop(data,encoding) (val,data) = pop(data,encoding) d[key] = val return (d,remain) raise ValueError("unknown type tag") # Use the c-extension version if available try: import _tnetstring except ImportError: pass else: dumps = _tnetstring.dumps load = _tnetstring.load loads = _tnetstring.loads pop = _tnetstring.pop tnetstring-0.2.1/tnetstring/tns_core.h0000644000175000017500000001170611763274443017507 0ustar rfkrfk00000000000000// // tns_core.h: core code for a tnetstring parser in C // // This is code for parsing and rendering data in the provisional // typed-netstring format proposed for inclusion in Mongrel2. You can // think of it like a JSON library that uses a simpler wire format. // #ifndef _tns_core_h #define _tns_core_h #include #include #include // tnetstring rendering is done using an "outbuf" struct, which combines // a malloced string with its allocation information. Rendering is done // from back to front; the details are deliberately hidden here since // I'm experimenting with multiple implementations and it might change. struct tns_outbuf_s; typedef struct tns_outbuf_s tns_outbuf; // This enumeration gives the type tag for each data type in the // tnetstring encoding. typedef enum tns_type_tag_e { tns_tag_string = ',', tns_tag_integer = '#', tns_tag_float = '^', tns_tag_bool = '!', tns_tag_null = '~', tns_tag_dict = '}', tns_tag_list = ']', } tns_type_tag; // To convert between tnetstrings and the data structures of your application // you provide the following struct filled with function pointers. They // will be called by the core parser/renderer as necessary. // // Each callback is called with the containing struct as its first argument, // to allow a primitive type of closure. struct tns_ops_s; typedef struct tns_ops_s tns_ops; struct tns_ops_s { // Get the type of a data object. tns_type_tag (*get_type)(const tns_ops *ops, void *val); // Parse various types of object from a string. void* (*parse_string)(const tns_ops *ops, const char *data, size_t len); void* (*parse_integer)(const tns_ops *ops, const char *data, size_t len); void* (*parse_float)(const tns_ops * ops, const char *data, size_t len); // Constructors for constant primitive datatypes. void* (*get_null)(const tns_ops *ops); void* (*get_true)(const tns_ops *ops); void* (*get_false)(const tns_ops *ops); // Render various types of object into a tns_outbuf. int (*render_string)(const tns_ops *ops, void *val, tns_outbuf *outbuf); int (*render_integer)(const tns_ops *ops, void *val, tns_outbuf *outbuf); int (*render_float)(const tns_ops *ops, void *val, tns_outbuf *outbuf); int (*render_bool)(const tns_ops *ops, void *val, tns_outbuf *outbuf); // Functions for building and rendering list values. 
// Remember that rendering is done from back to front, so // you must write the last list element first. void* (*new_list)(const tns_ops *ops); int (*add_to_list)(const tns_ops *ops, void* list, void* item); int (*render_list)(const tns_ops *ops, void* list, tns_outbuf *outbuf); // Functions for building and rendering dict values // Remember that rendering is done from back to front, so // you must write each value first, follow by its key. void* (*new_dict)(const tns_ops *ops); int (*add_to_dict)(const tns_ops *ops, void* dict, void* key, void* item); int (*render_dict)(const tns_ops *ops, void* dict, tns_outbuf *outbuf); // Free values that are no longer in use void (*free_value)(const tns_ops *ops, void *value); }; // Parse an object off the front of a tnetstring. // Returns a pointer to the parsed object, or NULL if an error occurs. // The third argument is an output parameter; if non-NULL it will // receive the unparsed remainder of the string. extern void* tns_parse(const tns_ops *ops, const char *data, size_t len, char** remain); // If you need to read the length prefix yourself, e.g. because you're // reading data off a socket, you can use this function to get just // the payload parsing logic. extern void* tns_parse_payload(const tns_ops *ops, tns_type_tag type, const char *data, size_t len); // Render an object into a string. // On success this function returns a malloced string containing // the serialization of the given object. The second argument // 'len' is an output parameter that will receive the number of bytes in // the string; if NULL then the string will be null-terminated. // The caller is responsible for freeing the returned string. // On failure this function returns NULL and 'len' is unmodified. extern char* tns_render(const tns_ops *ops, void *val, size_t *len); // If you need to copy the final result off somewhere else, you // might like to build your own rendering function from the following. // It will avoid some double-copying that tns_render does internally. // Basic plan: Initialize an outbuf, pass it to tns_render_value, then // copy the bytes away using tns_outbuf_memmove. extern int tns_render_value(const tns_ops *ops, void *val, tns_outbuf *outbuf); extern int tns_outbuf_init(tns_outbuf *outbuf); extern void tns_outbuf_memmove(tns_outbuf *outbuf, char *dest); // Use these functions for rendering into an outbuf. extern size_t tns_outbuf_size(tns_outbuf *outbuf); extern int tns_outbuf_putc(tns_outbuf *outbuf, char c); extern int tns_outbuf_puts(tns_outbuf *outbuf, const char *data, size_t len); #endif tnetstring-0.2.1/setup.py0000644000175000017500000000342411763273700015024 0ustar rfkrfk00000000000000# # This is the tnetstring setuptools script. # Originally developed by Ryan Kelly, 2011. # # This script is placed in the public domain. # If there's no public domain where you come from, # you can use it under the MIT license. 
# import sys setup_kwds = {} if sys.version_info > (3,): from setuptools import setup, Extension setup_kwds["test_suite"] = "tnetstring.test" setup_kwds["use_2to3"] = True else: from distutils.core import setup, Extension try: next = next except NameError: def next(i): return i.next() info = {} try: src = open("tnetstring/__init__.py") lines = [] ln = next(src) while "__version__" not in ln: lines.append(ln) ln = next(src) while "__version__" in ln: lines.append(ln) ln = next(src) exec("".join(lines),info) except Exception: pass NAME = "tnetstring" VERSION = info["__version__"] DESCRIPTION = "data serialization using typed netstrings" LONG_DESC = info["__doc__"] AUTHOR = "Ryan Kelly" AUTHOR_EMAIL = "ryan@rfk.id.au" URL="http://github.com/rfk/tnetstring" LICENSE = "MIT" KEYWORDS = "netstring serialize" CLASSIFIERS = [ "Programming Language :: Python", "Programming Language :: Python :: 2", #"Programming Language :: Python :: 3", "Development Status :: 4 - Beta", "License :: OSI Approved :: MIT License" ] setup(name=NAME, version=VERSION, author=AUTHOR, author_email=AUTHOR_EMAIL, url=URL, description=DESCRIPTION, long_description=LONG_DESC, license=LICENSE, keywords=KEYWORDS, packages=["tnetstring","tnetstring.tests"], ext_modules = [ Extension(name="_tnetstring",sources=["tnetstring/_tnetstring.c"]), ], classifiers=CLASSIFIERS, **setup_kwds )