Same Old Stuff : a procrastinator's notebook

(random unfinished engineering notes)

Thursday, March 04, 2010

Web Services Interoperability Soup

Some notes on .NET - J2EE and cross-J2EE web services integrations :

* Compatibility stepping stones : interaction styles, data types, namespace issues.
* wsdl standard - 1.1 vs 2.0 ?

axis2 vs jax-ws 2.0:

1) generated wsdl styles:

- namespace : wsdl-namespace for axis2 vs flat-namespace for jax-ws
- axis2 lacks name property in tag - introduces definitions/documentation tag
- endpoint vs port (though they should be equivalent)
- axis2 - direct schema in wsdl vs external reference for jax-ws
- elementFormDefault="qualified" property in axis2 schema / missing in jax-ws
- for base types, schema seems similar
- separate xsd is often more "natural" setting
- jax-ws misses Action properties for input/output of each operation
- axis2 generates multiple endpoints - HttpEndpoint / HttpSoap11Endpoint / HttpSoap12Endpoint - one per binding (plain HTTP, SOAP 1.1, SOAP 1.2) - while jax-ws exposes a single port (SOAP 1.1 by default)
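To make the schema-placement difference concrete, here is a hedged sketch (namespaces and file names are made up) of the two styles - axis2 inlining the schema under wsdl:types with elementFormDefault="qualified", versus jax-ws keeping the schema in a separate file and importing it:

```xml
<!-- axis2 style: schema embedded directly in the WSDL (hypothetical names) -->
<wsdl:types>
  <xs:schema targetNamespace="http://example.org/stock"
             elementFormDefault="qualified">
    <xs:element name="getPrice" type="xs:string"/>
  </xs:schema>
</wsdl:types>

<!-- jax-ws style: external schema referenced from the WSDL -->
<types>
  <xsd:schema>
    <xsd:import namespace="http://example.org/stock"
                schemaLocation="StockService_schema1.xsd"/>
  </xsd:schema>
</types>
```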

Monday, December 15, 2008

cURL out of memory on Xen instance (use and abuse series)

cURL, part of everyone's favorite UNIX tool subset, recently got me into a bit of trouble while POSTing a relatively large file, following the common 'just curl it' logic (so commonplace that many major projects simply incorporate curlin' as part of their standard deploy procedure).

The case was posting an 8 Gb file on a 16 Gb Xen instance. While this worked quite nicely on a real box, on the virtual box curl said hello with :

out of memory

Now that seemed quite bizarre. Having figured out that the process actually gets ENOMEM, the logical next step was to look at the curl code and figure out what was going on.

And there it was, a power-of-two allocator in the file read loop :


static ParameterError file2memory(char **bufp, size_t *size, FILE *file)
{
  char *newbuf;
  char *buffer = NULL;
  size_t alloc = 512;
  size_t nused = 0;
  size_t nread;

  do {
    if(!buffer || (alloc == nused)) {
      /* size_t overflow detection for huge files */
      if(alloc+1 > ((size_t)-1)/2) {
        if(buffer)
          free(buffer);
        return PARAM_NO_MEM;
      }
      alloc *= 2;

      if((newbuf = realloc(buffer, alloc+1)) == NULL) {
        if(buffer)
          free(buffer);
        return PARAM_NO_MEM;
      }
      buffer = newbuf;
    }
    nread = fread(buffer+nused, 1, alloc-nused, file);
    nused += nread;
  } while(nread);
  /* ... zero-termination and *bufp / *size assignment elided ... */




Whoa :) - apparently nobody expected that some geniuses would try to post XX Gb files with curl - so it's the abusers who are to blame. Stop abusing curl and do your own POSTs !

However, if you don't have time to change your app, and still want to post files of size (N,2N) Gb on a 2N Gb box, a simple hack of the following form should do it :


if (alloc < ALLOC_THRESHOLD)
    alloc *= 2;
else
    alloc += ALLOC_THRESHOLD;



(Where ALLOC_THRESHOLD would usually be 1Gb)

This makes allocation growth linear, rather than exponential, once the allocated buffer passes the given threshold.
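To see why the doubling hurts, here is a small simulation sketch (names are made up; sizes and the 1 Gb threshold mirror the numbers above) comparing the peak realloc() request of the stock doubling allocator with the thresholded variant, for an 8 Gb POST:

```java
public class AllocGrowth {
    static final long GB = 1L << 30;

    // simulate the curl read loop: grow whenever the buffer is full,
    // return the largest allocation ever requested
    static long peakRequest(long fileSize, long threshold) {
        long alloc = 512, nused = 0, nread;
        boolean allocated = false;
        long peak = 0;
        do {
            if (!allocated || alloc == nused) {
                if (alloc < threshold)
                    alloc *= 2;             // stock curl: power-of-two growth
                else
                    alloc += threshold;     // patched: linear growth past threshold
                peak = alloc;               // growth is monotone, so last = peak
                allocated = true;
            }
            nread = Math.min(alloc - nused, fileSize - nused); // stands in for fread()
            nused += nread;
        } while (nread > 0);
        return peak;
    }

    public static void main(String[] args) {
        long file = 8 * GB;
        // pure doubling: buffer fills at exactly 8 Gb, then doubles once more
        System.out.println(peakRequest(file, Long.MAX_VALUE) / GB); // 16
        // 1 Gb threshold: linear past 1 Gb, overshoot is only one step
        System.out.println(peakRequest(file, GB) / GB);             // 9
    }
}
```

So the stock loop asks for a 16 Gb buffer to hold an 8 Gb file - exactly the request that dies on a 16 Gb instance - while the thresholded variant tops out at 9 Gb.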


Now - what does all this have to do with Xen, you might ask?

A couple of things, actually. First off, such environments (local, or any virtualized cloud platform offering Xen instances) usually give the user slightly less than a power-of-two memory space (say, an effective 15 Gb instead of 16 Gb) - so the impact of the power-of-two allocator becomes apparent much sooner. Also, memory allocation policies are quite a bit stricter: ENOMEMs are dispatched much earlier, the OOM killer is fast on the trigger, etc. :) - which is why curl OOMs immediately, rather than at least attempting that darn realloc().

Moral of the story - don't abuse standard unix tools !
Be nice to curl - do not POST binary data larger than 50% of effective RAM.
Keep it safe !

Thursday, July 10, 2008

Instrumenting Java code for fun&profit

By code instrumentation we mean the process of adding bytecodes to methods in order to intercept their execution (usually for profiling purposes).
Typical uses include : code tracing (capturing method calls and regenerating the call tree), code profiling (measuring execution times between calls, automatic detection of bottlenecks, etc.), code monitoring (event/method invocation detection, code state monitoring - e.g. getting structure sizes), and combinations of the above (for example, monitoring performance as a function of code state: sizes of structures, number of predefined objects, etc.)


The java.lang.instrument interface, available since Java 1.5, lets us write a handler, accessible via an agent loaded into the JVM, to which all class load requests are passed. This enables us to measure time and performance, analyze state, or even instrument the loaded classes.

The instrumentation itself is done by modifying method bytecodes.

Creating an instrumentation agent is pretty straightforward :

- a Premain-Class is defined in the agent jar's manifest => it must implement a premain method (taking an args string and an Instrumentation instance, which is created by the JVM and passed to the premain method of the Premain-Class automatically). The Instrumentation instance can then be passed on, and used to list all loaded/initialized classes, add/remove transformers, and redefine classes.
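As a minimal sketch (class and method names are made up, and this transformer only logs class loads rather than rewriting bytecode), an agent could look like the following; the main() method just drives the transformer by hand for illustration:

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

public class TraceAgent {
    /* Invoked by the JVM before main(); requires "Premain-Class: TraceAgent"
       in the agent jar's MANIFEST.MF and -javaagent:agent.jar on the
       java command line. */
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new LoggingTransformer());
    }

    static class LoggingTransformer implements ClassFileTransformer {
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain pd, byte[] classfileBuffer) {
            System.out.println("loading: " + className);
            return null; // null = leave the bytecode unchanged
        }
    }

    // standalone demo: call the transformer directly with a dummy class
    public static void main(String[] args) {
        byte[] out = new LoggingTransformer().transform(
            null, "com/example/Foo", null, null, new byte[0]);
        System.out.println(out == null ? "unchanged" : "modified");
    }
}
```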

.....

- what we need next, in order to reconstruct the full call-tree trace, is information about when each method returns. We cannot get this via the classloader, and especially not at execution time (we might try to recover some of it post-exec via call stack reconstruction). However, we can get the return info another way -> by mangling the bytecode using the transformation framework and the BCEL bytecode engineering library. We simply alter the bytecode, either by creating a wrapper method or by notifying a static method from an instruction added before each return in the method. All in all, this gives us all the necessary ingredients for a proper runtime-profiler-call-tracing-whatever :) Anyway, once I get it all together, I might write a proper post about it :)
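A source-level picture of the wrapper-method variant (all names hypothetical; the real thing would be generated as bytecode by BCEL, not written by hand) might look like this:

```java
public class WrapperDemo {
    static final StringBuilder log = new StringBuilder();

    // stand-ins for the profiler's static notification hooks
    static void enter(String m) { log.append("enter:").append(m).append(";"); }
    static void exit(String m)  { log.append("exit:").append(m).append(";"); }

    // the original method body, renamed by the transformer
    static int add$impl(int a, int b) { return a + b; }

    // the generated wrapper: notify on entry, and on every exit path
    static int add(int a, int b) {
        enter("add");
        try {
            return add$impl(a, b);
        } finally {
            exit("add");
        }
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3));
        System.out.println(log);
    }
}
```

The try/finally guarantees the exit notification fires even when the wrapped method throws, which is what makes the reconstructed call tree reliable.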



References :

http://java.sun.com/j2se/1.5.0/docs/api/java/lang/instrument/package-summary.html
http://www.javalobby.org/java/forums/t19309.html
http://www.ibm.com/developerworks/java/library/j-jtp09196/index.html
http://www.cs.nuim.ie/~jpower/Research/instrument/
http://www-128.ibm.com/developerworks/java/library/j-jip/

Wednesday, June 25, 2008

Profiling java apps using DTrace

- monitoring operational performance of java apps
- finding bottlenecks
- examining cache effects
- example : heavily IO-bound app
- OS/X / Solaris

idea :
(a) current java profiling options : (what/where)
(b) java options for system-level profiling (we can debug, but cannot do the strace-like detailed to-syscall mapping, oprofile-like stuff, etc.)

dtrace - cool caps :
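A taste of those capabilities - a minimal D sketch (assuming the hotspot provider is available and the target JVM was started with -XX:+ExtendedDTraceProbes) that counts method invocations per class.method:

```d
#!/usr/sbin/dtrace -qs
/* run as: dtrace -qs jcount.d -p <jvm-pid>
   arg1/arg3 point at the class and method names, arg2/arg4 are their lengths */
hotspot$target:::method-entry
{
    @calls[strjoin(strjoin(copyinstr(arg1, arg2), "."),
                   copyinstr(arg3, arg4))] = count();
}
```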



references :

http://www.devx.com/Java/Article/33943/0/page/3
http://www.solarisinternals.com/wiki/index.php/DTrace_Topics_Java
http://developers.sun.com/solaris/articles/java_on_solaris.html
http://developers.sun.com/solaris/articles/dtrace_ajax.html

Saturday, March 29, 2008

Managing shared state in Erlang

Though Erlang is a functional language with no apparent shared state (each variable can be bound only once), we can trivially implement shared state by parking it in a process and looking that process up by its registered name :

whereis(list_to_atom("portServer" ++ integer_to_list(PORT))).
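The usual pattern behind that lookup, sketched here with made-up module and function names: the state lives in a process loop (each recursion rebinds the state argument, so single assignment is never violated), and the process is reachable through its registered name:

```erlang
-module(port_server).
-export([start/1, fetch/1, store/2]).

%% spawn a state-holding loop and register it as e.g. 'portServer8080'
start(Port) ->
    Name = list_to_atom("portServer" ++ integer_to_list(Port)),
    register(Name, spawn(fun() -> loop(undefined) end)),
    Name.

%% no variable is ever mutated - recursion carries the new state
loop(State) ->
    receive
        {fetch, From} -> From ! {state, State}, loop(State);
        {store, New}  -> loop(New)
    end.

fetch(Name) ->
    whereis(Name) ! {fetch, self()},
    receive {state, S} -> S end.

store(Name, Value) ->
    whereis(Name) ! {store, Value}, ok.
```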



Bloom filters for profit & fun

Perfect Hash Functions for fun and profit

Thursday, March 27, 2008

Breakpoint

Okie, cut here

This is another attempt into actually starting to write something ;)
So, stay tuned for a bunch of stupid posts :) I need something to get started. Hopefully, I might be able to actually post something useful some day (in which case I will erase all the posts until that day :) ). So, some value after all .... these posts won't last forever .... :) :) read them while they are still here ... and have mercy :)

Monday, December 31, 2007

Linux buffer cache & how to disable it (and why ?)

The Linux buffer cache provides an excellent mechanism for black-box performance optimization, at the modest cost of at most 2 memory hits (hit 1 -> page not found -> fetch block from disk into the buffer cache (free heap space) -> hit 2 -> found in memory -> get block).

However, there are 2 cases when we would want to disable such behaviour :

(1) When the 2-hit cost is too much of a price to pay, we might want to think about direct I/O (O_DIRECT / madvise()-style), doing the memory buffer management by hand from userspace - often done for db cache management

(2) When we want unbiased benchmarking of heavily IO-dependent software. A single-shot benchmark is usually fine (pages are on the disk), but on the 2nd take most of the pages remain in the buffer cache, which results in better performance, so no real metric can be derived afterwards - the only way to do a proper benchmark is to defeat the buffer cache

Currently we are interested in (2), so here are a couple of ideas for how that could be done :

(a) create a non-trivial file of available memory size and write a simple code that mmap()'s it
(b) tune the swappiness kernel knob (/proc/sys/vm/swappiness) to 0 (prefer process memory over buffers), and fork() some ~64k dummy (nontrivial) processes :) - this should do the trick
(c) mounting the partition as raw device (no buffering then) - but this is usually highly impractical
(d) setting the O_DIRECT flag for every open() in the source (this is often tedious unless open() is invoked through a wrapper - a nice argument for having one in such applications). We could write a simple wrapper (where possible) :

int dopen(char *file) {
    return open(file, O_DIRECT | O_RDONLY);
}

or in case of fopen() :

FILE *dfopen(char *file, char *perm) {
    int fd = open(file, O_DIRECT | O_RDONLY); /* assumes read-only mode */
    return fdopen(fd, perm);
}

(note that O_DIRECT is conditionally defined under _GNU_SOURCE, so don't forget the -D_GNU_SOURCE flag when building)
(e) reboot the machine (if you are really desperate) :)
(f) although one might think that invoking "sync" from the command line would do the trick, it actually just flushes *changed* blocks to disk - not what we need (dropping the read-cached blocks)
(g) the dirty way of doing (a) would be simply touching a very big file (for example dd if=/dev/urandom of=file_4GB bs=1024 count=4000000) and doing 'cat file_4GB > /dev/null'
(h) writing simple code that allocates a huge chunk of memory and locks the allocated pages (using mlock() or similar) - thus reducing the memory available to the buffer cache to an arbitrary size
(i) doing fcntl() with F_NOCACHE (on OS X) on all file descriptors in the source code - again quite tedious, especially if there is no wrapper for the open() call in the code
(j) using the madvise() mechanism to tell the kernel that the pages just read won't be used again (which should make the kernel free them from the buffer cache immediately) :

size_t dfread(void *ptr, size_t size, size_t nmemb, FILE *stream) {
    size_t n = fread(ptr, size, nmemb, stream);
    /* note: this only drops cached pages if ptr is a file-backed mmap()
       region; for the plain fread() path, posix_fadvise(fd, 0, 0,
       POSIX_FADV_DONTNEED) on the underlying descriptor is the
       analogous call */
    madvise(ptr, size * nmemb, MADV_DONTNEED);
    return n;
}


(k) if we just want to flush the entire buffer cache, on 2.6.16+ kernels we can use the "drop caches" mechanism to free all pages from the buffer cache :
echo 1 > /proc/sys/vm/drop_caches
(echo 3 additionally drops the dentry and inode caches)