In the past couple of weeks, his tweeting began to get out of control, and I wondered: at this rate, how many tweets will he send by Election Day? Drawing on what I have learned from following his Twitter feed and reading conservative news, I took a small amount of data and irresponsibly extrapolated it. Thus, Time Series Twitter was born.

Time Series Twitter is a small, simple web app – it loads the last 1000 tweets from a given user, plots the time series of those tweets (by day), and fits a linear regression to them. With the linear regression you can over-extrapolate and find the expected number of tweets a given user will send at some point in the future.

For Hammer, this is currently 57 tweets per day by Election Day, significantly more than other news sources: @FoxNews is expected to send 6.3 tweets/day, @ESPN 10.9 tweets/day, and @MSNBC -36.2 tweets/day. (All of these figures are as of the time of writing – they obviously change a lot.)

This web app is nothing novel (or accurate), but it has been fun to watch friends’ Twitter activity over time. Give it a try.

*[All of what I am describing is based on my reading of Professor Sariel Har-Peled’s book on the subject. I suggest you read that (or other online notes) if you are really interested in the mathematical properties of the well-separated pair decomposition (or a range of other geometric topics).]*

Quadtrees are a fairly common geometric concept and also a data structure for storing points in \(d\)-dimensional space. We assume all points lie within some large cube – or, in two dimensions, which we will focus on, a square. We split the square into four equal-sized smaller squares, and we can do this again on each of the smaller squares, recursively. We consider each square to be a cell and the four smaller squares within it to be its “children.” A quadtree is a division of the square into smaller and smaller cells until every point has its own cell.

Therefore, we can search the space fairly easily and quickly. We can even map non-spatial data into a spatial metric to make use of the data structure. For the computer science-minded, these cells are organized in a tree (hence the name quadtree), compressed to skip cells that don’t contain any points, so that each leaf node contains exactly one point. Searching the quadtree is thus fairly fast. The properties and uses of quadtrees and their many variants – such as how accurate they are for nearest neighbor searches – are extensive and, again, can be read about in Professor Har-Peled’s notes, among other places.
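To make the recursive splitting concrete, here is a minimal point-quadtree sketch in JavaScript (hypothetical `Cell` class and method names, not the code behind my demo): each cell covers a square and splits into four children as soon as it holds more than one point, so every leaf ends up with at most one point – exactly the “every point gets its own cell” idea.

```javascript
// A minimal point-quadtree sketch (hypothetical names; not the demo's actual code).
// Each cell covers the square [x, x+size) x [y, y+size).
function Cell(x, y, size) {
  this.x = x; this.y = y; this.size = size;
  this.point = null;    // the single point stored in a leaf
  this.children = null; // the four child cells, once this cell is split
}

Cell.prototype.childFor = function (p) {
  // Index of the quadrant (child cell) containing point p.
  var h = this.size / 2;
  return this.children[(p.x < this.x + h ? 0 : 1) + (p.y < this.y + h ? 0 : 2)];
};

Cell.prototype.insert = function (p) {
  if (this.children === null && this.point === null) {
    this.point = p; // empty leaf: store the point here
    return;
  }
  if (this.children === null) {
    // Occupied leaf: split into four equal quadrants and push the old point down.
    var h = this.size / 2;
    this.children = [
      new Cell(this.x, this.y, h),     new Cell(this.x + h, this.y, h),
      new Cell(this.x, this.y + h, h), new Cell(this.x + h, this.y + h, h)
    ];
    var old = this.point;
    this.point = null;
    this.childFor(old).insert(old);
  }
  this.childFor(p).insert(p); // recurse into p's quadrant
};
```

Note that inserting two identical points would recurse forever; real implementations either deduplicate points or cap the tree depth.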

The well-separated pair decomposition (WSPD) provides a more specific method than quadtrees for characterizing the general distances between points in a large set without storing the distance for each of the \( {n \choose 2} \) pairs. Broadly, it does this by clustering points that are close together and only describing the distances between such clusters. The conditions for a WSPD are relatively simple but require some mathematical notation; I will try to give a simple description below.

We say that the points are clustered into sets. The diameter of a set \(Q\) is $$\mathrm{diam}(Q) = \max_{p,q \in Q} \| p - q \|,$$ where \( \|p-q\| \) is the distance between \(p\) and \(q\). In English, the diameter is the distance between the two points furthest from each other in the set. Two sets \(Q\) and \(R\) are \(1/\epsilon\)-separated if $$\max(\mathrm{diam}(Q),\mathrm{diam}(R)) \leq \epsilon \cdot d(Q,R).$$ Here $$d(Q,R) = \min_{q\in Q, r\in R} \|q-r\|;$$ that is, \(d(Q,R)\) is the smallest distance between any pair of points from \(Q\) and \(R\), intuitively giving how close the two clusters are. Therefore, if two sets are \(1/\epsilon\)-separated, then all of the points in one set are roughly the same distance from all of the points in the other set, with some error relative to \(\epsilon\). Finally, a pair decomposition is a set of set pairs $$\mathscr{W} = \{ \{A_1,B_1\}, \ldots, \{A_s, B_s\} \}$$ such that:

- All points in any \(A_i\) or \(B_i\) are part of the original set of data points, \(P\): $$A_i,B_i \subset P$$
- For any pair \(\{A_i,B_i\}\) there is no overlap in points between the two sets $$A_i \cap B_i = \emptyset$$
- For any pair of *points* \(p,q \in P\), there is at least one pair \(\{A_i,B_i\}\) such that \(p\) is in \(A_i\) and \(q\) is in \(B_i\)

Therefore, a well-separated pair decomposition is a pair decomposition in which each pair of sets \(\{A_i,B_i\}\) is \(1/\epsilon\)-separated. The third criterion for a pair decomposition is the most interesting: it simply states that for any pair of points, we can estimate their distance using the WSPD.
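The definitions above translate almost directly into code. Here is a brute-force sketch in JavaScript (illustrative helper names, not part of my actual implementation) that computes \(\mathrm{diam}\) and \(d(Q,R)\) and checks the separation condition:

```javascript
// Brute-force check of the 1/epsilon-separation condition defined above.
function dist(p, q) {
  return Math.sqrt((p.x - q.x) * (p.x - q.x) + (p.y - q.y) * (p.y - q.y));
}

// diam(Q): the distance between the two farthest-apart points in the set.
function diam(Q) {
  var best = 0;
  for (var i = 0; i < Q.length; i++)
    for (var j = i + 1; j < Q.length; j++)
      best = Math.max(best, dist(Q[i], Q[j]));
  return best;
}

// d(Q,R): the smallest distance between any point of Q and any point of R.
function setDist(Q, R) {
  var best = Infinity;
  for (var i = 0; i < Q.length; i++)
    for (var j = 0; j < R.length; j++)
      best = Math.min(best, dist(Q[i], R[j]));
  return best;
}

// Q and R are 1/epsilon-separated if max(diam(Q), diam(R)) <= epsilon * d(Q,R).
function isSeparated(Q, R, epsilon) {
  return Math.max(diam(Q), diam(R)) <= epsilon * setDist(Q, R);
}
```

This is quadratic in the set sizes, which is fine for building intuition; the quadtree-based construction below avoids computing these quantities exactly by bounding them with cell sizes.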

One algorithm to construct the WSPD climbs down the quadtree, using the cells of the quadtree as candidate set pairs in the WSPD. This is the algorithm I used in my implementation, as shown in the JavaScript pseudocode below:

```javascript
function ConstructWSPD(cell1, cell2) {
  var arr = new Array();
  if (cell1.diam() < cell2.diam()) {
    // Swap cell1 and cell2 so cell1 always has the larger diameter
    var temp = cell1;
    cell1 = cell2;
    cell2 = temp;
  }
  if (cell1.diam() < epsilon * d(cell1, cell2)) {
    // The two cells' points are 1/epsilon-separated.
    // A theoretical SetPair class would link the two sets:
    // arr.push(new SetPair(cell1.points, cell2.points));
    // My code actually uses the following statement to draw the
    // relationship, rather than store it efficiently:
    arr.push(new Dumbell(new PBall(cell1.points), new PBall(cell2.points)));
    return arr;
  }
  // Not separated: recurse into the children of the larger cell
  if (cell1.children != null) {
    for (var i = 0; i < 4; i++) {
      arr = arr.concat(ConstructWSPD(cell1.children[i], cell2));
    }
  }
  return arr;
}

// Here QuadTreeHead is the top cell in the quadtree:
var wspd = ConstructWSPD(QuadTreeHead, QuadTreeHead);
```

The code above recursively steps through the quadtree attempting to use different cells from the quadtree as sets for the WSPD. **Check out my small web application here to get some more intuition about how WSPDs relate to quadtrees as well as the impact of \(\epsilon\) on the WSPD.** There are many interesting properties and uses of quadtrees and WSPDs that I encourage you to read about further online or in Professor Har-Peled’s book.

On a completely separate note, this blog post was my first chance to use MathJax, which is awesome.

For those who have not heard of the project, TerraNNI is a tool for taking LIDAR data, common in geographic information systems (GIS), and creating high-resolution grid digital elevation models (DEMs). Most GIS packages take DEMs as input and can perform a wide range of tasks, such as flood mapping, line-of-sight computations, and city planning. The amount of LIDAR data available is growing quite fast, and TerraNNI makes full use of this large quantity of data to efficiently produce large-scale, high-resolution DEMs.

In most of these cases, LIDAR data is 3D spatial data, and we produce a 2D grid DEM where each grid point has some estimated elevation. Looking forward, we are beginning to see the rise of spatial-temporal (4D) data, which offers the opportunity to perform more complex geographic studies, such as change detection on terrains. Making use of new spatial-temporal data, TerraNNI provides efficient, scalable, high-resolution *volumetric* grid DEM construction. In this case, we produce a 3D grid with axes *x,y* in space and *t* in time; again, each grid point has an estimated elevation. We expect this tool to become increasingly useful as LIDAR data becomes even more prevalent.

On a technical level, TerraNNI was an interesting project, incorporating computational geometry, GPU programming, and I/O efficient computing. The program performs an approximation of natural neighbor interpolation (NNI), which is based on the Voronoi diagram. It makes heavy use of the graphics card for most of the computations, using both OpenGL and CUDA. We also use TPIE for efficient disk access, since data sets can be much larger than the computer’s memory. TerraNNI is one of the fastest programs for producing high-resolution grid DEMs and one of the first to be able to perform 3D natural neighbor interpolation.

Please download TerraNNI from Github and give it a try for yourself. Let me know if you run into any problems. Also feel free to look at our two papers about the project

- Natural Neighbor Interpolation Based Grid DEM Construction Using a GPU (from ACM GIS 2010)
- TerraNNI: Natural Neighbor Interpolation on a 3D Grid Using a GPU (from ACM GIS 2011)

or at my previous blog post about computing Voronoi diagrams on the GPU.

If you already know all about Voronoi diagrams, or simply don’t care about the math and just want to play, jump to here or go straight to my Voronoi diagram generator application.

I would first like to briefly describe what precisely the Voronoi diagram is, along with how nearest neighbor and natural neighbor interpolation make use of this data structure. The Voronoi diagram is the division of the plane into discrete Voronoi cells. Given a set of input points *S*, we would like to divide the plane such that, for any given position in the plane, we know the nearest input point. This is the nearest neighbor problem. A Voronoi cell *Vor(p,S)* represents the region of the plane for which the given point *p* is the closest input point from the set of input points *S*. Mathematically, the Voronoi cell *Vor(p,S)* is defined as $$\mathrm{Vor}(p,S) = \{ x : \|x - p\| \leq \|x - s\| \text{ for all } s \in S \}.$$

In the Voronoi diagram below, we color each Voronoi cell with a unique color. For the entire region covered by each cell, the input point within the cell is the closest point from the set of input points. You can also easily observe that the boundary between neighboring cells is the set of points equidistant from the two nearby input points.

Given this understanding of the Voronoi diagram, we can easily answer a nearest neighbor query. Given a point *q* at position *(x,y)* we need only check which Voronoi cell it is within to see which input point is closest to it. Thus, if each input point also has some height *z* we could roughly interpolate the height of *q* by taking the height of its nearest neighbor.
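For intuition, here is what that lookup computes, written as a brute-force scan in JavaScript (hypothetical function name; the web app instead reads the answer straight out of the rendered diagram):

```javascript
// Nearest neighbor interpolation: q takes the height z of its closest input
// point -- exactly the answer a Voronoi cell lookup encodes geometrically.
function nearestNeighborHeight(points, qx, qy) {
  var best = null, bestD = Infinity;
  for (var i = 0; i < points.length; i++) {
    var dx = points[i].x - qx, dy = points[i].y - qy;
    var d = dx * dx + dy * dy; // compare squared distances; no sqrt needed
    if (d < bestD) { bestD = d; best = points[i]; }
  }
  return best.z;
}
```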

A more complex interpolation scheme, known as natural neighbor interpolation, takes a weighted average of the nearby input points. This is done by inserting the query point into the Voronoi diagram and observing how much area the Voronoi cell of the query point steals from the neighboring Voronoi cells in the original Voronoi diagram. Specifically, the fractional weight assigned to each natural neighbor is the area stolen from that neighbor’s original Voronoi cell divided by the total area of the Voronoi cell of the query point. Mathematically, this interpolation can be defined as $$h(q) = \sum_{p \in N(q)} \frac{A_p}{A} \, h(p),$$ where \(N(q)\) is the set of natural neighbors of \(q\), \(A_p\) is the area stolen from \(p\)’s original cell, \(A\) is the total area of \(q\)’s cell, and \(h(p)\) is the height at \(p\).

While more complicated than nearest neighbor interpolation, natural neighbor interpolation offers a higher quality interpolation and produces smooth results.
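The area-stealing computation can be sketched by rasterizing the plane, much as the GPU method described later discretizes it into pixels. The JavaScript below (hypothetical names; a plain CPU sketch over the square domain [0, size)², not my actual implementation) finds every grid sample the query point’s inserted cell would claim, records which original cell each sample was stolen from, and uses those counts as the interpolation weights:

```javascript
// Grid-based sketch of natural neighbor interpolation. `points` have x, y,
// and a height z; q is the query point; the domain is [0, size) x [0, size)
// sampled at res x res grid cells.
function naturalNeighborHeight(points, q, size, res) {
  var stolen = {}, total = 0;
  var step = size / res;
  for (var gy = 0; gy < res; gy++) {
    for (var gx = 0; gx < res; gx++) {
      var x = (gx + 0.5) * step, y = (gy + 0.5) * step;
      // Nearest original input point at this sample (squared distances).
      var best = 0, bestD = Infinity;
      for (var i = 0; i < points.length; i++) {
        var dx = points[i].x - x, dy = points[i].y - y;
        var d = dx * dx + dy * dy;
        if (d < bestD) { bestD = d; best = i; }
      }
      // Would q's inserted cell claim (steal) this sample?
      var dqx = q.x - x, dqy = q.y - y;
      if (dqx * dqx + dqy * dqy < bestD) {
        stolen[best] = (stolen[best] || 0) + 1;
        total++;
      }
    }
  }
  // Weight each natural neighbor by its share of the stolen area.
  var h = 0;
  for (var k in stolen) h += (stolen[k] / total) * points[k].z;
  return h;
}
```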

As explained briefly in the OpenGL Red Book, we can compute the Voronoi diagram easily in OpenGL by rendering cones at each input point and using the depth buffer to figure out the boundaries between Voronoi cells. Specifically, if we place a right-angle cone with its apex at a point, the height of the cone at a given position is equal to the distance from the point in the *x,y* plane. As such, if we place all the cones in the plane and look at them from below, we will only see the lowest cone at any given point, and thus only the cone whose point is closest. This is known as taking the lower envelope. (For more information, you can read Hoff et al.’s paper on how this works.)

Therefore, by merely drawing the cones and placing the viewpoint appropriately within OpenGL, the framebuffer will store the Voronoi diagram. Naturally, when doing computations with OpenGL, the diagram will be discretized into pixels. Additionally, OpenGL cannot draw true cones, only triangles, so we approximate a cone with an *f*-sided pyramid. With a reasonably large value of *f*, we will not notice the effect of the discretization, but making *f* a small value like 4-10 can be interesting. Both of these discretizations of course create some deviation from the previously defined Voronoi diagram.
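Since the cone is approximated by an *f*-sided pyramid, each input point’s geometry is just a triangle fan: the apex at depth 0 and a rim of *f* vertices at depth equal to the rim radius. A sketch of generating those vertices (hypothetical helper; the actual vertex buffer layout in my app may differ):

```javascript
// Build a triangle-fan vertex list for an f-sided pyramid approximating the
// distance cone of input point (px, py). Depth (the 3rd coordinate) equals
// distance from the apex in the x,y plane, so the depth test picks the
// nearest input point at every pixel -- the lower envelope.
function coneVertices(px, py, radius, f) {
  var verts = [px, py, 0]; // apex: zero distance from its own point
  for (var i = 0; i <= f; i++) { // <= f repeats the first rim vertex to close the fan
    var a = 2 * Math.PI * i / f;
    verts.push(px + radius * Math.cos(a),
               py + radius * Math.sin(a),
               radius);
  }
  return verts; // suitable for drawing with gl.TRIANGLE_FAN
}
```

`radius` should be large enough that every pyramid covers the whole viewport; a small `f` produces jagged cell boundaries like those in the screenshots below.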

I have used this method, implemented in WebGL (the JavaScript API to OpenGL), to create the following interactive Voronoi diagram generator.

While math is necessary to precisely describe these concepts, it is not always the best way to convey some of the nuances of the geometric behavior. As a result, one night, both to play with WebGL and to better understand Voronoi diagrams, I implemented some of these concepts in the browser. Below is a short screencast of me playing with the tool, which may be useful if you don’t have a WebGL-enabled browser or want to learn some of the features of the tool – or just play with it for yourself.

Not much to say here. To make this small application, I used two overlaid canvases, one with WebGL and one with just the 2D context. This allowed me to draw on top of the Voronoi diagram (for example, the input points). Following the learningwebgl.com tutorials, I used the Sylvester library for matrix math as well as the glUtils library from the tutorials.

WebGL is still a little buggy, and memory management can be an issue, among other things. But overall, it covers a good portion of OpenGL. Unfortunately, getting started can take a little effort: for example, unlike OpenGL in C++, you *must* write your own shader programs; I suggest using samples from learningwebgl.com at least to get started. However, once you have a functioning WebGL canvas, it is easy to pick up speed. I also went slightly overboard on the movement functions, but it was a good chance to play with JavaScript closures.

And finally, here are some Voronoi diagrams I have generated that I find interesting, cool, or merely pretty.

A similar kinetic Voronoi diagram, but using only 6-sided pyramids.

Below, the same set of input points is plotted with two different types of cones. The first uses a 50-sided pyramid. The second uses only 6-sided pyramids, creating the jagged effect between neighboring Voronoi cells.

The following two Voronoi diagrams show the Voronoi cells and area stolen for a natural neighbor query.

In my work, I have had to write and run C++ OpenGL code on both OS X and Ubuntu. To make this easier, CMake has been great, both for having readable make files and for keeping the build consistent across both platforms. Additionally, while GLUT is great for cross-platform OpenGL coding, I needed more control over the environment. Being able to run X11 on OS X has allowed me to test GLX natively without always testing on Linux. However, getting this set up on OS X without XCode can be a little tricky. So, here are two snippets of CMake files for using OpenGL on OS X and Linux.

If you want to run OpenGL with the standard OS X libraries, CMake generally finds the correct frameworks, but it can’t hurt to specify them more precisely. A basic example of the CMakeLists is shown below:

```cmake
IF (APPLE)
  INCLUDE_DIRECTORIES(/System/Library/Frameworks)
  FIND_LIBRARY(COCOA_LIBRARY Cocoa)
  FIND_LIBRARY(GLUT_LIBRARY GLUT)
  FIND_LIBRARY(OpenGL_LIBRARY OpenGL)
  MARK_AS_ADVANCED(COCOA_LIBRARY GLUT_LIBRARY OpenGL_LIBRARY)
  SET(EXTRA_LIBS ${COCOA_LIBRARY} ${GLUT_LIBRARY} ${OpenGL_LIBRARY})
ENDIF (APPLE)

target_link_libraries(main ${EXTRA_LIBS})
```

With this, your C++ should have the following includes:

```cpp
#include <GLUT/glut.h>
#include <OpenGL/glext.h>
#include <OpenGL/gl.h>
#include <OpenGL/glu.h>
```

If, however, you are looking to run your code also on a Linux machine or simply want to use GLX rather than CGL, you will need to have X11 installed (which your OS X installation disk contains) and specifically point CMake at the X11 libraries. The CMake code for this is shown below:

```cmake
include_directories(/usr/X11R6/include/)
link_directories(/usr/X11R6/lib)
SET(EXTRA_LIBS GL X11 GLU glut)
target_link_libraries(main ${EXTRA_LIBS})
```

When using the X11 libraries, you must also change the includes, as the folder structure for X11 is different from the native OS X one:

```cpp
#include <GL/glew.h>
#include <GL/glut.h>
#include <GL/glext.h>
#include <GL/gl.h>
#include <GL/glu.h>
#include <GL/glx.h>
```

That is all. Not terribly complicated, but hopefully this saves you some time if you are doing development with CMake, OpenGL, or X11. Look for some new slightly more exciting posts soon.

```shell
ffmpeg -i screencast.mov -f avi -vcodec mpeg4 -sameq -ac 2 -ab 128k screencast.avi
```

It is necessary to specify a video codec because AVI can be used with multiple codecs. To maintain video quality, use -sameq. When converting screencasts, it may also be useful to crop out parts of the screen, since QuickTime only does full-screen capture. For this, you can use the -croptop, -cropbottom, -cropright, and -cropleft arguments. Lastly, make sure the arguments come *before* the output file; otherwise they are not applied.

Maybe I will later do a more full post of the different ways I use FFmpeg. But for now, that is all.

Additionally, earlier in the year I worked with Duke’s ITSO to write up examples of good coding practices to protect against a variety of web application security issues. This reference is linked on Duke ITSO’s site here: http://www.security.duke.edu/ITSO_Web_Application_Security_Standard_v1.pdf

To do this I used examples from here for the new File API and here for the AJAX upload. Obviously, this could be expanded upon by implementing it in all three file views in Webfiles and by showing progress bars for each file. Additionally, there are some small bugs with uploading large files. However, if you are a Duke student and want to give it a try, follow the instructions below. Please note that this is an alpha version of the software, and you may run into some bugs when using it.

- Make sure you are using Firefox 3.6 Alpha – Namoroka.
- Go to our development version of Webfiles and log in.
- *After* you are logged in, go here to turn on the new drag and drop feature.

Please, give it a try and let me know your thoughts. Thanks.

I am on fall break now and will be spending the weekend at home. Hopefully I’ll do something cool while I’m home, and I will post if anything turns out well.
