Tuesday, November 30, 2010

Le cadavre exquis boira le vin nouveau

I'm currently reading "Fooled by Randomness" by N. N. Taleb, and I'm really enjoying a chapter dedicated to randomness in the humanities... So let me start with two quotes:

" 'Reality is part of the dialectic of consciousness' says Derrida; however, according to Scuglia[1] , it is not so much reality that is part of the dialectic of consciousness, but rather the absurdity, and hence the futility, of reality."

"Sound is the change in the specific condition of segregation of the material parts, and in the negation of this condition; merely an abstract or an ideal ideality, as it were, of that specification [...]"

Very exciting. Now you have to guess: one of the two quotes is from Hegel, and the other one has been generated by a computer using the Postmodernism Generator, created by Andrew C. Bulhak using the Dada Engine (you can find the whole article I've created here, and if you want to make your own paper just click here).
Now, it's all fun, and if you know some philosophy you've probably guessed right (Hegel is the second one, also because it's very unlikely that he would have quoted Derrida, who was born almost a century after Hegel's death...). The point is: with another random combination (one without Derrida being quoted), could someone have fooled you? To be honest I'm neither a philosopher nor a postmodernist, and I'd be more worried about being fooled by a fake scientific paper on spatial analysis than by something like this (it would be nice to have a positivist generator to see how it would work...). I simply don't care that much, but Alan Sokal, a physicist at New York University, actually did try to submit a paper to Social Text made of randomly assembled (not by a computer this time) meaningless sentences... Well, you can read his accepted and published paper here... Later in the same year, Sokal explained his point in another paper (this time generated non-randomly, I think):

"For some years I've been troubled by an apparent decline in the standards of intellectual rigor in certain precincts of the American academic humanities [...]. I decided to try a modest (though admittedly uncontrolled) experiment: Would a leading North American journal of cultural studies -- whose editorial collective includes such luminaries as Fredric Jameson and Andrew Ross -- publish an article liberally salted with nonsense if (a) it sounded good and (b) it flattered the editors' ideological preconceptions?"


and then states:

"Social Text's acceptance of my article exemplifies the intellectual arrogance of Theory -- meaning postmodernist literarytheory -- carried to its logical extreme. No wonder they didn't bother to consult a physicist. If all is discourse and ``text,'' then knowledge of the real world is superfluous; even physics becomes just another branch of Cultural Studies. If, moreover, all is rhetoric and ``language games,'' then internal logical consistency is superfluous too: a patina of theoretical sophistication serves equally well. Incomprehensibility becomes a virtue; allusions, metaphors and puns substitute for evidence and logic. My own article is, if anything, an extremely modest example of this well-established genre." (bold mine)

Scary, but perhaps also true in some archaeology and anthropology? While asking this question I've just read on Carl Lipo's blog that the executive committee of the American Anthropological Association is proposing to rephrase their mission statement, removing the word "science" (see here).

Friday, November 12, 2010

UPDATES: Simulpast, STDM and paper

Lots of things have happened over the last few weeks, so I need to update you on a couple of things...
I was in Barcelona last week and spent a wonderful time with the folks of the Spanish Research Council, who allowed me to present two papers related to my PhD. From early next year they are starting a very exciting project called Simulpast.
I quote from their website:

"The aim of the project is to develop an innovative and interdisciplinary methodological framework to model and simulate ancient societies and their relationship with environmental transformations. The propject will include 11 Research Groups of 7 different Institutions with more than 60 researchers from several fields (archaeology, anthropology, computer science, environmental studies, physics, mathematics and sociology). The leader institution is the IMF-CSIC in Barcelona."

I've never heard of any archaeological project centred on computational modelling with such a broad range of case studies. This is a great opportunity and I'm really looking forward to the project outcomes. I guess this can also be a great leap forward in terms of standardisation and communicability of models. Good luck and thanks for the tapas!!!

***

In the meantime I'm keeping myself (and Mark Lake) busy, as we are working on a paper that goes deeper into some of the topics we explored for the CECD conference this September. We'll mainly focus on cultural transmission models of fitness-enhancing traits (2 and n traits) with frequency-dependent fitness and different types of carrying capacity (shared and independent), looking at the short-term dynamics and long-term equilibrium of adoption rates and trait diversity. Stay tuned for more info!!!

***

Last but not least! I'm quite excited, since an International Symposium on Spatio-Temporal Analysis and Data Mining will be hosted here at UCL in July!!! This is a great chance to see many advanced techniques in spatio-temporal analysis and simulation which might give us some new perspectives in archaeology!!!

Wednesday, October 13, 2010

Inference From Confirming Evidence

Every once in a while I read papers in archaeology that claim to use scientific methods but fail to choose the right type of hypothesis testing for their models. While reading Taleb's "The Black Swan" I came across a very nice psychological experiment conducted 50 years ago by Wason (1960).
The basic idea is that you have a data-set with a specific pattern, and in order to explain the underlying rule you conduct a series of experiments to propose and test a model. Of course, in archaeology you cannot strictly do experiments all the time, but you can look for other data that will or will not support your model.
The experiment is based on a simple sequence of numbers, and one should simply "discover" the underlying rule. The player can propose another sequence of numbers, and the experimenter will tell them whether that sequence could have been generated by the same rule as the original data-set or not.

The sequence is:

2-4-6

Now, most people will most likely propose something like the following sequence:

8-10-12

which basically tests the model "increase by two". Now, if the experimenter tells you "Yes", you'll probably be quite happy about that, and you'll probably write a paper for the Journal of Integer Sequences (which, believe it or not, actually exists) with something along the lines of:
"Our experiment confirmed our hypothesis of the increase-by-two rule". Very few will suspect that this model is wrong, and it will become part of the scientific knowledge of your field (I'm sure that none of the editors of the Journal of Integer Sequences would accept your paper, nor fall into this trap...).
Now, the big problem here is that the rule with which 2-4-6 was generated was simply "numbers in ascending order". Thus 5-6-7 would also have been accepted by the experimenter. However, it's very likely that most people use these experiments to confirm their model rather than trying to falsify it. Testing 5-6-7 would have forced a re-evaluation of the originally proposed rule and might have led to the right answer.
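Just to make the logic concrete, here is a tiny R sketch of my own (not code from Taleb or Wason), where the experimenter's hidden rule is simply "strictly ascending numbers":

rule <- function(x) all(diff(x) > 0)   # the hidden rule: strictly ascending numbers

rule(c(2, 4, 6))    # TRUE  -- the original sequence
rule(c(8, 10, 12))  # TRUE  -- "confirms" the increase-by-two model
rule(c(5, 6, 7))    # TRUE  -- falsifies it: accepted even though it doesn't increase by two
rule(c(6, 4, 2))    # FALSE -- a genuinely disconfirming test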
The question now is: what sequence of numbers are you proposing in your archaeological research?


References:

  • Taleb, N. N., 2007, The Black Swan: The Impact of the Highly Improbable, Random House.
  • Wason, P. C., 1960, On the failure to eliminate hypotheses in a conceptual task, The Quarterly Journal of Experimental Psychology, 12(3), 129-140.

Wednesday, September 1, 2010

apply() function and ABM in R

I know, I know... I've been away again...
We (myself and Mark Lake) are presenting a paper at the CECD conference and we still have some stuff to finish... so I'm really, really busy... I'll post a much more detailed entry on the conference and on our paper as soon as possible, but before that I just wanted to share a useful link I found this morning which would have been handy a couple of months ago.
As I said, I'm writing my ABM in R this time. There are many good reasons (but also bad reasons) for this, which I'm going to write about in another post (yes, I keep promising...). Having said that, R is terribly slow. Yes, you can write things in C and call them inside your function, but the main problem is that R is terribly bad at looping. And an ABM involves a lot of looping. Then I realised that many people avoid using loops in R, and instead use the "apply" family of functions. These are admittedly hard to grasp at first, but this blog explains them very elegantly. I wish I had read it long ago... Anyway, once you master the "apply" family, you can also play around with the mclapply() function of the multicore package, which parallelises the apply calls across your cores, speeding up your simulation a lot!
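Just as a minimal sketch of my own (not taken from the linked post), here is the same toy per-agent computation written as a loop, with sapply(), and with mclapply(); agent_step() is just a made-up placeholder:

library(multicore)   # provides mclapply(); in current R versions it lives in the 'parallel' package

agent_step <- function(i) sum(rnorm(1000)^2)   # toy stand-in for one agent's computation

# loop version
res <- numeric(100)
for (i in 1:100) res[i] <- agent_step(i)

# apply-family version
res <- sapply(1:100, agent_step)

# parallel version, spreading the calls across two cores
res <- unlist(mclapply(1:100, agent_step, mc.cores = 2))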

Thursday, August 12, 2010

London Cycle Hire Scheme and Flow Analysis

If you are living in London, you'll probably have noticed all these new blue bikes of the London Cycle Hire Scheme. It's a brilliant idea, as the number of fellow cyclists will keep increasing, hopefully gaining some more respect from the people sitting in those tin boxes.
Anyway. I was wondering how they manage to balance the flow of cycles correctly, so that you always have some available at your station. The obvious guess is that there will be cycles of flow towards and away from the city centre, as most people will be using these bikes for commuting. Having said that, Oliver O'Brien of CASA has created a web-GIS which shows the number of bikes currently available at each dock, with a time-series of bike availability over the past 24 hours at each location. He has also made a video which nicely shows the inward and outward flow. Really cool. You can find more details on his blog.

Friday, July 30, 2010

Communicating (Agent-Based) Models

So, you've worked hard and finally you have a working ABM, full of features and parameters, and you are super-excited and want to show everybody your latest phase space or time-series. You go to a conference or you write a paper; in either case, chances are that people will misunderstand your model, or will simply take its underlying algorithm for granted, and there will be very few questions... especially if your audience is not trained...
I think that one of the biggest problems archaeological (and non-archaeological) ABM must face is that when your model reaches its audience you have a small window of time (10~30 minutes) and/or space (3000~5000 words) to communicate all the algorithms and submodels you've used. Now, even in an ideal (?) world where everybody understands Java, C++ or NetLogo (or even R in my case), few people will have the patience and willingness to go through your raw code and try to understand how your model really works. Most people will simply look at your conclusions, or read through the old-fashioned text-based description of your model. The problem then is how you evaluate other people's models. Well, the short answer at this stage is that you could, but you won't. And the risk is that we lose the scientific feedback process, bringing us back to simple story-telling with perhaps some fancy dynamic illustrations...
The thing is, I'm quite convinced that the greatest achievement archaeology can gain from ABM is not the actual bunch of code and files, but the formalisation of submodels. We tell stories, and in the non-computational modelling process we often tend to avoid details. We delineate the larger trends without tackling the smallest issues. This epistemological laziness (as one of my supervisors would call it) is however strictly prohibited in an ABM. Or to put it better: you can still produce lazy models, but people will discover this and criticise it... but only if your model, submodels and algorithms are well communicated.
So the communication problem is really a big issue, and the risk is to be trapped with a series of over-complicated, hyper-realistic models, with very long code that nobody will ever read and check...

I'm having a series of nice chats with Yu Fujimoto, a visiting scholar from the Faculty of Culture and Information Science at Doshisha University in Japan. The discussions are around whether models should be communicated through UML (Unified Modeling Language), through the ODD (Overview, Design concepts, and Details) protocol advocated by Volker Grimm, or simply by pseudo-code. All these modes of communication are around, but none is common in archaeology. One reason is that a non-trained archaeologist will struggle to understand pseudo-code, and will definitely reject UML as something mystic and unquestionably complicated. This leaves ODD, which hopefully will take over. Ideally, journals should allow the upload of the source code along with an additional appendix containing the ODD description, leaving aims & objectives, a brief description, experiment results and discussion as the core elements of the paper. Of course, having said that, the problem of model communication at conferences remains tricky, as going through the ODD will most likely use up the entire time-block and you'll hear the 5-minute bell ringing as soon as you reach the second D...

Monday, July 26, 2010

R Plotting tips

Yes... I haven't been updating the blog for a very long while... But I'm really busy with many things right now... Our paper for the "Cultural Evolution in Spatially Structured Populations" conference has been accepted, so we are currently working on that, and I also have stuff for the PhD and a couple of papers I'm working on... busy busy busy...
I've also started writing a couple of ABMs using R, which sounds crazy at first (also at second, and third), but it has some nice aspects which I'll write about extensively in a future post.
But for now, I just wanted to start a series of very small posts (mainly for archaeologists) with small tips: astonishingly simple concepts that nevertheless take a couple of hours of googling and forum foraging...
For instance, have you ever tried to plot a time series of BC or BP dates? Suppose you have a sequence of counts per century as follows:

data<-c(789,100,923,444,224,192,83,45,32,21,19,22,23,42,120)

Plotting this as a time series is very simple:

plot(data,type="l")

and then you realise that you want something meaningful on the x-axis, so you write the following:

dates<-c(3500,3400,3300,3200,3100,3000,2900,2800,2700,2600,2500,2400,2300,2200,2100)

Or perhaps, if you know a bit of R, you'll choose the more elegant nested function:

dates<-sort(seq(2100,3500,100),decreasing=TRUE)

In any case you'll try to plot this as follows:

plot(x=dates,y=data,type="l")

and you'll find out that R ignores the ordering of the vector dates, so your time-series ends up reversed.

My practical solution used to be plotting the dates as negative values and then deleting the "-" signs with GIMP or something (yes, I should really be ashamed of myself).
Well, for the small portion of people who have had the same problem, here's the solution:

plot(x=dates,y=data,type="l",xlim=c(max(dates),min(dates)))

Basically, you tell the plot function that the range of values for the x-axis goes from the greatest value (the oldest date in our case, thus the largest number) to the smallest value. R will then read the values of dates in the correct order and plot the time series the way it should look.
Easy.
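For the record, an equivalent one-liner (plain base R, same idea as the solution above) is to reverse the default axis range:

plot(x=dates, y=data, type="l", xlim=rev(range(dates)))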

Monday, April 5, 2010

Numerical Systems, Logarithmic Scales and Cognitive Spaces

This morning I came across this article in the Guardian, which is basically an extract from Alex Bellos' book Alex's Adventures in Numberland. I haven't read the book yet, but the extract looks really promising. To put it shortly, Alex Bellos refers to the work of the French anthropologist Pierre Pica and his study of the Munduruku, an Amazonian tribe with a numerical system restricted to 5. Apart from the intriguing insights on the origins of numeric intuitions, what really fascinated me is an experiment conducted by Pica (Dehaene et al. 2008) which links numerical systems to spatial perception.

Imagine a straight line with one dot at one end and 10 dots at the other. Members of the tribe were asked to place a given set of dots (between 1 and 10) at any point along the line. If you do this experiment yourself, you would most probably put a set of 5 dots in the middle of the line, and a set of 7 dots somewhere near 3/4 of the line. If you then plot your results in an xy plot (with x = number of dots in the set and y = distance from the origin), you would probably obtain a linear relation with a 45-degree slope (y = x). The results of the Munduruku were quite different: the relation is in fact not linear but logarithmic; the greater the number of dots, the smaller the perceived difference between sets. More interestingly, experiments on Western children have shown that they have the same "logarithmic thinking", which is however lost as they grow older. Dehaene and colleagues rightly point out the evolutionary implications of this discovery, and they also point out its relation to the Weber-Fechner law, while Alex Bellos acknowledges its implications in a wider context, among which is how logarithmic perception might affect perspective.

Now, the combination of Weber's law and perspective has very interesting implications for spatial cognition. Put simply, Weber's law states that the relation between stimulus and perception is logarithmic. Suppose you want to buy some milk, and there are two stores close by, one at 100 metres and the other at 200 metres. You would definitely choose the first one. Suppose now that the two stores are at 10.2 km and 10.3 km. Despite the difference between the two being the same (100 metres), you could easily go to the second store without worrying too much. The reason is that when we compare the two we don't look at the absolute values but at their ratio. In the first case store B is twice as far as store A; in the second case the ratio is about 1.01, so it really doesn't make much difference to us.

What are the consequences of this? Well, most models of spatial perception are based on a linear relation between stimulus and perception. This linear relation is assumed in the models of spatial interaction which form the backbone of many spatial analyses, and most agent-based models do the very same thing. Think of a very simple model where an agent chooses a location within a search neighbourhood based on some cell value. Distance and perspective really do not play any role in most cases: the agent chooses the cell with the highest value. If we integrate the cost of movement the choice becomes distance-dependent, but there is still no difference in the cognitive aspects of the decision-making at different distances. What we really need to do is to integrate Weber's law in our submodels, so that the relation between the environmental stimulus and the actual perception becomes logarithmic.
The result is that differences between objects at close distance have a stronger effect on the decision-making process than differences between objects at far distances. A quite intuitive concept which, however, as far as I know (and I'll be glad if somebody proves me wrong), hasn't been integrated in computational and statistical models of human spatial cognition.
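To make this concrete, here is a toy R sketch of my own (not from Dehaene et al. or anybody else) comparing a linear and a Weber-style (logarithmic) treatment of the milk-store example above; the function name and values are purely illustrative:

perceived_difference <- function(d1, d2, mode = c("linear", "weber")) {
  mode <- match.arg(mode)
  if (mode == "linear") {
    abs(d2 - d1)            # absolute difference in metres
  } else {
    abs(log(d2) - log(d1))  # difference of logs = log of the ratio (Weber-Fechner style)
  }
}

perceived_difference(100, 200, "linear")      # 100
perceived_difference(10200, 10300, "linear")  # 100 -- identical to the above
perceived_difference(100, 200, "weber")       # ~0.69 -- store B feels twice as far
perceived_difference(10200, 10300, "weber")   # ~0.01 -- barely noticeable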

Saturday, March 27, 2010

Conference on Cultural Evolution in Spatially Structured Populations

Wow. This year is really amazing for me here at UCL.
We had the Institute of Archaeology Monday Seminar series titled "Contemporary Roles for Spatial Analysis in Archaeology", which was a huge success with many interesting papers from a very wide range of perspectives (from ABM to Time GIS, passing through 3D models).
We organised the UK Chapter of Computer and Quantitative Applications in Archaeology, again here at the IoA in UCL. The conference had a really enjoyable atmosphere, with a wide range of papers from network analysis to multinomial logistic regression and a session entitled "Modeling, GIS and Spatial Analysis".
Now, the AHRC-funded Centre for the Evolution of Cultural Diversity has just made its call for papers for a three-day conference on Cultural Evolution in Spatially Structured Populations. Notice that my blog is titled Evolving Spaces, and the conference is on Cultural Evolution in Spatially Structured Populations, so you can imagine how excited I am about this. There are already a lot of confirmed speakers, and I can already see the themes unfolding: maximum entropy models (e.g. Tim Evans, Alan Wilson), spatial diffusion models (Anne Kandler, James Steele), niche construction theory (Kevin Laland), scaling (Alex Bentley) and agent-based models (Luke Premo, Tim Kohler)... just to mention some of them...
I can't wait for September!!!!



Thursday, March 25, 2010

NetLogo & R Extension

I'm really a heavy user of R, and was so well before starting to do any agent-based modelling. So the first thing I looked for in any ABM software package was some automated link to R (much like spgrass6 links GRASS and R for GIS).
I thought Repast Simphony was the way to go, since the website claims it can work with R, but then I was disappointed to find out that it only stores the output in a data.frame class object (and besides, it does not work on a Mac…). Then, after switching (at this stage almost completely) to NetLogo, I found this awesome extension, currently in beta (and alas, still not working on a Mac…), but as far as I've seen it works perfectly fine.
The NetLogo-R-Extension, developed by Jan C. Thiele and Volker Grimm (one of the authors of the ABM textbook of the previous post), seamlessly links NetLogo to R through the rJava package. This means that you can do much more than exporting your results into a data.frame class object: you can call R while your simulation is running!!!! So, for instance, you can plot the L-function of your agents' spatial distribution on the fly while the simulation is running (see the picture below). But this is just the tip of the iceberg! Since you can call virtually any command in R while running your model, you can save plots to any folder, link to databases (I haven't tried yet, but I guess it's possible), and do virtually any analysis you would like and store it in an .Rdata file!!!



Example of a NetLogo ABM with a continuous plot of the L-function (along with the confidence envelope)
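For the curious, the R side of such a plot might look something like the minimal sketch below (my own illustration using the spatstat package, with made-up agent coordinates in a 50 x 50 world; the extension simply lets you run this kind of code while the simulation is ticking):

library(spatstat)

x <- runif(100, 0, 50)   # stand-in agent coordinates
y <- runif(100, 0, 50)
pp <- ppp(x, y, window = owin(c(0, 50), c(0, 50)))

env <- envelope(pp, Lest, nsim = 39)   # L-function with a simulation envelope
plot(env)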

Wednesday, March 10, 2010

Importing Raster Files in NetLogo without the GIS Extension

I've been working mainly on abstract ABMs, so I really don't need all those bells and whistles, and besides I've had some problems handling the relation between the real coordinate system and NetLogo's own system. What I want is to use a GIS (GRASS in my case) to create artificial landscapes with some given statistical or geostatistical properties that I can import into NetLogo. There is only one problem: NetLogo places the integer coordinates of each patch at its centroid, not at its bottom-left corner. This means that a world with max-pxcor and max-pycor set to 1 actually has four cells, with the coordinates of their centroids at 0,0; 0,1; 1,1; and 1,0. Thus in reality the absolute extent of the world is -0.5 to +1.5, with the actual cell size maintained at 1. Now, if you create a map in GRASS with cell size 1 and an extent with max x and max y at 1, you will have only 1 cell.


So here's my solution:
0) Set up a GRASS region with the 0,0 origin at the bottom-left.


1) Create a raster model in GRASS and import it into R as a SpatialGridDataFrame (via maptools or spgrass6; you can of course also create the model directly in R).


2) Export it as a text file using the following code (where "data" is the SpatialGridDataFrame):


# spgrass6/maptools load the sp package, which provides coordinates() and the SpatialGridDataFrame class
library(sp)

# Extract the coordinates of the cell centroids
x<-coordinates(data)
# Shift the coordinates by half the cell size
x<-x-(data@grid@cellsize[1]/2)
# Convert the matrix to a data.frame
x<-as.data.frame(x)
# Add the data values (data@data) as the third column of the data.frame
x[,3]<-data@data
# Write the output to a tab-separated text file, without any column names
write.table(x,"data.txt",sep="\t",col.names=FALSE,row.names=FALSE)



If you look at the .txt you'll see something like this:

0    49    120
1    49    110 
...

3) Read the text file into NetLogo. The procedure is adapted from what Railsback and Grimm have written in their forthcoming book on ABM, which is simply GREAT and which I recommend to everybody.


to import-rast [filename]   ; requires: patches-own [mapValue]
  file-open filename
  while [not file-at-end?]
  [
    let next-X file-read
    let next-Y file-read
    let next-Z file-read
    ask patch next-X next-Y [ set mapValue next-Z ]
  ]
  file-close
end

So I use this in my setup procedure, putting something like:

import-rast "mapA.txt"

with mapValue being a numerical property of the patches which I am interested in. Be sure also to set the NetLogo world with the origin at the bottom-left, and use as max-pxcor and max-pycor the max X and max Y of your raster minus the cell size (e.g. for a 50 x 50 raster map with cell size 1, max-pxcor and max-pycor should be 49). Oddly, if your world is too small there will be no error message; just a portion of the raster model will be displayed in NetLogo.
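As a quick sanity check (my own addition, assuming the SpatialGridDataFrame "data" from step 2 and a cell size of 1), you can derive those world settings directly in R:

dims <- data@grid@cells.dim   # number of columns and rows of the raster
max_pxcor <- dims[1] - 1      # e.g. 49 for a 50 x 50 raster with cell size 1
max_pycor <- dims[2] - 1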
Again, this workflow is fine as long as your model is abstract. You can of course use it for real-world raster data, but you would need to shift the whole data-set so that the origin lies at the bottom-left, or use the GIS extension instead.

Sunday, February 28, 2010

CAAUK 2010

About 10 days ago we (IoA, UCL) hosted the UK Chapter of the Annual Conference on Computer and Quantitative Applications in Archaeology (CAAUK for friends). You can still visit the website here, and hopefully we'll post some videos of the talks...
I've been to the CAA International a couple of times, but this was my first time at the UK Chapter, and I have to admit I really enjoyed the papers, the people and the overall mood.
I also presented a paper titled "Quantifying and Integrating Temporal Uncertainties in Archaeological Analysis: Methodological and Theoretical Implications", which basically follows the line of my paper in the Journal of Archaeological Science (which is, by the way, finally in press, with the corrected proof version here). This time, instead of focusing on the implications of temporal uncertainty for spatial analysis, I took a step back and discussed its implications in general terms, showing a very simple case study where the change in the number of pithouses over time was sought. In theory something very trivial, but in practice quite tricky, especially when you are not really sure about the dates of pithouse construction. Anyway, I'm writing a paper on this, so hopefully you can read more details about it soon.

Intro

Ok... So here we go. It's a bit difficult to start a blog, so I'll start by introducing myself.
My name is Enrico Crema and I'm a PhD student at the Institute of Archaeology, UCL. If you can ignore my AA-like introduction and wish to know more about me, please visit my homepage here. My doctoral project seeks to study the evolution of prehistoric hunter-gatherer settlement patterns in Jomon Japan, and to do this I'm exploring a pretty wide range of fields and topics... To list a few: I'm interested in Spatial Analysis, Agent Based Modelling, Human Behavioral Ecology and Dual Inheritance Theory, and more generally in everything about Evolution, Space and Human Behaviour (which is basically anything you can think of...)... Hope you enjoy reading the posts as much as I enjoy writing them.

E.