performance of mmap() against NFS

blf0 · March 31, 2009, 12:56am

Hi all.

I have a performance-related problem that has been bothering me a lot.

We have two compute nodes (svr1 and svr2). svr1 has MySQL 4.1.20 installed and also runs NFS; svr2 mounts a disk from svr1, so the two nodes can share files.

The data files we deal with often reach GB (or even TB) size. In this instance we have a 4.17 GB seismic data file; the job is to read data from it as input and write the output to the shared disk.

To improve performance, we use memory mapping instead of the traditional read()/write() system calls.

If we run the program on svr1, it takes only about 8 minutes, but if we run the same program on svr2, it takes more than 4 hours.

This surprised me a lot. The difference is that svr2 has to go through NFS for all input and output, so I am wondering whether NFS is the bottleneck and, if so, how to fix it.

code segment is as follows:

static CSDB_bool fillset(int fd, char* buf, MYSQL_RES* res, int num_chno, int trlen)
{
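	// Copy each requested trace out of a sliding mmap() window over the file.
	char* pos		= NULL; // address of the current trace inside the mapping
	off_t bottom 	= 0; // first trace number covered by the current window (starts at 0)
	int size		= num_chno; // number of traces per window
	off_t offset 	= 512; // trace data starts 512 bytes into the file
	size_t bytesize = size * trlen;	// window length in bytes (traces * trace length)
	size_t sys_page_size = sysconf(_SC_PAGESIZE); // mmap() offsets must be multiples of the page size
	int comp		= offset % sys_page_size;	// bytes by which offset overshoots a page boundary
	offset 			= offset - comp;	// round offset down to that page boundary
	
	// Map the first window.
	char* map = (char*)mmap(NULL, bytesize, PROT_READ, MAP_PRIVATE, fd, offset);
	if(map == MAP_FAILED)
	{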
		perror("mmap()");
		return FALSE;
	}
	//int i = 0;
	//for(i = 0; i < num_chno; ++i)
	MYSQL_ROW row;
	int chno = 0;	
	while((row = mysql_fetch_row(res)))
	{		
		chno = atoi(row[0]);
		if(chno > bottom && chno < bottom + size - 1)
		{		
			pos = map + (chno - bottom) * trlen;
			if(offset == 0)
			{
				memcpy(buf, pos + 512, trlen);	
			}
			else
			{
				memcpy(buf, pos + comp, trlen);	
			}		
					
		}
		else
		{
			offset 	= 512 + (size_t)chno * trlen;
			comp	= offset % sys_page_size;	// bytes past the page boundary
			offset 	= offset - comp;	// round down to the page boundary
			//munmap(map, bytesize);
			//map = (char*)mmap(NULL, bytesize, PROT_READ, MAP_PRIVATE, fd, offset);
			map = (char*)mmap(map, bytesize, PROT_READ, MAP_PRIVATE | MAP_FIXED, fd, offset);
			if(map == MAP_FAILED)
			{
				return FALSE;
			}
			bottom = chno;			
			pos = map;
			if(offset == 0)
			{
				memcpy(buf, pos + 512, trlen);	
			}
			else
			{
				memcpy(buf, pos + comp, trlen);	
			}	
			
		}
		buf = buf + trlen;			
	}
	// Unmap the last window before returning.
	munmap(map, bytesize);
	return TRUE;
}

If anyone has a clue, please give me some suggestions.
Thanks a lot in advance.

ben

jim_mcnamara · March 31, 2009, 10:27am

Consider tuning NFS. You did not mention the OS, but here is a link; you can use Google to find other links on NFS tuning.

Plus your code is hard to read -- please use [ code ] [ /code ] tags (I added spaces so you could see the tags themselves).

jim_mcnamara · March 31, 2009, 10:30am

One note: why are you calling mmap() INSIDE a loop? Usually one call suffices.
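
For illustration only, here is a rough sketch of the single-call approach. It assumes a 64-bit process, so that the whole file fits in the address space, and it borrows the 512-byte header and fixed trace length from the posted code. map_whole_file() and trace_at() are made-up names, not part of any library.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read-only with one mmap() call.
 * Returns the mapping and stores its length in *len, or NULL on error. */
static const char *map_whole_file(const char *path, size_t *len)
{
    struct stat st;
    void *map;
    int fd;

    assert(path != NULL && len != NULL);
    fd = open(path, O_RDONLY);
    if (fd == -1) {
        perror("open()");
        return NULL;
    }
    /* Reject empty files and files too large for this address space. */
    if (fstat(fd, &st) == -1 || st.st_size <= 0 ||
        (off_t)(size_t)st.st_size != st.st_size) {
        close(fd);
        return NULL;
    }
    /* Offset 0 is always page-aligned, so no rounding is needed. */
    map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED)
        perror("mmap()");
    close(fd); /* an established mapping stays valid after close() */
    if (map == MAP_FAILED)
        return NULL;
    *len = (size_t)st.st_size;
    return (const char *)map;
}

/* Hypothetical helper: address of trace chno, assuming a 512-byte header
 * followed by fixed-size traces of trlen bytes. NULL if out of range. */
static const char *trace_at(const char *map, size_t len, size_t chno, size_t trlen)
{
    size_t start = 512 + chno * trlen;

    if (start > len || trlen > len - start)
        return NULL;
    return map + start;
}
```

A caller would look up trace_at(map, len, chno, trlen) for every row and call munmap() once at the end. In a 32-bit process a 4.17 GB file will not fit in the address space, so this only applies to 64-bit builds.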

blf0 · April 1, 2009, 10:29pm

Thanks so much

The OS is RHEL 4, so the link you offered fits exactly.

The files we deal with are often too large to map into the address space all at once, so I map only part of the file at a time and check whether the data I need is already inside the mapped region. If so, I memcpy it; otherwise I remap.
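
Roughly, the window arithmetic looks like the sketch below. It is a simplified illustration rather than the real function: the names window_geom, window_geometry, window_map and window_trace are invented for this example, and it assumes the 512-byte header and fixed trace length used in the code above. The mapped length includes the alignment slack so that the last trace in the window stays inside the mapping.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical description of one window; these names are not from the post. */
struct window_geom {
    off_t  map_offset;  /* page-aligned file offset to pass to mmap() */
    size_t slack;       /* bytes between map_offset and the first wanted trace */
    size_t map_length;  /* slack plus the bytes of all traces in the window */
};

/* Work out where to map `count` traces starting at trace `first`.
 * Assumes a 512-byte header followed by fixed-size traces of trlen bytes.
 * On 32-bit systems, build with -D_FILE_OFFSET_BITS=64 so that off_t can
 * address files larger than 2 GB. */
static struct window_geom window_geometry(size_t first, size_t count,
                                          size_t trlen, size_t page)
{
    struct window_geom g;
    off_t start = 512 + (off_t)first * (off_t)trlen;

    assert(page > 0);
    g.slack      = (size_t)(start % (off_t)page);
    g.map_offset = start - (off_t)g.slack;
    g.map_length = g.slack + count * trlen;
    return g;
}

/* Map one window read-only and report its geometry through *g.
 * The caller must keep the window inside the file, because touching
 * mapped pages that lie wholly beyond end-of-file raises SIGBUS. */
static char *window_map(int fd, size_t first, size_t count, size_t trlen,
                        struct window_geom *g)
{
    void *p;

    *g = window_geometry(first, count, trlen, (size_t)sysconf(_SC_PAGESIZE));
    p = mmap(NULL, g->map_length, PROT_READ, MAP_PRIVATE, fd, g->map_offset);
    if (p == MAP_FAILED) {
        perror("mmap()");
        return NULL;
    }
    return (char *)p;
}

/* Address of trace chno inside a window that starts at trace `first`.
 * The caller checks first <= chno < first + count beforehand. */
static const char *window_trace(const char *base, const struct window_geom *g,
                                size_t first, size_t chno, size_t trlen)
{
    return base + g->slack + (chno - first) * trlen;
}
```

After window_map() succeeds, traces are read through window_trace(), and munmap(base, g.map_length) releases the window before the next one is mapped.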

Have I made myself clear? Sorry about my poor English :). Do you have a better solution for this kind of problem?

jim_mcnamara · April 2, 2009, 10:39am

Try NFS tuning first. You may also want to increase virtual memory on the client.
Also, if you can, get the sysadmin to turn off atime updates on your NFS mounted filesystem.

Corona688 · April 2, 2009, 6:58pm

Why memcpy an mmap-ed block anywhere? This has little advantage over just read()-ing it, because memcpy forces it to page in the entire section anyway. Ideally, whatever function needs this data could use the mmap-ed blocks directly, which would have the advantage of paging in only the data it actually uses.
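
To make the comparison concrete, a plain-read version of the copy might look like the sketch below. It uses pread(), assumes the 512-byte header and fixed trace length from the posted code, and read_trace() is a made-up name used only for this example.

```c
#define _POSIX_C_SOURCE 200809L /* for pread() */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy trace chno into buf with pread() instead of
 * mmap() + memcpy(). Assumes a 512-byte header followed by fixed-size
 * traces of trlen bytes. Returns 0 on success, -1 on error or short file. */
static int read_trace(int fd, size_t chno, size_t trlen, char *buf)
{
    off_t pos = 512 + (off_t)chno * (off_t)trlen;
    size_t done = 0;

    assert(buf != NULL);
    while (done < trlen) {
        ssize_t n = pread(fd, buf + done, trlen - done, pos + (off_t)done);
        if (n == -1) {
            if (errno == EINTR)
                continue; /* interrupted by a signal, retry */
            perror("pread()");
            return -1;
        }
        if (n == 0)
            return -1; /* end of file before the trace was complete */
        done += (size_t)n;
    }
    return 0;
}
```

This drops the mmap() call per window and the page-alignment bookkeeping. Whether it is actually faster over NFS would have to be measured.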

blf0 · April 2, 2009, 8:50pm

Thanks Jim, I have already read up on the tuning, and I will give it a try once I get the chance.

Thanks to Corona as well. Yes, there are many potential issues in the design, but there seems to be no way to change that now, at least not in this version.