Minutes of the MB-NG operational meeting

Date: 12 September 2003
Time: 10:30 am
Location: UCL Pearson lab / D12
Tel: 020 7679 7029
Video-Conf details: Gatekeeper: 193.60.253.29 Port_no: 1719 Conference no: *301234

Present: F. Saka, N. Pezzi, J. Orellana, M. Rio, P. Clarke, A. Di Donato.
Via vid-conf: S. Dallison, G. Fairey, R. Hughes-Jones, R. Tasker

Applications
============

Reality Grid
------------
11/04/03 Reality Grid: Arrange to meet with Reality Grid people (Stephen Pickles and John Brooke cc Peter Coveney) before the collaboration meeting about running Reality Grid on the MB-NG network.
09/05/03 The Reality Grid meeting will be held on 12/05/03 at 11am.
23/05/03 The next meeting with Reality Grid is on 28 May 2003.
13/06/03 A schedule and set of milestones are being drawn up.
27/06/03 A document has been produced detailing the initial Reality Grid experiment we will do through the MB-NG network. We need a fourth PC in Manchester HEP for the experiment. S. Dallison has offered one of his PCs for this. The physical aspects of this experiment are being put in place. A date has been set for a meeting on the 8th July in Manchester to do the experiment.
09/07/03 Ongoing. Robbin Pennings from MCC will help with the setup at Manchester.
25/07/03 Robbin Pennings from MCC will help with the setup of the Vizserver client at Manchester. Preliminary setup is ongoing. Acterna have been contacted about using the DA3600 with packet slicing, but have not replied so far.
08/08/03 Ongoing. Iperf tests show that the max output rate of Dirac is around 300 Mbit/s and the max input rate is around 170 Mbit/s. Tests are being performed with the Vizserver demo applications. The fibre splitters for the Acterna DA3600 traffic analyser have arrived and will be tested today.
12/09/03 A paper on the tests with the Vizserver demo has been written. A request for the Reality Grid application has been made, but no response has been received so far. ACTION: R. Hughes-Jones will prompt the Reality Grid people.
We may test the performance of jumbo frames to see if it agrees with http://www.vets.ucar.edu/Reports/NetworkBenchmarks/gige_9k_mtu.html, which reports 700 Mbit/s for 9000-byte frames.

TCP over QoS
------------
12/09/03 Tests have also been carried out with TCP over QoS and with self-similar background traffic. Current results suggest that AF classes need Scalable TCP, whereas flows in the BE class can survive with vanilla TCP.

BABAR
-----
23/05/03 R. Tasker has an outline plan for getting BABAR data from RAL to Manchester through the MB-NG network. R. Tasker will arrange a meeting with Roger Barlow and R. Hughes-Jones. The aim is to start sending data by September 2003.
13/06/03 Some lab tests have already started. For the schedule document, the BABAR people must be consulted. The experiments are planned to start around October.
27/06/03 See the high throughput programme.
08/08/03 Tests on the BABAR machines show that they cannot reach line rate. The network card (Intel Gigabit Ethernet) and the RAID card (3ware) share the same 33 MHz 32-bit PCI bus. There is no second PCI bus. A solution could be to use the MB-NG machines as BABAR servers. However, this would require 0.5 to 1.3 terabytes of disk space.
12/09/03 BABAR memory-to-RAID tests showed 600 Mbit/s. The results are on a web page and in the paper presented at the all-hands meeting. ACTION: R. Hughes-Jones will make the paper available for inclusion on the MB-NG web site.
The next steps are:
- In October/November, run tests between BABAR machines across the MB-NG network.
- Demonstrate QoS working in various ways with the currently unused BABAR servers from RAL to MAN.
- Go through a firewall.
- Demonstrate to the BABAR people the improved performance with high performance PCs.
BABAR have expressed interest in purchasing the new 66 MHz 3ware cards.
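To put the rates quoted above in context (the 700 Mbit/s jumbo-frame report, the shared 33 MHz 32-bit PCI bus on the BABAR machines, and the 941 Mbit/s memory-to-memory figure under Task 12), the Python sketch below computes some simple theoretical ceilings. It is an illustration added to these minutes, not a method agreed at the meeting, and every constant in it is an assumption: 38 bytes of Ethernet framing overhead per frame (preamble/SFD, MAC header, FCS, inter-frame gap), 52 bytes of IPv4 + TCP headers with the timestamp option, a PCI bus moving one 32-bit word per clock with no arbitration overhead, and RAID-to-NIC data crossing the shared bus twice. Real PCI efficiency is lower, so treat the results as upper bounds only.

```python
"""Back-of-envelope throughput ceilings (illustrative sketch, not measured data).

Assumed constants (not taken from the minutes):
- Ethernet overhead per frame: 8 preamble/SFD + 14 header + 4 FCS + 12 IFG = 38 bytes
- IPv4 (20) + TCP (20) + TCP timestamp option (12) = 52 bytes of headers
- PCI moves one bus-width word per clock, with no arbitration or protocol overhead
"""

ETH_OVERHEAD_BYTES = 38   # bytes on the wire per frame beyond the IP packet
IP_TCP_HEADER_BYTES = 52  # IPv4 + TCP + timestamp option


def tcp_goodput_mbit(line_rate_mbit: float, mtu_bytes: int) -> float:
    """Maximum TCP payload rate on Ethernet for a given MTU."""
    payload = mtu_bytes - IP_TCP_HEADER_BYTES
    on_wire = mtu_bytes + ETH_OVERHEAD_BYTES
    return line_rate_mbit * payload / on_wire


def pci_peak_mbit(clock_mhz: float, width_bits: int) -> float:
    """Theoretical peak PCI bandwidth: one width_bits transfer per clock."""
    return clock_mhz * width_bits


def shared_bus_ceiling_mbit(clock_mhz: float, width_bits: int, crossings: int) -> float:
    """Ceiling when every byte must cross the same bus `crossings` times."""
    return pci_peak_mbit(clock_mhz, width_bits) / crossings


if __name__ == "__main__":
    print(f"GigE TCP goodput, 1500 MTU: {tcp_goodput_mbit(1000, 1500):.1f} Mbit/s")
    print(f"GigE TCP goodput, 9000 MTU: {tcp_goodput_mbit(1000, 9000):.1f} Mbit/s")
    print(f"33 MHz / 32-bit PCI peak:   {pci_peak_mbit(33, 32):.0f} Mbit/s")
    # RAID card -> memory -> NIC: on a single shared bus the data crosses it twice.
    print(f"RAID-to-NIC on one bus:     {shared_bus_ceiling_mbit(33, 32, 2):.0f} Mbit/s")
```

Under these assumptions a 1500-byte MTU caps TCP payload at roughly 941 Mbit/s, and 9000-byte frames raise the wire ceiling to about 990 Mbit/s, so a 700 Mbit/s result with jumbo frames would point at host limits rather than the link. A disk-to-network transfer on a single 33 MHz 32-bit bus is capped at about 528 Mbit/s before any PCI overhead, which is consistent with the BABAR machines not reaching line rate.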
Task 12: High throughput programme
==================================
04/04/03 The GridFTP disk-to-disk results are 520 Mbit/s, compared with a memory-to-memory rate of 941 Mbit/s. Manchester are investigating how to improve the disk-to-disk performance.
11/04/03 Ongoing. Read/write tests to disk show a rate of 800 Mbit/s. Investigation of the Web100 output shows that GridFTP transfers achieve, but do not maintain, line rate during the tests. An HTTP file transfer program achieves 500 Mbit/s and the Apache web server achieves 700 Mbit/s (disk-to-disk).
21/03/03 Intermittent bursts of receive errors have been observed (using ifconfig) on the Manchester PCs' interfaces. ACTION: S. Dallison will look at combinations of kernel, drivers and interrupt coalescence values to try and mitigate the problem.
04/04/03 Ongoing. A coalescence value of 64 gives better results.
11/04/03 Ongoing. The Intel Gigabit Ethernet card is going to be changed to see if it is the problem.
09/05/03 All PCs at Manchester are now able to receive a maximum of 950 Mbit/s. The transmission rate of the Manchester PCs is slightly lower than that of the UCL PCs (940 cf. 950 Mbit/s).
23/05/03 In the end-of-year report, with the correct settings, 800 Mbit/s disk-to-disk was achieved with the Apache web server. With the radio astronomy software, large packet loss was observed. The conclusion is that the way the application is written is critical to the performance.
13/06/03 A schedule of experiments is being drawn up.
27/06/03 Tests on disk-to-disk transfers are ongoing. For GridFTP tests, a new version of Globus has been installed, but disk-to-disk is still only 500 Mbit/s, as with the previous version. The main aims here are to solve the disk-to-disk issues and to run the BABAR data through MB-NG.
09/07/03 Discussions between R. Hughes-Jones, R. Tasker and BABAR to arrange the BABAR data transfer experiment are ongoing. Disk-to-disk performance work is ongoing. Performance tests show 400 Mbit/s write and 1200 Mbit/s read. Back-to-back results with BBCP show 300 Mbit/s read/write (3ware hardware RAID 0). This issue must be discussed in a dedicated brainstorm. We must achieve 1 Gbit/s by the end of the project unless we agree that it is impossible.
25/07/03 No progress.
08/08/03 Tests of multi-stream TCP with packet drops and background traffic, and disk-to-disk tests, are ongoing.
12/09/03 S. Dallison has investigated the losses on the GE-WAN cards and has reported the problem to Nick Carter. There is some confusion about raising a TAC case, which has to be done through Nick Carter. Another issue being investigated is that, when running single-stream TCP, up to 3 duplicate ACKs are periodically observed. Work has started on 10 Gbit/s links. M. Rio and Yee-Ting Li are working on a paper detailing recipes for obtaining high throughput with standard TCP and modified TCP (High throughput cookbook; see the Papers section). The aim is to have it finished for comments in one or two months and publishable by Christmas.

Tests at Bedfont Lakes
======================
Booking Cisco's Bedfont Lakes for standalone tests. This will be interesting after the completion of the project report. We need to discuss:
- Equipment we require and confirmation of availability.
- Plan of the tests we need to carry out (we will only have two days).
- Who will be going.
07/03/03 We have proposed 1st and 2nd May 2003. We are waiting for confirmation from Cisco.
21/03/03 These dates have been confirmed. We need to write the test plan.
11/04/03 The test plan will be reviewed today.
09/05/03 The Bedfont Lakes tests have been moved. New dates to be confirmed. The test plan is currently being reviewed with Nick Carter.
13/06/03 The tests will take place on Tuesday the 17th June 2003.
27/06/03 The tests have been performed. We have a better idea of how the GSR works, but we must scrutinise the results to make sure, and possibly suggest more tests to confirm our understanding.
09/07/03 Results will be discussed with Nick Carter on 18/07/03. We must test the GSR's OC-48 Engine 3 "tofab" queueing.
25/07/03 Features of the GSR mean that we must restrict the offered load per class and set up WRED. This is the essence of the next set of tests we will perform at Bedfont Lakes.
08/08/03 The test plan for testing the WRED setting and the rates of the different classes with a policy has been written. We may require a third visit to Bedfont Lakes to test the GSR's Engine 3 line card output policy when IOS 12.0(26)S becomes available (due this month).
12/09/03 Tests will be done on the 16th September. IOS 12.0(26)S has been available since 02/09/03.

MPLS
====
09/05/03 MPLS: Dynamic MPLS may not be straightforward; static may be feasible. Rina Samani is working on the rules for accessing the GSRs. She is currently on holiday. ACTION: Mike Allenby will talk to Jeremy Sharp on this issue. ACTION: Ask Jonathan Couzens if he has example MPLS configurations.
13/06/03 On MPLS, there are problems classifying which traffic flow should go in the tunnel and which should go through the normal IP path. We have an example MPLS configuration from N. Carter which we are trying out.
27/06/03 MPLS is now working on the 7200 after an IOS upgrade from 12.2(12.9)T to 12.2(15)T2. The next step is to try it on the MB-NG network.
09/07/03 Ongoing.
25/07/03 There are problems getting MPLS to work across more than 2 GSRs. It may be a problem with RSVP. Nick Carter has been informed. ACTION: He will ask someone at Cisco. Also, it looks like the 7600 cannot put traffic from other devices on an MPLS tunnel.
08/08/03 The issue with the GSR has been solved by upgrading the IOS. The problem with the 7600 persists.
12/09/03 MPLS does not work with the 7600 even though the commands are present. On the GSR, we are able to create MPLS tunnels, but:
a) There is no performance improvement over IP.
b) There is no bandwidth reservation to provide a leased-line-like service.
c) Multiple loops cannot be created to provide longer RTTs.
We will be talking to Nick Carter at Bedfont Lakes on the 16th September about the possibility of using MPLS-VPN. Our experience of MPLS on Cisco equipment should be written up.

Task 11: The deployment and integration of the Middleware and APIs (GARA)
--------
04/04/03 Valentina Capaccio has set up the mailing list for GARA: http://server11.infn.it/archive-gara/
09/05/03 Now that Valentina has left, we must find out who will maintain the GARA mailing list.
23/05/03 INFN are looking to employ someone to fill Valentina's role.
21/03/03 UCL is working with INFN to set up and debug the latest version of GARA. Setup of the mailing list is ongoing.
11/04/03 UCL is working with INFN to try to get GARA stable in the Globus 2.2 environment. There is no plan to move to Globus 3.0.
09/05/03 No progress. Leon Goomans' group (University of Amsterdam) are looking at integrating GARA with AAA.
23/05/03 The timescale for deployment of Globus 2 and GARA is given as one month.
13/06/03 No progress. The date for deployment is the end of July (taking holidays into account). An outstanding question is how FTP will use GARA.
27/06/03 No progress.
09/07/03 No progress. We must discuss what effort is excluded in order to get moving on this.
25/07/03 J. Orellana is now full time on this task.
08/08/03 Globus 2.2.4 has been deployed and GARA 1.2.3 is running. There is a current problem in the communication between GARA and Globus: the Globus configuration file is not understood by GARA.
12/09/03 The issues with getting GARA to run the simple test program persist. There will be a phone conference on the 17th September with Amsterdam and Italy to discuss GARA. M. Rio will talk to Saleem Bhatti about GRS. David Walker's student from Cardiff should be invited to UCL to present their middleware solution. During the DataTag meeting on the 24/25 September, J. Orellana should talk to the Amsterdam group to find out what they are doing with GARA and about its integration with AAA. When the above has been done, we should have a day at UCL where all the middleware options are explained.

SLAs, Policies, classification and policing
===========================================
4.9 28/02/03 Policies, classification and policing: A draft report has been written by M. Rio. Efforts should be made to meet once a week with S. Bhatti until this task is complete.
07/03/03 No progress. Saleem was ill this week.
21/03/03 A draft has been sent to UKERNA (M. Allenby). ACTION: M. Allenby will look at it with C. Cooper next week and will also ask about his availability for a meeting with UCL people.
11/04/03 The proposal is to write two documents: one on general requirements and another on specific implementations and configuration. The current document follows the GEANT EF class definition. ACTION: M. Rio to send a mail to the people concerned to propose a face-to-face meeting to discuss these issues.
09/05/03 Setting up the face-to-face meeting is ongoing. It should include R. Tasker, M. Allenby, M. Rio and C. Cooper. C. Cooper has produced a draft document for UKERNA's internal QoS program. This will be made available when it is finished.
23/05/03 M. Allenby is still trying to get a date when both C. Cooper and R. Tasker are available. M. Allenby will also ask C. Cooper if it is OK to disseminate his draft policy document to MB-NG.
13/06/03 ACTION: M. Rio will send a mail to C. Cooper, R. Tasker and D. Rogerson to set up the meeting.
27/06/03 D. Rogerson will take the token for making this happen.
09/07/03 No progress.
25/07/03 No progress.
08/08/03 No progress. We must decide quickly on this and move on.
12/09/03 No progress.

Task 9: Managed bandwidth service
--------
01/11/02 On hold until the network is in place.
21/03/03 M. Allenby will talk to Chris Cooper on this.
11/04/03 Ongoing. M. Allenby will produce something by the end of May.
23/05/03 Ongoing.
13/06/03 Ongoing. D. Rogerson has taken over M. Allenby's role on MB-NG. M. Allenby will send his document to D. Rogerson to distribute.
27/06/03 The managed bandwidth document is currently with R. Samani. D. Rogerson will get hold of this document and distribute it.
09/07/03 Ongoing.
25/07/03 The document has been circulated.
08/08/03 No progress. We should look at what DANTE is doing, but ultimately UKERNA should decide on this issue.
12/09/03 No progress.

Status of testbed
=================
09/05/03 At UCL, the new GE-WAN ports of the 7600 routers are not communicating. At Manchester they have managed to send traffic between a GE-WAN port and a GE Catalyst port. This may be due to the layer 2 auto-negotiation. N. Pezzi will check it this afternoon.
23/05/03 The IOS we are running is experimental, but required to make QoS work. After contacting Cisco, we have been advised to wait until the official release of the IOS.
13/06/03 Two GE-WAN ports on one of the Enhanced version 2 line cards at UCL are not working using the new production IOS. ACTION: All other sites should check to make sure there are no problems with the ports on their line cards.
27/06/03 Manchester is OK, but at RAL one of the line cards is displaying the same symptoms as the UCL line card: two of the GE-WAN ports do not seem to work. Logical have been notified. We are expecting replacement cards.
09/07/03 No progress.
25/07/03 The details have been sent to Logical. They have the new cards, so we are expecting them to notify us about installing them.
08/08/03 The new cards have been installed both at RAL and at UCL. Initial tests show that the cards do not have the same fault as the previous ones. However, Manchester report that at line rate they see 280 packets lost every second. When the send rate is reduced by around 10 Mbit/s, the problem disappears. Current high throughput tests are therefore being performed without the GE-WAN cards.
12/09/03 For the GE-WAN losses, see the High throughput section.
23/05/03 The RAL-Reading link is still in loop-back. A date is being arranged for a Logical engineer to go to RAL to harvest memory from the old OC-48 long-reach card and transplant it into the new OC-48 long-reach card. The attenuators are in place. The 7600s will be configured next week.
13/06/03 C. Seelig is working on configuring the RAL equipment. R. Tasker will contact C. Seelig to find out the progress.
27/06/03 As mentioned above, two of the ports on the GE-WAN line card are not working. The OSRs and firewall have now been connected to the MB-NG network and are accessible remotely. Two of the three PCs have been installed, but are not yet accessible remotely (they require NIC drivers). The last PC has a fault, possibly with the power supply.
09/07/03 Boston have been notified about the faulty PC. ACTION: N. Pezzi to send the RAL IP addresses to Manchester.
25/07/03 We expect the replacement PC to be delivered next week. RAL IP addresses will be sent to Manchester once the whole of RAL comes up.
08/08/03 The third PC at RAL has been replaced. We are waiting for it to be networked. Chris Seelig is on holiday, so Nick More at RAL will try to get it networked today.
12/09/03 We are still waiting for the third PC to be networked. Chris Seelig should get it up today.

Task 7: e2e Network equipment configuration
--------
11/04/03 Returning other loan equipment: Manchester can return their loan 7200. At UCL, we would like to keep the 7200 for work on the middleware and MPLS through the production network.
13/06/03 Manchester will send their loan equipment off today.
09/05/03 Harvey Lang of Cisco has forwarded this request to Mike Mckeown.
23/05/03 We will push Cisco on this issue. Manchester are returning their loan 7200.
27/06/03 Manchester have returned their GSR. M. Mckeown has been contacted regarding keeping the 7200. We are awaiting his response.
09/07/03 Ongoing.
25/07/03 M. Mckeown has been banned from loaning any other equipment until the 7200s have been returned. He is chasing Jane Butler to get the loan transferred to her.
08/08/03 No progress.
12/09/03 No news.

Task 3: Traffic generation and measurement (equipment provision)
--------
08/11/02 ACTION: R. Hughes-Jones to arrange meetings to discuss options and what we lose if we do not have GPS. This is a low priority. ACTION: UCL to investigate access to a UCL-based GPS time server (speak to Lewis Grantham or Piers O'Hanlon). ACTION: S. Dallison to investigate the availability of a time server at Manchester.
31/01/03 Possibility of using NTS servers synchronised by the Rugby clock. The resolution must be checked.
28/02/03 Manchester have a GPS system which they are investigating (it works with Windows but not yet with Linux). http://www.ripe.net/ttm/Misc/
21/03/03 Manchester's GPS system is working under Linux. The resolution has not been tested yet.
09/05/03 Work is ongoing at Manchester on how to calibrate and accurately measure the resolution of the RIPE GPS system. An antenna has been placed on the roof of the Physics building and the system is receiving a strong signal, compared to a weak one when the antenna was indoors.
23/05/03 The Manchester GPS system was hacked. The software is being rebuilt. Before that it was running satisfactorily.
13/06/03 The GPS system has been rebuilt. The resolution has not been tested, partly because we don't know how.
27/06/03 Ongoing. The system is installed and working.
09/07/03 Ongoing. Manchester have ordered another system for ATLAS costing £1240+VAT.
08/08/03 A new GPS antenna has arrived in Manchester and will be tested.
11/04/03 I. Bridge suggested we look at the Garmin hand-held GPS:
Garmin GPS 16 HVS (Part No. 010-00258-03), £134.98 incl. VAT
http://www.garmin.com/manuals/gps16qsg.pdf
http://www.garmin.com/manuals/66.pdf
ACTION: We should buy a pair and test them.
09/05/03 One has been ordered to test at UCL.
23/05/03 Ongoing. UCL's Garmin GPS device needs a power supply.
13/06/03 Ongoing. The power supply has been made, thanks to Mathew Warren at UCL. We are now looking at where to locate it.
27/06/03 Ongoing.
09/07/03 Ongoing.
25/07/03 A position has been found on the 4th floor of the UCL Physics and Astronomy department. Tests will be carried out to see if it is adequate.
08/08/03 In its current position, only 2 satellites are visible. 3 are needed for the 1PPS signal.

Task 10: Plan extension of demonstrations to sites in Europe and US
--------
01/11/02 On hold until the network is in place.
31/01/03 A talk on MB-NG and our use of the Spirent equipment should be given at the Internet 2 members meeting.
09/05/03 We should start planning extensions to Europe and the US now.
23/05/03 We will wait until the network is fully in place.
23/05/03 UKERNA should write up the value added to them in the year 1 report.

Papers
======
12/09/03 We should attempt to write up everything worthy of a paper that has come out of the project:
a) High throughput cookbook.
b) Experiences of MPLS on Cisco platforms.
c) Self-similar traffic generation and analysis.

AOB
=====
08/08/03 Supercomputing 2003 (mid November): BT want to be involved, and not only on connection loan terms. They may even buy their own SGI Vizserver machine. Tim McFadden of BT will contact F. Saka of MB-NG.
12/09/03 No news.
12/09/03 Fall-back plan for the next meeting if the Pearson studio has problems with the video conference: we should test the connection with the Polycom equipment in P. Clarke's room at 10:00 am.

Next meeting: 26 September 2003 (meetings are held on the 2nd and 4th Fridays of the month).