Category Archives: VSphere

What is RSC and why does it matter in VMware

Got latency on your VMs? It might be RSC

Let’s say we have two VMs on different hosts with the following specs:

  • Windows Server 2012 R2 (latest patches)
  • VMXNET3 network adapter
  • Same VLAN
  • Same ESXi build (3029758)
  • Both at VM hardware version 11
  • Both running the latest VMware Tools for ESXi build 3029758

[Figure: basic host diagram]

You wouldn’t expect any new latency between these two machines: both are in the same physical location, on a topology that has been in place, almost untouched, for months.

After vMotioning both VMs to the same host, the latency goes away. That is expected: VMs on the same host talk to each other over the host’s internal networking, so the traffic never has to leave the host through the physical uplinks of the VDS or vSwitch, as shown below.

[Figure: local host diagram]

 

So let’s go back to the title of the post. What is RSC and why does it matter to VMware?


RSC (Receive Segment Coalescing) is a technology that reduces CPU utilization on a server. It does this by offloading work from the CPU to the network adapter, in our case the VMware VMXNET3 adapter. RSC strips the headers from incoming packets, combines their payloads into one larger packet, then passes that packet on to the right destination. Without RSC the receiver would have to process 4-5 separate packets; with RSC enabled it only has to process a single packet with the data from all of them stuffed inside.
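To make the idea a bit more concrete, here is a toy Python sketch of coalescing. It is only a model: the `Segment` type and the `coalesce` function are made up for illustration and say nothing about how VMXNET3 or Windows actually implement RSC. All it shows is consecutive in-order segments being merged so the receiver handles one unit instead of five.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical, simplified TCP segment: a sequence number plus payload bytes."""
    seq: int
    payload: bytes


def coalesce(segments: List[Segment]) -> List[Segment]:
    """Merge runs of contiguous, in-order segments into single larger segments."""
    merged: List[Segment] = []
    for seg in segments:
        last = merged[-1] if merged else None
        # A segment is contiguous when it starts exactly where the previous one ended.
        if last is not None and last.seq + len(last.payload) == seg.seq:
            merged[-1] = Segment(last.seq, last.payload + seg.payload)
        else:
            merged.append(Segment(seg.seq, seg.payload))
    return merged


if __name__ == "__main__":
    five = [Segment(i * 4, b"data") for i in range(5)]
    print(len(five), "segments in,", len(coalesce(five)), "segment(s) out")
```

A gap in the sequence numbers starts a new merged segment, so out-of-order or missing data is never glued together in this model.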

VM hardware version 11 introduced a bug that caused ESXi to mishandle the data when the PSH flag (PSH Flag explanation) was not set on the first packet but was set on the packets that followed. That article gives an awesome example of why the PSH flag is useful.

Imagine you are walking in a line of 5 friends. Friend 1 doesn’t have a pass to get through the gate, but friends 2-5 do. Being a gentleman, friend 1 lets friends 2-5 go through while he buys his ticket. Now friends 2-5 are already in the park, waiting for him. The PSH flag situation is a pretty similar concept.

Packets 2-5 have the PSH flag that grants them permission to go up to the application, but ESXi hiccups while waiting for a PSH flag on packet 1, so there is a delay before the data can be passed on and the full information is received. Here is the KB article that highlights the problem: VMware KB.

What is the fix?


The fix is pretty simple. On the OS side you can disable RSC, but if you do, keep track of CPU and memory use on that box and on any box that receives the bulk of those packets.

 

Running the command below shows the current global TCP settings:

netsh int tcp show global

Receive Segment Coalescing State is the line we are looking for. If it shows as enabled, run the command below to disable it:

netsh int tcp set global rsc=disabled

Then run netsh int tcp show global again. Receive Segment Coalescing State should now show as disabled.
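If you have more than a couple of VMs to check, reading that output by eye gets old fast. Here is a small Python sketch that pulls the RSC state out of the text printed by `netsh int tcp show global`. The sample text is illustrative only (real output has more lines and can vary by Windows version and display language), and the `rsc_state` helper is a name I made up.

```python
import re
from typing import Optional

# Illustrative sample only; real output has more lines and may differ by Windows version.
SAMPLE_OUTPUT = """\
TCP Global Parameters
----------------------------------------------
Receive-Side Scaling State          : enabled
Receive Segment Coalescing State    : enabled
"""


def rsc_state(netsh_output: str) -> Optional[str]:
    """Return the value on the RSC line in lowercase, or None if the line is missing."""
    match = re.search(
        r"^\s*Receive Segment Coalescing State\s*:\s*(\S+)",
        netsh_output,
        flags=re.IGNORECASE | re.MULTILINE,
    )
    return match.group(1).lower() if match else None


if __name__ == "__main__":
    print(rsc_state(SAMPLE_OUTPUT))
```

On a real VM you could feed it the output of `subprocess.run(["netsh", "int", "tcp", "show", "global"], capture_output=True, text=True).stdout` instead of the sample.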

 

What is affected?


Currently, this affects anyone running ESXi 6.0 build 3568940 or below with Windows guests that support RSC (Windows Server 2012 and above). The problem can be solved in one of two ways: update to ESXi 6.0 Update 2 (build 3620759) or above, or run the command above on the machines affected by the problem.
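The build check is easy to script if you keep a list of host builds somewhere. A minimal sketch, assuming every host in the list is on ESXi 6.0 (the comparison means nothing across major versions); the host names and the `needs_rsc_workaround` helper are hypothetical.

```python
FIXED_BUILD = 3620759  # ESXi 6.0 Update 2, the first fixed build according to this post


def needs_rsc_workaround(esxi_build: int) -> bool:
    """True when an ESXi 6.0 build is older than the first fixed build."""
    return esxi_build < FIXED_BUILD


if __name__ == "__main__":
    hosts = {"esx01": 3029758, "esx02": 3620759}  # hypothetical inventory
    for name, build in hosts.items():
        action = "patch or disable RSC in guests" if needs_rsc_workaround(build) else "ok"
        print(f"{name}: build {build}: {action}")
```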

 


 

KB articles and other sources, for reference:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2129176

https://communities.vmware.com/thread/524842?start=0&tstart=0

https://technet.microsoft.com/en-us/library/hh997024(v=ws.11).aspx

A general system error occurred: Connection refused


Logging into my dev environment today, I had an issue where I couldn’t start VMs. This was happening across clusters and across different hosts.

As a workaround, you can get the VMs up by logging directly into the ESXi host with the vSphere Client.

The issue is that the workflow service on the vCenter server has either hung or stopped. To fix it, check the status of the service and then take the proper measures, as described below.


1. Check the status with the command below. In the output, the INFO:root: line is the one that refers to the service.

service-control --status vmware-vpx-workflow

2. Once you know the status, use the commands below. If the service is hung, stop it and then start it; if it is already stopped, just start it.

service-control --stop vmware-vpx-workflow

service-control --start vmware-vpx-workflow

3. Try to power on a VM now and see if the fix worked!
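If this bites you more than once, the stop/start/status sequence can be wrapped in a few lines of Python. This is only a sketch: it assumes Python is available wherever you run it, the helper names are made up, and I have not checked what exit code service-control returns for a hung service. Running the script as-is is a dry run that only prints the commands.

```python
import subprocess
from typing import List

SERVICE = "vmware-vpx-workflow"


def service_control_cmd(action: str, service: str = SERVICE) -> List[str]:
    """Build the argument list for one service-control call."""
    if action not in ("status", "stop", "start"):
        raise ValueError(f"unsupported action: {action}")
    return ["service-control", f"--{action}", service]


def restart_plan(service: str = SERVICE) -> List[List[str]]:
    """Commands for a full restart, ending with a status check."""
    return [service_control_cmd(a, service) for a in ("stop", "start", "status")]


def run_plan(plan: List[List[str]]) -> None:
    """Run each command in order; raises CalledProcessError on the first failure."""
    for cmd in plan:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in restart_plan():
        print(" ".join(cmd))
```

Call `run_plan(restart_plan())` on the vCenter server itself once the printed commands look right.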


There’s no current KB article from VMware on this, but there are plenty of resources on the topic.


 

VMFS deprecated bug in ESXi 6.0 Update 1

So if you recently added a new datastore to your vCenter, you might have gotten a nice little alarm on the ESXi hosts you added the datastore to, something like the one below.

[Screenshot: Deprecated VMFS]

This little gem is a bug in ESXi 6.0 Update 1. While it’s a pain, the fix is pretty easy, and the alarm doesn’t actually mean you have deprecated VMFS volumes on your host. But check anyway, just to make sure this bug is what applies to you.

The solution is pretty simple: restart the management agents on each affected ESXi host and the error will clear, at least until you add a new datastore. Then chances are the error will reappear. If you have a massive number of ESXi hosts and updating to ESXi 6.0 U2 is an option, I personally would update instead of restarting the agents on all your hosts.


Step 1: Log into your ESXi host via SSH or DCUI

Step 2: Run the following command to restart the management agents on the box

services.sh restart

This command restarts the management agents on the ESXi host and clears the deprecated VMFS alarm that has plagued your vCenter.


The KB article below has more from VMware.

VMFS deprecated datastore 2109735

vPostgres database filling up on VCSA 5.5 U2

[Screenshot: pre cleanup vcenter]

Ran into an interesting issue the other day: my vPostgres database was at 100% capacity. After some quick investigation I realized that the Postgres log was being filled by a warning that was logged multiple times a second, causing a slow leak in available storage.

[Screenshot: warning causing database pileup]

This would happen hundreds of times an hour and slowly eat the remaining storage on the vPostgres volume. Here is how I fixed it.


1. Log into the VCSA and find the warning above. If you are using SSH, you can find it in the following directory on the VCSA:

cd /storage/db/vpostgres/pg_log

This directory will contain a lot of files with the naming convention “postgresql-year-date”.


2. The next step is to stop the services on the VCSA and the database before continuing. If the Postgres volume is already filled to capacity, the Postgres service has most likely stopped on its own, but run these commands anyway: they prevent problems in the shutdown process and help make sure other problems don’t arise later from services that did not start properly.

Run the command: service vmware-vpxd stop

[Screenshot: stop vCenter]

Then run the command: service vmware-vpostgres stop

[Screenshot: Postgres stop]

In the above example the services didn’t stop properly because my lab vCenter was already filled to capacity and the services were already stopped.


3. Now that we have figured out that the issue is this warning that keeps popping up, the next step is to go into the Postgres configuration file and change the logging level. I used WinSCP to get into the VCSA and find the configuration file, but you can use whatever utility you like.

Before you change this configuration file, make sure you make a backup on the VCSA so you can roll back if needed. You can also take a snapshot to make sure all changes can be rolled back (only if this vCenter is managed by another vCenter; obviously you cannot take a snapshot through a vCenter that won’t turn on).

Use the command below in your SSH session to make a copy of the configuration file we will use in case of rollback.

cp /storage/db/vpostgres/postgresql.conf /storage/db/vpostgres/postgresql.conf.orig

 

Finding the configuration file in WinSCP is easy: follow the same path as above and open the file in your favorite text editor.

On line 312 of my configuration file there is this line: #log_min_messages = warning

Change it to read: log_min_messages = error (note that the leading # is removed, which uncomments the setting)

See below for an example. Ignore the | before the command.

[Screenshot: config file]

Save the file and we can begin to start the services on the VCSA.
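If you would rather script the edit than do it by hand, the change boils down to one substitution. Here is a minimal Python sketch that works on the text of the file; the `raise_log_threshold` name is made up, and it assumes the setting appears at most once at the start of a line, commented or not. Take the backup first either way.

```python
import re


def raise_log_threshold(conf_text: str, level: str = "error") -> str:
    """Set log_min_messages to `level`, uncommenting the line if needed."""
    pattern = re.compile(
        r"^[ \t]*#?[ \t]*log_min_messages[ \t]*=[ \t]*\S+", re.MULTILINE
    )
    # Only the "name = value" part is replaced, so a trailing comment on the line survives.
    new_text, count = pattern.subn(f"log_min_messages = {level}", conf_text, count=1)
    if count == 0:
        # Setting not present at all: append it rather than fail silently.
        new_text = conf_text.rstrip("\n") + f"\nlog_min_messages = {level}\n"
    return new_text


if __name__ == "__main__":
    sample = "#log_min_messages = warning\n"
    print(raise_log_threshold(sample), end="")
```

To apply it for real, read /storage/db/vpostgres/postgresql.conf into a string, pass it through the function, and write the result back while the services are still stopped.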


4. Now let’s remove the old log files from the database volume. This can be done from WinSCP or from an SSH console on the VCSA; I’ll show both. I personally would keep the last 2-3 months’ worth of logs in case there is any valuable information in there.

Run the command: rm /storage/db/vpostgres/pg_log/postgresql-DateYouWantToDelete

[Screenshot: remove old postgres]

A much easier way, in my opinion, is to do this through WinSCP: go to the same directory as above, Ctrl-click the files you want to delete, then right-click and select Delete.
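Picking files by hand works, but choosing them by age is easy to script too. A rough Python sketch, assuming Python is available on the box and that file modification time is a good enough stand-in for the date in the file name; `logs_older_than` is a made-up helper and 90 days is my reading of "2-3 months". It only prints what it would delete.

```python
import time
from pathlib import Path
from typing import List, Optional

LOG_DIR = Path("/storage/db/vpostgres/pg_log")  # path from the post


def logs_older_than(directory: Path, keep_days: int, now: Optional[float] = None) -> List[Path]:
    """Return regular files in `directory` last modified more than `keep_days` ago."""
    cutoff = (time.time() if now is None else now) - keep_days * 86400
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.stat().st_mtime < cutoff
    )


if __name__ == "__main__":
    if LOG_DIR.is_dir():
        for path in logs_older_than(LOG_DIR, keep_days=90):
            # Dry run: print only. Swap in path.unlink() once the list looks right.
            print("would delete", path)
```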


5. Starting the services is the best part, IMHO: it means the problem is solved and the solution we just worked out hasn’t failed us. Run the two commands below to start the VCSA services and we should be on our way.

Start the Vpostgres service : service vmware-vpostgres start

Start the VPXD service on the VCSA: service vmware-vpxd start

You should receive a message like the one below.

[Screenshot: start services]


6. Last but not least, it is time to see the finished result. Through SSH, run the following command: df -h

This command should print the list of your drives and the capacity of each.

Or, if you want a prettier look, log into the VCSA appliance web page and check out the dashboard:

https://VCSA:5480

[Screenshot: after database cleanup]
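If you want the same number from a script (say, to catch this before it hits 100% again), Python’s standard library can report it. A small sketch; the /storage/db mount point is my assumption based on the paths used above, and the percentage can differ slightly from what df -h prints because df leaves reserved blocks out of its calculation.

```python
import shutil


def percent_used(path: str) -> float:
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    # /storage/db is assumed from the paths used earlier in the post.
    for mount in ("/", "/storage/db"):
        try:
            print(f"{mount}: {percent_used(mount):.1f}% used")
        except FileNotFoundError:
            print(f"{mount}: not found on this machine")
```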

VMware KB article 2092127

 

Hope you enjoyed. Comment below for any critiques.

Search function not working!

[Screenshot: empty search]

So I ran into an issue after an upgrade from 5.5 to 6.0 U1 EP3. The search function in vCenter would not work properly, and the Web Client would show an empty inventory when I logged in with a domain-authenticated account.

There are two possible solutions to this issue.

One solution is to restart the inventory services on your vCenter server or appliance. You can find the steps in the KB articles below.

Restarting VCenter services for windows

Restarting VCenter services for VCSA

But in my case this didn’t work. I even rebooted the entire vCenter to try to get a fresh start. My problem, it seems, was a little bit different.

The issue seems to be that the identity source in SSO got out of sync with the domain. Since administrator@vsphere.local had a working search function and fully functioning web services, I was puzzled. After a quick 10-minute call to VMware, the gentleman on the other end recommended we recreate the SSO identity source on the vCenter. That is the second solution.

I took a screenshot of my SSO identity source settings and took a deep breath while vCenter did its thing. Surprisingly, this fixed the issue. Once the new identity source was in place, the search function returned and I could get back to business! From what I was told by the tech at VMware, this is an issue they often see with upgrades: not really a bug as such, but a hiccup in the upgrade process.

Documentation from VMware below.

VMware 6.0 documentation on SSO identity source