TIdIMAP4::RetrieveHeader speed
#1
I am tasked with downloading all message headers from many mailboxes, so this is my first attempt with the VCL:

Code:
//TId components managed by the IDE
TIdIMAP4 *IdIMAP4;
TIdMailBox *IdMlBx;
TIdMessage *IdMsg;
//....IdIMAP4 is connected and authenticated
int nMsgs;
UnicodeString s = "-1";
DWORD t;
  try{
    if( !IdIMAP4->ExamineMailBox(asName,IdMlBx) )
      throw Exception( "!ExamineMailBox" );
    nMsgs = IdMlBx->TotalMsgs;
    t = ::GetTickCount();
    for( int iMsg=1; iMsg<=nMsgs; iMsg++ ){
      if( !IdIMAP4->RetrieveHeader(iMsg,IdMsg) )
        throw Exception( "!RetrieveHeader " + IntToStr(iMsg) );
      //IdMsg->UID is empty (?)
      if( !IdIMAP4->GetUID(iMsg,s) )
        throw Exception( "!GetUID " + IntToStr(iMsg) );
    }//for
    t = ::GetTickCount()-t;
  }catch( EIdException &e ){
    return false;
  }//catch( EIdException &e )
  catch( Exception &e ){
    return false;
  }
double dt = (double)t/(double)nMsgs;//252.53495873621098[ms]

My problem is the value of dt: it is greater than 250 ms.

That means I can only get about 4 headers per second, which is unacceptably slow.

Tested on different mailboxes in different accounts; the PC is connected to the internet on a 200x20 plan.

Is there a way to call RetrieveHeader for more than just 1 message at a time?

Thanks in advance, and have a nice weekend. Boba TC.
Reply
#2
(06-24-2022, 09:10 PM)Boba TC Wrote: my problem is the value of dt: it is greater than 250ms

Well, for starters, you are only looking at an average time. Each message will have a different size of header data, and take a different amount of time to download. Maybe there are larger headers that are increasing the average.

Also, you are issuing multiple IMAP commands per loop iteration, which skews the average, too. One thing you could try is to alter the source code for TIdIMAP4.RetrieveHeader() (or simply duplicate its logic in your own code) so that its FETCH command also requests the UID. Then you don't have to retrieve the UID in a separate command. Oddly, RetrieveHeader() is already set up to parse a UID if one is present; it simply doesn't request one (there is a ticket already open for that in Indy's issue tracker: https://github.com/IndySockets/Indy/issues/114). I'll fix that in a future release.

In any case...

(06-24-2022, 09:10 PM)Boba TC Wrote: Meaning that I can only get 4 headers per second - this is unacceptably slow.

IMAP is a complex protocol, and TIdIMAP4 isn't the most efficient implementation, either in terms of command usage or in parsing. Also, there are things that the IMAP protocol supports which your task could benefit from (pipelining, bulk fetching, etc) but which TIdIMAP4 doesn't support at this time.

But, that being said, you are clocking the overall time it takes TIdIMAP4 to send the FETCH commands, receive their replies, and parse those replies. The question is, where is most of that time actually being spent? How quickly does the server actually reply? How long does TIdIMAP4 take to parse? You really need to profile these operations separately to discover where the bottleneck is.
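One rough way to start profiling, as a sketch only: wrap each call in a scoped timer and compare the per-call averages. `PhaseTimer`, `Scope`, and the phase names are hypothetical helpers, not part of Indy. Timing RetrieveHeader() as a whole still lumps network time and parsing together; separating those would need instrumentation inside TIdIMAP4 itself.

```cpp
#include <chrono>
#include <map>
#include <string>

// Hypothetical helper (not part of Indy): accumulates wall-clock time per named phase.
class PhaseTimer {
public:
    using Clock = std::chrono::steady_clock;

    // Adds the lifetime of this object to the named phase when it goes out of scope.
    class Scope {
    public:
        Scope(PhaseTimer& owner, const std::string& phase)
            : owner_(owner), phase_(phase), start_(Clock::now()) {}
        ~Scope() {
            owner_.totals_[phase_] += Clock::now() - start_;
            ++owner_.counts_[phase_];
        }
        Scope(const Scope&) = delete;
        Scope& operator=(const Scope&) = delete;
    private:
        PhaseTimer& owner_;
        std::string phase_;
        Clock::time_point start_;
    };

    // Number of completed measurements for a phase (0 if never measured).
    int count(const std::string& phase) const {
        auto it = counts_.find(phase);
        return it == counts_.end() ? 0 : it->second;
    }

    // Average milliseconds per measurement, or 0.0 if the phase was never measured.
    double averageMs(const std::string& phase) const {
        const int n = count(phase);
        if (n == 0) return 0.0;
        const std::chrono::duration<double, std::milli> total = totals_.at(phase);
        return total.count() / n;
    }

private:
    std::map<std::string, Clock::duration> totals_;
    std::map<std::string, int> counts_;
};
```

In the loop from the first post, one `PhaseTimer::Scope` could go around the RetrieveHeader() call and another around the GetUID() call; comparing `averageMs("RetrieveHeader")` with `averageMs("GetUID")` would then show how much of the 250 ms each command accounts for.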

(06-24-2022, 09:10 PM)Boba TC Wrote: Is there a way to call RetrieveHeader for more than just 1 message at a time?

Unfortunately no. It only supports 1 message at a time.

However, the underlying FETCH command does support requesting a sequence of message numbers at a time, and they will all be included in a single reply (this is actually a TODO item for updating the implementation of TIdIMAP4.RetrieveAllHeaders(), which currently calls RetrieveHeader() in a loop, similar to your example). So, you may have to resort to manually issuing your own FETCH command and parsing the reply.
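As a sketch of what manually issuing ranged FETCH commands might look like, the helper below only builds the command text for message ranges in fixed-size batches. `buildHeaderFetchCommands` and the batch size are hypothetical; `UID` and `BODY.PEEK[HEADER]` are standard RFC 3501 fetch items, but whether they suit your server and how the reply surfaces through TIdIMAP4 is an assumption to verify. Header data comes back as IMAP literals, so parsing these replies is more involved than parsing FLAGS.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: builds FETCH command texts (without tags) covering
// message sequence numbers 1..totalMsgs in batches of batchSize.
std::vector<std::string> buildHeaderFetchCommands(int totalMsgs, int batchSize) {
    assert(batchSize > 0);
    std::vector<std::string> cmds;
    for (int first = 1; first <= totalMsgs; first += batchSize) {
        const int last = std::min(first + batchSize - 1, totalMsgs);
        // One command requests the UID and the header block for the whole range.
        cmds.push_back("FETCH " + std::to_string(first) + ":" +
                       std::to_string(last) + " (UID BODY.PEEK[HEADER])");
    }
    return cmds;
}
```

Each string could then be passed to SendCmd(), with TotalMsgs from ExamineMailBox() as the upper bound. Batching keeps any single reply from growing without limit on large mailboxes.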

Another thing to consider - do you really need to download ALL headers every time? If you are connecting and reconnecting over time, you might consider using TIdIMAP4.SearchMailBox() to limit your download of headers to just the new messages that have arrived since the previous download.
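A different route to the same goal, sketched here under assumptions, is to remember the highest UID already downloaded and ask only for UIDs above it with a `UID FETCH` range instead of a search. `SyncState`, `nextHeaderFetch`, and `isNewMessage` are hypothetical names. Two RFC 3501 details matter: cached UIDs are only valid while the mailbox's UIDVALIDITY is unchanged, and a range ending in `*` always includes the last message in the mailbox, so the result still needs filtering.

```cpp
#include <cstdint>
#include <string>

// Hypothetical saved state from the previous download of one mailbox.
struct SyncState {
    std::uint32_t uidValidity = 0; // UIDVALIDITY seen at the previous download
    std::uint32_t lastUid = 0;     // highest UID already downloaded (0 = none yet)
};

// Builds the command text (without tag) for the next header download.
// Falls back to a full download when nothing is cached or UIDVALIDITY changed.
std::string nextHeaderFetch(const SyncState& saved, std::uint32_t currentUidValidity) {
    const std::string items = " (UID BODY.PEEK[HEADER])";
    if (saved.lastUid == 0 || saved.uidValidity != currentUidValidity)
        return "UID FETCH 1:*" + items;
    return "UID FETCH " + std::to_string(saved.lastUid + 1) + ":*" + items;
}

// "n:*" always returns the last message in the mailbox, even when its UID is
// below n, so each returned UID is checked against the saved state.
bool isNewMessage(const SyncState& saved, std::uint32_t uid) {
    return uid > saved.lastUid;
}
```

After a UIDVALIDITY change the caller would reset `lastUid` to 0 before filtering, since the old UIDs no longer mean anything.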

Reply
#3
Thanks, Gambit, for looking at this and for your valuable input.
I'm doing some reading on IMAP4 (mostly RFCs) and learning how to use TIdIMAP4::SendCmd.
Found one of your useful recommendations (RFC by Melnikov) on synchronizing email boxes with IMAP.
Will keep posted. Boba.
Reply
#4
I do not want to test my IMAP4 'skills' on real mailboxes, so I created a test Gmail account and enabled IMAP, but I need more help: I cannot log in to it.
TIdSSLIOHandlerSocketOpenSSL params:
.Destination="imap.gmail.com:993";
.Host="imap.gmail.com";
.Port=993;
.SSLOptions.Method=sslvSSLv23;
TIdIMAP4.UseTLS=utUseImplicitTLS;//[AUTHENTICATIONFAILED] Invalid credentials (Failure)
TIdIMAP4.UseTLS=utUseExplicitTLS;//Connection Closed Gracefully.
Does Google allow connections at all? If so, what would be the correct params for the TId components?
Thanks.
Reply
#5
I think Google requires SASL now. The IDE Tool Palette does have 'Indy SASL' components but no mention of OAuth2. Am I doomed with RAD Studio 2009?
Reply
#6
From my example above:

Code:
  try{
    if( IdIMAP4->ExamineMailBox("INBOX",IdMlBx) ){
      UnicodeString s1,s2;
      //? s2=IdIMAP4->SendCmd("B29","FETCH 10:20 (FLAGS)",s1);//does not compile
      s2=IdIMAP4->SendCmd("B29 FETCH 10:20 (FLAGS)",s1);//compiles, but does not return and no Exception raised, why?
    }
  }catch  ....
What is the correct C++ syntax of the TIdIMAP4::SendCmd method? I can't find the TIdImap4.hpp file on my 20-year-old XP devbox.
Reply
#7
(06-25-2022, 10:58 PM)Boba TC Wrote: I do not want to test my IMAP4 'skills' on real mail boxes; so created a test gmail one, enabled IMAP, but need more help: can not login to it.

Per https://support.google.com/accounts/answer/6010255:

Quote: To help keep your account secure, from May 30, 2022, Google no longer supports the use of third-party apps or devices which ask you to sign in to your Google Account using only your username and password.

So, you have three choices:

- go into your Gmail account and turn "less secure apps" back on. Then you can login with your Gmail username and password as before.

- go into your Gmail account and turn on 2FA and create an app-specific password. Then you can login with your Gmail username and app password.

- implement OAuth2 in your code. See https://developers.google.com/identity/protocols/oauth, and have a look at https://github.com/geoffsmith82/GmailAuthSMTP/, which should be adaptable for IMAP, not just SMTP. Also see this branch https://github.com/indysockets/indy/tree/sasl-oauth in Indy's repo.

(06-26-2022, 12:19 AM)Boba TC Wrote: I think Google requires SASL now. The IDE Tool Palette does have 'Indy SASL' components but no mention of OAuth2. Am I doomed with RAD Studio 2009?

https://github.com/geoffsmith82/GmailAuthSMTP/

https://github.com/indysockets/indy/tree/sasl-oauth

(06-26-2022, 04:15 AM)Boba TC Wrote: from my example above
...
what is the correct C++ syntax of TIdIMAP4::SendCmd method?

Try this:

Code:
s2 = IdIMAP4->SendCmd(IdIMAP4->NewCmdCounter, "FETCH 10:20 (FLAGS)", OPENARRAY(String, ("FETCH")));

Or, you can omit the counter and TIdIMAP4 will generate one for you:

Code:
s2 = IdIMAP4->SendCmd("FETCH 10:20 (FLAGS)", OPENARRAY(String, ("FETCH")));

Reply
#8
Many thanks, Remy, for the explanation.
Now I'd like to know how to get values of those flags returned by the server; the following compiles and executes:
Code:
s2 = IdIMAP4->SendCmd(IdIMAP4->NewCmdCounter, "FETCH 10:20 (FLAGS)", OPENARRAY(String, ("FETCH")));
after it returns, s2=="OK"; what is the TIdIMAP4 method() for that? TIA.
Reply
#9
(06-29-2022, 02:36 AM)Boba TC Wrote: Now I'd like to know how to get values of those flags returned by the server; the following compiles and executes:
Code:
s2 = IdIMAP4->SendCmd(IdIMAP4->NewCmdCounter, "FETCH 10:20 (FLAGS)", OPENARRAY(String, ("FETCH")));
after it returns, s2=="OK"; what is the TIdIMAP4 method() for that?

The downloaded data is in the IdIMAP4->LastCmdResult->Text property.

Internally, TIdIMAP4 uses a ParseLastCmdResult() method to parse out various fields from FETCH responses into the FLineStruct member, including FLAGS.

ParseLastCmdResult() parses 1 reply item at a time, so for commands that download a range of replies (e.g., (UID)RetrieveAllEnvelopes(), (UID)RetrieveMailBoxSize(), etc.), TIdIMAP4 loops through LastCmdResult->Text, parsing each string individually.

The ParseLastCmdResult() and FLineStruct members are protected, so either use an accessor class to reach them, or just parse the LastCmdResult->Text yourself as needed.
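If you parse the text yourself, a minimal sketch in standard C++ might look like the following. `FetchFlags` and `parseFetchFlags` are hypothetical names, and the exact line format in LastCmdResult->Text is an assumption: the parser accepts lines with or without the leading `* ` marker and expects the `FETCH` and `FLAGS` keywords in uppercase.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type: one message number with its flags.
struct FetchFlags {
    int msgNum = 0;
    std::vector<std::string> flags;
};

// Parses one line of the form "* 10 FETCH (FLAGS (...))" where the leading
// "* " is optional. Returns false if the line is not a FETCH reply with FLAGS.
bool parseFetchFlags(const std::string& line, FetchFlags& out) {
    // Skip the untagged-response marker if it is present.
    const std::size_t skip = (line.compare(0, 2, "* ") == 0) ? 2 : 0;
    std::istringstream in(line.substr(skip));
    int num = 0;
    std::string word;
    if (!(in >> num >> word) || num <= 0 || word != "FETCH") return false;

    // Locate the parenthesized flag list that follows the FLAGS keyword.
    const std::string key = "FLAGS (";
    std::size_t start = line.find(key);
    if (start == std::string::npos) return false;
    start += key.size();
    const std::size_t end = line.find(')', start);
    if (end == std::string::npos) return false;

    out.msgNum = num;
    out.flags.clear();
    // Flags are separated by spaces; an empty list yields no flags.
    std::istringstream flagsIn(line.substr(start, end - start));
    while (flagsIn >> word) out.flags.push_back(word);
    return true;
}
```

Calling this once per string in LastCmdResult->Text after the `FETCH 10:20 (FLAGS)` command should yield one `FetchFlags` per message, with non-matching lines skipped by the `false` return.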

Reply
#10
Great!! works well. Thank you very much, Remy!
Reply